Byte count incorrect for Whitespace code blocks

vihanb / PPCG-Design

A redesign of the PPCG website

codegolf.stackexchange.com


Byte count incorrect for Whitespace code blocks #121

Closed ephphatha closed 7 years ago

ephphatha commented 7 years ago

I suspect this is because the TIO generated markup uses HTML entities for tab characters. See this answer for a couple of examples. This post is at index 64 in the answers array if that's easier to reference.

The following code declares the block in the screenshot:

<pre><code>

 &#9;

</code></pre>

(That's three literal spaces, four literal newlines, and one HTML tab entity.)
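For reference, the score this block should get can be worked out by decoding the entity and counting UTF-8 bytes. The sketch below is only an illustration: `decodeNumericEntities` and `utf8ByteLength` are hypothetical helper names, the decoder handles decimal numeric entities only, and the arrangement of characters in `sample` is an assumption (only the counts of spaces, newlines, and tabs come from the description above).

```javascript
// Sketch only: decodes decimal numeric entities such as &#9; and nothing else.
// Both helper names are hypothetical and are not taken from the userscript.
function decodeNumericEntities(html) {
  return html.replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)));
}

function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Assumed layout: three spaces, four newlines, one tab entity.
const sample = "\n\n   &#9;\n\n";
const decoded = decodeNumericEntities(sample);

console.log(utf8ByteLength(decoded)); // 8
```

Under these assumptions the block is worth 8 bytes rather than the 0 shown in the screenshot.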

[Screenshot: Whitespace code block showing a 0 byte score]

vihanb commented 7 years ago

Hm, right now it's trimming all whitespace, which makes this zero bytes. @ETHproductions, would trimming with s.replace(/^\s*([\S\s]+?)\s*$/, "$1") work? That would require there to be at least one non-whitespace character for it to trim.
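A quick way to check the proposed expression is to run it next to `.trim()` on an input that is entirely whitespace. In the sketch below, `proposedTrim` is a hypothetical wrapper name and the inputs are made up. Because `[\S\s]` also matches whitespace characters, the capture group can be satisfied by whitespace, so a whitespace-only string appears to collapse to its final character rather than being left untouched.

```javascript
// The expression quoted above, wrapped in a hypothetical helper.
function proposedTrim(s) {
  return s.replace(/^\s*([\S\s]+?)\s*$/, "$1");
}

// Made-up input: three spaces, newline, tab, newline (6 characters).
const whitespaceOnly = "   \n\t\n";

console.log(whitespaceOnly.trim().length);            // 0
console.log(proposedTrim(whitespaceOnly).length);     // 1
console.log(JSON.stringify(proposedTrim("  a b  "))); // "a b"
```

So on this input the expression would report 1 byte instead of 0, which is still short of the 6 characters actually present.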

ETHproductions commented 7 years ago

Not really, there are some visible answers that contain leading or trailing whitespace. What does the content look like before trimming? Is there just an extra newline or two occasionally?

ETHproductions commented 7 years ago

Simply removing the .trim() call seems to work, except that lines containing only spaces don't show up in the code block and thus still aren't counted. Not sure there's anything we can do about that without accessing the API...

ephphatha commented 7 years ago

Yeah, spaces getting stripped is something that I don't expect this plugin to deal with. It's something with the way the PPCG site works since I get the same behaviour in IE.

I've noticed it gives different counts for UTF-8 bytes and chars when the non-breaking space HTML entity is used, but the correct count if the space HTML entity is used. Should nbsp be counted as a single byte for this language (it's not a character recognised by the spec anyway), or should we just be sure to encode spaces instead?
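The byte/char difference follows from how UTF-8 encodes the two characters: a non-breaking space (U+00A0) takes two bytes, while an ordinary space (U+0020) takes one. A minimal check, using the standard `TextEncoder` (variable names are just for illustration):

```javascript
const encoder = new TextEncoder(); // always encodes as UTF-8

const nbsp = "\u00A0"; // what &nbsp; decodes to
const space = " ";     // what &#32; decodes to

console.log(nbsp.length, encoder.encode(nbsp).length);   // 1 2
console.log(space.length, encoder.encode(space).length); // 1 1
```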

ETHproductions commented 7 years ago

I added a feature that will count Whitespace bytes separately from typical languages. It counts any whitespace char or any match of \\?[stn] as a single byte and ignores everything else. This may not be the best solution, but it's a start, and it works on the particular case you've given.
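The feature's actual code isn't shown in this thread, so the following is only a sketch of the counting rule as described, assuming it amounts to counting matches of a single regular expression. `countWhitespaceBytes` is a hypothetical name.

```javascript
// Sketch of the described rule: every whitespace character, or every match of
// \\?[stn] (an optional backslash followed by s, t, or n), counts as one byte.
// Anything else is skipped. Not the userscript's real implementation.
function countWhitespaceBytes(code) {
  const matches = code.match(/\s|\\?[stn]/g);
  return matches ? matches.length : 0;
}

console.log(countWhitespaceBytes("   \t\n\n\n\n")); // 8
console.log(countWhitespaceBytes("\\s\\s\\t\\n"));  // 4
console.log(countWhitespaceBytes("xyz"));           // 0
```

Under this reading, a bare letter s, t, or n in an answer also counts as one byte, and since JavaScript's `\s` matches the non-breaking space, an nbsp would count as a single byte too.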

Since we can't really do anything about answers with missing spaces in the HTML, I'll just close this issue for now (though feel free to comment if there are still any problems).