mstksg / advent-of-code-ocr

Parsing ASCII art word solutions for advent of code
https://hackage.haskell.org/package/advent-of-code-ocr
BSD 3-Clause "New" or "Revised" License

A bunch of letterforms are missing from defaultLetterMap #1

Closed gilgamec closed 3 years ago

gilgamec commented 3 years ago

As near as I can tell, there are quite a few letterforms missing from the defaultLetterMap.

λ> sort $ M.elems $ OCR.getLetterMap OCR.defaultLetterMap
"AABBCCEEFFGGHHIJJKKLLNOPPRX"

It looks like it's missing one of the Rs, the U, the Y, and both Zs.

gilgamec commented 3 years ago

I think the problem is that spurious spaces are getting into the string literal somehow:

λ> raw1 = bimap id (filter (/=' ')) $ OCR.rawLetterforms1 
λ> sort $ M.elems $ OCR.getLetterMap $ uncurry OCR.parseLetterMap raw1
"ABCEFGHIJKLOPRUYZ"

Maybe a problem with heredoc?
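The two GHCi outputs above can be compared mechanically. This is only a minimal sketch: `missingFrom` is a hypothetical helper (not part of advent-of-code-ocr), and the two strings are copied from the outputs quoted in this thread.

```haskell
import Data.List (nub, (\\))

-- Letters reported by the default map in the first query (duplicates included).
observedDefault :: String
observedDefault = "AABBCCEEFFGGHHIJJKKLLNOPPRX"

-- Letters recovered from rawLetterforms1 once spaces are filtered out.
filteredFont1 :: String
filteredFont1 = "ABCEFGHIJKLOPRUYZ"

-- Hypothetical helper: distinct letters of `expected` that never occur in `actual`.
missingFrom :: String -> String -> String
missingFrom expected actual = nub expected \\ nub actual
```

Evaluating `missingFrom filteredFont1 observedDefault` lists the letters of the first font that never show up in the default map. It compares distinct letters only, so it cannot detect that just one of two letterforms for a letter is missing.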

mstksg commented 3 years ago

Ah, thank you... this is definitely interesting, and concerning! I wonder what is going on here... it does sound like it could be a problem with the heredoc. I should probably also build some fuzz testing for this instead of relying on a few built-in forms.

mstksg commented 3 years ago

I've added a filter to the heredocs, which seems to resolve the issue for now. But I do wonder what's happening.
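For readers following along, a filter of this kind might look like the sketch below. It is an assumption-laden illustration, not the library's actual change: `stripSpaces` and both glyph strings are made up here, and the `#`/`.` glyph characters are assumed for the example.

```haskell
-- Hypothetical helper (not the library's code): drop every space character,
-- leaving newlines and glyph characters untouched.
stripSpaces :: String -> String
stripSpaces = filter (/= ' ')

-- A made-up three-row glyph with stray spaces, like the ones suspected
-- of leaking into the heredoc.
dirtyGlyph :: String
dirtyGlyph = unlines ["#.# ", " ###", "#.#"]

-- The same glyph as it was meant to be written.
cleanGlyph :: String
cleanGlyph = unlines ["#.#", "###", "#.#"]
```

Here `dirtyGlyph == cleanGlyph` is `False`, while `stripSpaces dirtyGlyph == cleanGlyph` is `True`, which is why an exact-match lookup on the raw text can silently drop letters. Filtering like this is only safe if a space is never a meaningful glyph character.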

mstksg commented 3 years ago

Ah! It might be a locale thing, since checking defaultLetterMap on my machine gives the full alphabet. What OS are you using?

gilgamec commented 3 years ago

I'm on macOS 10.13.

mstksg commented 3 years ago

Hm... definitely interesting.

I'm closing this issue because I've implemented the filtering change mentioned, but it would be interesting to investigate this further on the heredoc side. Thanks for opening!