Windows EOL in HTML not supported

ohare93 / brain-brew

Automated Anki flashcard creation and extraction to/from Csv

The Unlicense

89 stars 5 forks source link

Windows EOL in HTML not supported #51

Closed AFulgens closed 3 weeks ago

AFulgens commented 11 months ago

I am currently trying out a similar integration to Ultimate Geography with one of my decks. However, I am facing the issue that I am working on Windows and the HTML exports from CrowdAnki have CRLF EOL. Thus, I had to patch my brainbrew locally, namely had to change the regex https://github.com/ohare93/brain-brew/blob/1426351fd6c9e1ead68c44881db61e926f5a600a/brain_brew/representation/yaml/note_model_template.py#L26 to accept [\r\n]s instead of [\n]. This is actually making it be in sync with utils.filename_from_full_path and utils.folder_name_from_full_path.

I haven't found contribution guidelines to this repo, I could prepare a PR from a fork, if it would be wanted.

ohare93 commented 11 months ago

Please do 👍 I haven't done much testing on Windows myself, so thanks for fixing this 😄

ohare93 commented 11 months ago

Possibly related to #43? 😅

aplaice commented 11 months ago

I haven't looked at #43 in a while, so I'm not 100% sure, but I think the PR itself is independent of the issue here (it's about replacing the old codecs.open with just open).

The discussion in "Aside on newlines" probably is vaguely relevant (but again not 100% sure) — I believe that in order to allow Windows and Linux users to inter-operate peacefully (without a back-and-forth war between \n and \r\n ) we will need to add newline='' to the open call.

Maybe, though, let's just cross this once we encounter this in the wild? (i.e. if I had a vote, I'd vote for just applying anything that @AFulgens needs, for now. :))

ohare93 commented 11 months ago

You always have a vote, my friend 😁

AFulgens commented 11 months ago

I'll prepare a PR then.

On a side-note, I have noticed that there is also a problem with encoding, namely if I use Umlauts in fields name (ü comes to mind), then I will get a mixture of UTF-8 and cp1252 files. Is that also something Windows/Windows + Python related, i.e., on Linux non-ASCII should work fine? It's also something I could look into, but that sounds like a heavy change, because alls the reads/writes have to be fixed to UTF-8 instead of defaulting. Would that also be something interesting or should I just live with the fact that my fields should not contain high-ASCII/non-ASCII characters?

aplaice commented 11 months ago

Would that also be something interesting

IMO yes, definitely!

Is there cp1252 outside the YAML files? AFAICT all the other (i.e. except the two in yaml_object.py) relevant calls to open explicitly specify a UTF-8 encoding, so the CSV files, HTML files etc. should all just have UTF-8.

(The calls in setup.py and scripts/yamale_build.py also don't have an encoding specified, but that shouldn't affect the main operation of BrainBrew.)

AFulgens commented 11 months ago

So far I have seen cp1252 only in YAML files generated with anki_to_sources recipes and problems with non-ASCII in sources_to_anki as well (i.e., mojibake in the resulting deck.json). I'll investigate and if it's not too many changes I will roll it together into a single PR with the regex fix.

ohare93 commented 3 weeks ago

Should be solved now 👍