wshito / asciidoctor-chunker

The utility to create chunked HTML files from the single HTML generated by Asciidoctor.
MIT License

Chunker can crash if non-ASCII characters present #8

Closed bwittman closed 3 years ago

bwittman commented 5 years ago

This change adds the :external-format :utf-8 flag when opening files for reading or writing so that HTML containing non-ASCII UTF-8 characters won't crash the chunker.

wshito commented 5 years ago

@bwittman Thank you for your contribution. Please give me some time to test with different character encodings before fixing the encoding to UTF-8.

bwittman commented 5 years ago

Any progress on testing this PR? I believe HTML documents are supposed to be stored in UTF-8 and can often contain non-ASCII characters.

jtkorb commented 5 years ago

It's a bit confusing that the error generated by a non-ASCII UTF-8 character in the source is "memory exhausted". I spent some time trying to figure out how to give the garbage collector more memory, only to discover that this patch fixes the problem without changing the default memory allocation.

wshito commented 5 years ago

> Any progress on testing this PR? I believe HTML documents are supposed to be stored in UTF-8 and can often contain non-ASCII characters.

Sorry for the long delay in responding. The reason I have been reluctant is that I myself use non-ASCII characters (Japanese) stored in UTF-8, and I have had no trouble with them.

Which Lisp implementation are you using, and with which LOCALE settings, @bwittman?

jtkorb commented 5 years ago

Answering for myself... I've been running sbcl-bin 1.4.4 with the default macOS LOCALE settings for my area (en_US). Today, I upgraded to sbcl-bin 1.5.7, undid Barry's patch, and it worked.

wshito commented 5 years ago

If LANG is set properly, I suppose SBCL will choose a default external format that handles your non-ASCII characters. I am not an expert on this, though.

By the way, my LANG is set to ja_JP.UTF-8.

wshito commented 3 years ago

The completely reimplemented JavaScript version of Asciidoctor-Chunker has now been released. Since JavaScript handles UTF-8 by default, there should be no problem with non-ASCII characters, so I will close this issue. Please open a new issue if you find any problems with the new version. Thank you.
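
For readers who want to check the encoding behavior themselves, here is a minimal Node.js sketch of a UTF-8 round trip. It is not taken from the asciidoctor-chunker source; the function name `roundTripUtf8` and the file name `sample.html` are made up for illustration. In Node.js, `fs.writeFileSync` encodes strings as UTF-8 unless told otherwise, while `fs.readFileSync` returns raw bytes unless an encoding is given, so the sketch names `'utf8'` explicitly in both directions instead of relying on the locale.

```javascript
// Hypothetical round-trip check; not part of asciidoctor-chunker itself.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a string to a temporary file and read it back,
// naming UTF-8 explicitly for both the write and the read.
function roundTripUtf8(text) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'chunk-'));
  const file = path.join(dir, 'sample.html');
  try {
    fs.writeFileSync(file, text, 'utf8');
    return fs.readFileSync(file, 'utf8');
  } finally {
    // Clean up the temporary directory whether or not the read succeeded.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const sample = '<p>日本語 café</p>';
// Logs whether the non-ASCII text survived the round trip unchanged.
console.log(roundTripUtf8(sample) === sample);
```

Because the encoding is passed explicitly, the result should not depend on LANG or other locale settings, unlike the SBCL default external format discussed earlier in the thread.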