bwittman closed this 3 years ago
@bwittman, thank you for your contribution. Please give me some time to test with different character encodings before hard-coding UTF-8.
Any progress on testing this PR? I believe HTML documents are supposed to be stored in UTF-8 and can often contain non-ASCII characters.
It's a bit confusing that the error generated by a non-ASCII UTF-8 character in the source is "memory exhausted". I spent some time trying to figure out how to give the garbage collector more memory, only to discover that this patch fixes the problem without changing the default memory allocation.
Sorry for the long silence. The reason I have been reluctant is that I use non-ASCII (Japanese) text stored in UTF-8 myself and have had no trouble with it.
@bwittman, which Lisp implementation and which locale settings are you using?
Answering for myself: I had been running sbcl-bin 1.4.4 with the default macOS locale settings for my area (en_US). Today I upgraded to sbcl-bin 1.5.7, undid Barry's patch, and it worked.
If `LANG` is properly set, I suppose the default external format will be set properly for your non-ASCII characters with SBCL. I am not an expert on this, though.
By the way, my `LANG` is set to `ja_JP.UTF-8`.
The completely re-implemented JavaScript version of Asciidoctor-Chunker is now released. Since JavaScript handles UTF-8 by default, there should be no problem with non-ASCII characters, so I will close this issue. Please open a new issue if you find any problems with the new version. Thank you.
This change adds the :external-format :utf-8 flag when opening files for reading or writing so that HTML containing non-ASCII UTF-8 characters won't crash the chunker.
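For readers unfamiliar with the flag, here is a minimal sketch of what the change amounts to, assuming SBCL (which accepts `:utf-8` as an external format; the set of accepted values is implementation-dependent). The helpers `read-file-utf8` and `write-file-utf8` are hypothetical names for illustration, not functions from the chunker's source.

```lisp
;; Hypothetical helpers (not from asciidoctor-chunker) showing the
;; :external-format :utf-8 argument to OPEN via WITH-OPEN-FILE.
;; Assumes SBCL, where :utf-8 is a supported external format.

(defun read-file-utf8 (path)
  "Read the whole file at PATH as a string, decoding it as UTF-8."
  (with-open-file (in path :direction :input :external-format :utf-8)
    ;; FILE-LENGTH reports bytes here, which is >= the number of
    ;; characters in UTF-8, so the buffer is always large enough.
    (let* ((buffer (make-string (file-length in)))
           (end (read-sequence buffer in)))
      ;; Trim the unused tail left by multi-byte characters.
      (subseq buffer 0 end))))

(defun write-file-utf8 (path contents)
  "Write the string CONTENTS to PATH, encoding it as UTF-8."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create
                            :external-format :utf-8)
    (write-string contents out)))
```

Passing the external format explicitly means decoding no longer depends on whatever default the implementation derives from the environment (for example, from `LANG`).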