papnkukn / eml-format

RFC 822 EML file format parser and builder
MIT License

Support Unicode Characters with Line Breaks in `unquotePrintable` Function #32

Open joaoaugustogrobe opened 1 year ago

joaoaugustogrobe commented 1 year ago

Currently, the `unquotePrintable` function does not handle cases where the encoded bytes of a single Unicode character are split across two lines, as demonstrated in the following example:

[Screenshot: Unicode character split across lines]

This causes incorrect parsing of the input:

[Screenshot: incorrect parsing]

To address this issue, I propose modifying `unquotePrintable` to support Unicode characters that are split by a line break. The updated function adds a non-capturing group, `(?:(?:=)?\r?\n)?`, which matches an optional equals sign (`=`), followed by an optional carriage return (`\r`) and a newline (`\n`). The trailing `?` makes the whole group optional.
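To make the proposed group concrete, here is a minimal Node.js sketch. It is not the code from eml-format or from the proposed change; the function name `decodeTwoByteEscapes` is hypothetical. It only shows the non-capturing group joining two `=XX` escapes that the encoder separated with a soft line break, so that both bytes are decoded together as one UTF-8 character.

```javascript
// Hypothetical helper for illustration; NOT eml-format's actual unquotePrintable.
// Matches two =XX escapes, allowing the proposed group (?:(?:=)?\r?\n)? between them,
// and decodes the pair as a single two-byte UTF-8 character.
const TWO_BYTE_ESCAPE = /=([0-9A-F]{2})(?:(?:=)?\r?\n)?=([0-9A-F]{2})/gi;

function decodeTwoByteEscapes(value) {
  return value.replace(TWO_BYTE_ESCAPE, (match, first, second) =>
    Buffer.from([parseInt(first, 16), parseInt(second, 16)]).toString("utf8")
  );
}

// "é" is C3 A9 in UTF-8; here the encoder wrapped the line between the two bytes.
console.log(decodeTwoByteEscapes("Caf=C3=\r\n=A9")); // Café
```

Because the `=` inside the group is optional, this pattern also joins two escapes separated by a bare line break. It also covers only two-byte characters; three- and four-byte sequences would need the same group between every pair of escapes.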

With this change, the function can successfully parse the example shown in the image:

[Screenshot: correct parsing]
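For comparison, a different way to avoid the problem is to remove every soft line break (an `=` at the end of a line, as defined for quoted-printable in RFC 2045) before decoding any escapes, and then convert all collected bytes to text in one pass. The sketch below takes that alternative approach rather than the regex from the proposal. The name `decodeQuotedPrintableUtf8` is hypothetical, and it assumes a UTF-8 charset and 7-bit ASCII input.

```javascript
// Hypothetical alternative for comparison; NOT the regex from the proposal.
// Assumes UTF-8 charset and 7-bit ASCII input, as quoted-printable output should be.
function decodeQuotedPrintableUtf8(value) {
  // Step 1: remove soft line breaks ("=" immediately followed by CRLF or LF).
  const joined = value.replace(/=\r?\n/g, "");

  // Step 2: collect raw bytes, turning each =XX escape into one byte.
  const bytes = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === "=" && /^[0-9A-F]{2}$/i.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(joined.charCodeAt(i)); // ASCII character maps to one byte
    }
  }

  // Step 3: decode the whole buffer at once, so multi-byte characters stay
  // intact no matter where the encoder wrapped the line.
  return Buffer.from(bytes).toString("utf8");
}

console.log(decodeQuotedPrintableUtf8("5 =E2=82=\r\n=AC")); // 5 €
```

Since soft line breaks are gone before any bytes are decoded, this version handles characters of any length, while hard line breaks (without a trailing `=`) are kept.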

Please review the proposed changes and let me know if you have any suggestions or concerns. I look forward to your feedback on improving the `unquotePrintable` function.

joaoaugustogrobe commented 1 year ago

FYI @papnkukn