Unicode punctuation characters should be handled as ASCII synonyms

rtoy commented 6 days ago

Imported from SourceForge on 2024-07-05 18:33:31 Created by robert_dodier on 2022-11-01 21:46:43 Original: https://sourceforge.net/p/maxima/bugs/4039

I don't have an example at hand, but it has happened often enough that code gets copy/pasted from a document which uses Unicode characters for plus, minus, quote mark, etc.

There is probably a whole set of punctuation characters which look like the ASCII equivalents. I don't know if we should try to handle all of them -- maybe, maybe not; not sure -- but anyway surely we can handle the most common ones, which seem to be plus and minus.

I'm thinking that such characters should be given the same properties for the parser which the ASCII equivalents already have, so that if Unicode plus or minus is encountered, it yields the same as if the ASCII character were present.

Unicode characters in quoted strings would be preserved, but not in parsed expressions -- I am thinking that once the expression has been emitted by the parser, you can't tell if the input was a Unicode or ASCII character.
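
To make the proposal concrete, here is a rough sketch in Python (not Maxima's actual Lisp parser) of the behavior described above: lookalike characters are replaced by their ASCII equivalents everywhere except inside double-quoted strings. The names `ASCII_SYNONYMS` and `normalize_punctuation` are hypothetical, the set of mapped characters is only illustrative, and the sketch assumes strings are delimited by `"` with backslash as the escape character.

```python
# Illustrative mapping only; which characters to cover is still an open question.
ASCII_SYNONYMS = {
    "\u2212": "-",  # MINUS SIGN
    "\uFF0B": "+",  # FULLWIDTH PLUS SIGN
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}


def normalize_punctuation(source: str) -> str:
    """Replace Unicode lookalikes with ASCII, leaving quoted strings untouched.

    Assumes double-quoted strings with backslash escapes (an assumption
    made for this sketch, not a statement about the real lexer).
    """
    out = []
    in_string = False  # currently inside a "..." literal?
    escaped = False    # was the previous character a backslash inside a string?
    for ch in source:
        if in_string:
            # Inside a string: copy every character verbatim.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        else:
            if ch == '"':
                in_string = True
            # Outside a string: map lookalikes to their ASCII synonyms.
            out.append(ASCII_SYNONYMS.get(ch, ch))
    return "".join(out)
```

With this sketch, an input containing a Unicode minus sign outside a string comes out with an ASCII hyphen, while the same character inside a quoted string is left as it was. A real fix would more likely assign the parser properties of the ASCII characters to the Unicode ones directly rather than preprocess the input text.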

See also bug #4035.

rtoy commented 6 days ago

Imported from SourceForge on 2024-07-05 18:33:32 Created by macrakis on 2023-08-05 18:56:40 Original: https://sourceforge.net/p/maxima/bugs/4039/#fcfc

Example I just ran into today: some Wikipedia tables use Unicode minus-sign (−) rather than hyphen (-) for negative numbers.

rtoy commented 6 days ago

Imported from SourceForge on 2024-07-05 18:33:36 Created by robert_dodier on 2024-06-11 21:22:11 Original: https://sourceforge.net/p/maxima/bugs/4039/#2a16

labels: parser --> parser, unicode

rtoy / maxima

Unicode punctuation characters should be handled as ASCII synonyms #2076