Unicode equivalents of ASCII characters

numbas / Numbas

A completely browser-based e-assessment/e-learning system, with an emphasis on mathematics

http://www.numbas.org.uk

Apache License 2.0


Unicode equivalents of ASCII characters #690

Closed christianp closed 1 year ago

christianp commented 4 years ago

A student typed 2ˆ3, which was not interpreted as valid. The character ˆ is a modifier, but looks very similar to the 'real' character, ^. Numbas should consider ˆ to be a synonym of ^.

The dictionaries opSynonyms and funcSynonyms in runtime/scripts/jme.js map alternative names for operators and functions onto their canonical names. Add entries to these dictionaries mapping Unicode symbols onto their ASCII equivalents.
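A minimal sketch of the synonym-dictionary idea described above. The object `opSynonymsSketch` and the helper `canonicalOp` are hypothetical names made up for illustration; the real `opSynonyms` in runtime/scripts/jme.js may be structured and populated differently.

```javascript
// Hypothetical, simplified stand-in for an operator synonym dictionary.
// The entries are illustrative assumptions, not a copy of the Numbas table.
const opSynonymsSketch = {
    '\u02C6': '^',  // MODIFIER LETTER CIRCUMFLEX ACCENT
    '\u2212': '-',  // MINUS SIGN
    '\u00D7': '*',  // MULTIPLICATION SIGN
    '\u00F7': '/'   // DIVISION SIGN
};

// Return the canonical operator for a name, or the name itself if it has no synonym.
function canonicalOp(name) {
    return Object.prototype.hasOwnProperty.call(opSynonymsSketch, name)
        ? opSynonymsSketch[name]
        : name;
}

console.log(canonicalOp('\u02C6')); // "^"
console.log(canonicalOp('+'));      // "+"
```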

A good place to find Unicode symbols is graphemica.com.

NewmanJ1987 commented 4 years ago

Hi, I'm new to Numbas and would like to get involved with the development. Do you know how I can reproduce this? I would love to work on this.

christianp commented 4 years ago

@NewmanJ1987 thanks for offering! I added this character in commit 36654b3, but there are others that would be good to have if you want to help. For example, there are many characters that look like - which a student might use for 'minus'. Graphemica displays these nicely: https://graphemica.com/search?q=minus. You could add these characters as synonyms for -, following commit 36654b3 as a template.
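As an illustration of the minus lookalikes mentioned above, here is a sketch that folds a handful of dash-like characters to the ASCII hyphen-minus. The list is a small, hand-picked assumption and not the set added in commit 36654b3; `replaceMinusLookalikes` is a hypothetical helper name.

```javascript
// A few characters that a student might type for 'minus'.
// This selection is an assumption for illustration, not an exhaustive list.
const MINUS_LOOKALIKES = /[\u2010\u2012\u2013\u2014\u2212\u2796\uFE63\uFF0D]/g;
// U+2010 HYPHEN, U+2012 FIGURE DASH, U+2013 EN DASH, U+2014 EM DASH,
// U+2212 MINUS SIGN, U+2796 HEAVY MINUS SIGN,
// U+FE63 SMALL HYPHEN-MINUS, U+FF0D FULLWIDTH HYPHEN-MINUS

// Replace every lookalike with the ASCII hyphen-minus.
function replaceMinusLookalikes(expr) {
    return expr.replace(MINUS_LOOKALIKES, '-');
}

console.log(replaceMinusLookalikes('5 \u2212 3')); // "5 - 3"
```

Folding dashes unconditionally is only reasonable inside a mathematical expression; in prose an en dash and a minus sign mean different things.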

NewmanJ1987 commented 4 years ago

Sure, that looks like a good entry task. Thanks.

christianp commented 4 years ago

It looks like String.normalize can do a lot of the work.

abhijeetsharma200 commented 4 years ago

Hi, I am new to open source and would like to contribute to this issue if it is unassigned. However, I am unsure how you would like me to use String.normalize, given that synonyms are currently hard-coded in a dictionary in the variable opSynonyms.

christianp commented 4 years ago

@abhijeetsharma200 String.normalize is a built-in method in JavaScript. One way it could be used would be in Numbas.jme.Parser.tokenise, to normalise expr before tokenising it.
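A rough sketch of normalising an expression before tokenising it. `tokeniseSketch` is a hypothetical toy tokeniser, not `Numbas.jme.Parser.tokenise`, and the choice of the NFKC form is an assumption; the comment above does not say which normalisation form to use.

```javascript
// Toy tokeniser: numbers, runs of ASCII letters, or any single non-space character.
// It only demonstrates where String.prototype.normalize could be applied.
function tokeniseSketch(expr) {
    const normalised = expr.normalize('NFKC'); // fold compatibility characters first
    return normalised.match(/\d+(?:\.\d+)?|[A-Za-z]+|\S/g) || [];
}

// Fullwidth "(1+x)": U+FF08 U+FF11 U+FF0B U+FF58 U+FF09
console.log(tokeniseSketch('\uFF08\uFF11\uFF0B\uFF58\uFF09'));
// [ '(', '1', '+', 'x', ')' ]
```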

grplyler commented 4 years ago

Hi there! I am looking to contribute to open source for Hacktoberfest 2020. Is this issue still up for grabs?

christianp commented 4 years ago

We should also accept the mathematical alphanumeric symbols, which are used by MathJax.
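For reference, the mathematical alphanumeric symbols have compatibility decompositions, so NFKC normalisation maps them back to plain letters and digits. A small sketch, which only shows the behaviour of the built-in method and says nothing about how Numbas handles these characters:

```javascript
// U+1D465 MATHEMATICAL ITALIC SMALL X and U+1D7D0 MATHEMATICAL BOLD DIGIT TWO
const sample = '\u{1D465} + \u{1D7D0}';

// NFKC replaces each styled character with its plain ASCII counterpart.
const folded = sample.normalize('NFKC');

console.log(folded); // "x + 2"
```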

christianp commented 4 years ago

@grplyler sorry I didn't reply earlier - yes, this issue is still open, and there are plenty of easy things you can add in. Please submit a pull request!

christianp commented 2 years ago

There are fullwidth equivalents of lots of punctuation marks, such as U+FF0C "Fullwidth Comma".

christianp commented 1 year ago

We had a student using a fullwidth parenthesis: （

(But that's already supported, so that's not the bug I was looking for!)

christianp commented 1 year ago

We have had a spate of students who wrote Greek letters as Unicode characters and whose expressions were marked incorrect, because Numbas doesn't consider e.g. theta and θ to be equivalent.
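One way to picture the equivalence being asked for is a lookup applied to whole variable-name tokens. The table `greekNames`, the helper `canonicalVariableName`, and the direction of the mapping (Unicode letter to spelled-out name) are all assumptions made for this sketch.

```javascript
// Hypothetical table from Greek letters to spelled-out names (only a few shown).
const greekNames = {
    '\u03B1': 'alpha',
    '\u03B2': 'beta',
    '\u03B8': 'theta',
    '\u03C0': 'pi'
};

// Canonicalise a single variable-name token; other names pass through unchanged.
function canonicalVariableName(name) {
    return Object.prototype.hasOwnProperty.call(greekNames, name)
        ? greekNames[name]
        : name;
}

console.log(canonicalVariableName('\u03B8') === canonicalVariableName('theta')); // true
```

Working on whole tokens rather than doing a raw string replacement avoids gluing a spelled-out name onto neighbouring letters.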

sangwinc commented 1 year ago

@christianp, we already have this issue and I plan to implement something this summer: https://github.com/maths/moodle-qtype_stack/issues/860. Would you be interested in sharing lists of Unicode characters between our projects on this issue, perhaps with a JSON file of "known equivalents"?

christianp commented 1 year ago

@sangwinc - good idea! I'm now looking at typing out a big list of character mappings. The generic Unicode normalisation algorithms don't really help, because they ignore some differences that are mathematically significant, or don't consider equivalent some things that would be convenient for us. I think we have to do it character by character (or character class by character class, at least).
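Two concrete cases of the problem described above, using the built-in NFKC form. The examples are chosen for illustration and are not taken from the thread.

```javascript
// 1. A mathematically significant difference is lost:
//    U+00B2 SUPERSCRIPT TWO becomes a plain "2", so "x squared" turns into "x2".
const squared = 'x\u00B2'.normalize('NFKC');
console.log(squared); // "x2"

// 2. A convenient equivalence is not made:
//    U+2212 MINUS SIGN has no decomposition, so it is left as it is.
const minus = '\u2212'.normalize('NFKC');
console.log(minus === '-');      // false
console.log(minus === '\u2212'); // true
```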

christianp commented 1 year ago

This big list of LaTeX to unicode mappings that I made for mathstodon, based on unicodeit.net, might help: https://github.com/christianp/mastodon/blob/mathstodon-4.1.0/app/javascript/mastodon/features/compose/util/autolatex/data.js

christianp commented 1 year ago

I'm working on this at https://github.com/numbas/unicode-math-normalization. I've produced a set of files giving explicit mappings from some Unicode characters to JME syntax, and identified some things that can be normalized using the standard normalization algorithm.
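A sketch of how explicit mappings and the standard algorithm might be combined: look each character up in an explicit table first, and fall back to NFKC for everything else. The table contents and the name `normaliseForParser` are assumptions for illustration; they are not the data files or API of the unicode-math-normalization repository.

```javascript
// Hypothetical explicit mappings from Unicode characters to parser syntax.
const explicitMap = new Map([
    ['\u00B2', '^2'], // SUPERSCRIPT TWO
    ['\u00B3', '^3'], // SUPERSCRIPT THREE
    ['\u2212', '-'],  // MINUS SIGN
    ['\u00D7', '*']   // MULTIPLICATION SIGN
]);

// Explicit mappings take precedence; anything else goes through NFKC.
function normaliseForParser(expr) {
    let out = '';
    for (const ch of expr) { // for...of iterates by code point, not UTF-16 unit
        out += explicitMap.has(ch) ? explicitMap.get(ch) : ch.normalize('NFKC');
    }
    return out;
}

// Fullwidth x, superscript two, minus sign, fullwidth one
console.log(normaliseForParser('\uFF58\u00B2 \u2212 \uFF11')); // "x^2 - 1"
```

Normalising one code point at a time keeps the sketch short, but it does not recompose combining sequences that span several code points.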

Tomorrow I'll try to integrate this with the JME parser.