mitodl / mitx-grading-library

The MITx Grading Library, a python grading library for edX
https://edge.edx.org/courses/course-v1:MITx+grading-library+examples/
BSD 3-Clause "New" or "Revised" License
14 stars 9 forks source link

Fix unicode issue with schemas #238

Closed jolyonb closed 5 years ago

jolyonb commented 5 years ago

@ChristopherChudzicki pointed out an issue with Any(None, 'upper', 'lower') as a schema in #222 with regards to unicode. I'm not sure what it is, but here's a ticket for it anyway.

jolyonb commented 5 years ago

@ChristopherChudzicki Can you let me know exactly what the relevant issue is here, so I can do it myself?

jolyonb commented 5 years ago

@ChristopherChudzicki Can you let me know what's going on with this?

ChristopherChudzicki commented 5 years ago

First, let me say that this is causing no observable bugs as far as I know.

All strings we use represent text. So Python 2 <--> Python 3 compatibility guides seemed to suggest that we use unicode everywhere. So it was my goal that:

  1. internally, all strings are unicode.
  2. But course authors should to be able to pass traditional Python 2 byte strings to our graders and have them converted to unicode. (Necessary since authors shouldn't need to u-prefix, and can't import unicode_literals).

This was mostly successful. Note below that 'cat' and 'Incorrect.' are both unicode strings. This conversion was achieved by replacingstrin schemas withtext_string`, which coerces to unicode.

from mitxgraders import SquareMatrices, StringGrader

grader = StringGrader(answers='cat', wrong_msg='Incorrect.')
print(grader)
# StringGrader({u'accept_any': False,
#  u'accept_nonempty': False,
#  'answers': ({u'expect': u'cat',
#               u'grade_decimal': 1,
#               u'msg': u'',
#               u'ok': True},),
#  u'attempt_based_credit': False,
#  u'attempt_based_credit_msg': True,
#  u'case_sensitive': True,
#  u'clean_spaces': True,
#  u'debug': False,
#  u'decrease_credit_after': 1,
#  u'decrease_credit_steps': 4,
#  u'explain_minimums': u'err',
#  u'explain_validation': u'err',
#  u'invalid_msg': u'Your input is not in the expected format',
#  u'min_length': 0,
#  u'min_words': 0,
#  u'minimum_credit': 0.2,
#  u'strip': True,
#  u'strip_all': False,
#  u'suppress_warnings': False,
#  u'validation_pattern': None,
#  'wrong_msg': u'Incorrect.'})

Issue 1

The example above shows some cracks, too. Note that the answers and wrong_msg KEYS are in fact byte strings, even though their values both use unicode. Right now, any author-supplied keys are bytestrings.

But bytestring and unicode string have same hash (hash('cat') == hash(u'cat'), so this probably won't cause any observable bugs.

Issue 2

Crack 1 was about keys. Some values are also not converted:

from mitxgraders import SquareMatrices, StringGrader

sampler = SquareMatrices(symmetry='diagonal')
print(sampler.config)
# {u'determinant': None, 'symmetry': 'diagonal', u'dimension': 2, u'shape': (2, 2), u'complex': False, u'traceless': False, u'norm': {u'start': 1, u'stop': 5}}

Note that the symmetry value 'diagonal' is not unicode. Reason is that the schema, Any(None, 'diagonal', 'symmetric', ...) does not convert strings to unicode.

Resolving this

First of all: it may not be worth resolving this.

If we do want to fix this...

jolyonb commented 5 years ago

Ok, I see your point.

So, I think that Any(None, 'diagonal', 'symmetric', ...) actually has unicode for 'diagonal' etc already, because we ARE importing unicode_literals. It's just that voluptuous allows for 'diagonal' to match u'diagonal'.

I think that all we really need to do is to modify ObjectWithSchema to coerce all text-like entries (keys, values, etc) to be unicode as they enter the library, as you suggest for a fix to Issue 1. I actually think this won't be so bad - I'll see if I can put together a PR this evening.