Closed jolyonb closed 5 years ago
@ChristopherChudzicki Can you let me know exactly what the relevant issue is here, so I can do it myself?
@ChristopherChudzicki Can you let me know what's going on with this?
First, let me say that this is causing no observable bugs as far as I know.
All strings we use represent text. So Python 2 <--> Python 3 compatibility guides seemed to suggest that we use unicode everywhere. So it was my goal that:
u
-prefix, and can't import unicode_literals
). This was mostly successful. Note below that 'cat'
and 'Incorrect.' are both unicode strings. This conversion was achieved by replacing
strin schemas with
text_string`, which coerces to unicode.
from mitxgraders import SquareMatrices, StringGrader
grader = StringGrader(answers='cat', wrong_msg='Incorrect.')
print(grader)
# StringGrader({u'accept_any': False,
# u'accept_nonempty': False,
# 'answers': ({u'expect': u'cat',
# u'grade_decimal': 1,
# u'msg': u'',
# u'ok': True},),
# u'attempt_based_credit': False,
# u'attempt_based_credit_msg': True,
# u'case_sensitive': True,
# u'clean_spaces': True,
# u'debug': False,
# u'decrease_credit_after': 1,
# u'decrease_credit_steps': 4,
# u'explain_minimums': u'err',
# u'explain_validation': u'err',
# u'invalid_msg': u'Your input is not in the expected format',
# u'min_length': 0,
# u'min_words': 0,
# u'minimum_credit': 0.2,
# u'strip': True,
# u'strip_all': False,
# u'suppress_warnings': False,
# u'validation_pattern': None,
# 'wrong_msg': u'Incorrect.'})
The example above shows some cracks, too. Note that the answers
and wrong_msg
KEYS are in fact byte strings, even though their values both use unicode. Right now, any author-supplied keys are bytestrings.
But bytestring and unicode string have same hash (hash('cat') == hash(u'cat')
, so this probably won't cause any observable bugs.
Crack 1 was about keys. Some values are also not converted:
from mitxgraders import SquareMatrices, StringGrader
sampler = SquareMatrices(symmetry='diagonal')
print(sampler.config)
# {u'determinant': None, 'symmetry': 'diagonal', u'dimension': 2, u'shape': (2, 2), u'complex': False, u'traceless': False, u'norm': {u'start': 1, u'stop': 5}}
Note that the symmetry value 'diagonal'
is not unicode. Reason is that the schema, Any(None, 'diagonal', 'symmetric', ...)
does not convert strings to unicode.
First of all: it may not be worth resolving this.
'cat'==u'cat'
is True
, I don't think this is going to cause observable bugs. The only observable bug I've seen related to this was in the matrixsampling
PR where an is
comparison was used (we changed it to ==
). Since 'cat' is u'cat'
returns False
, comparisons using is
could be problematic. If we do want to fix this...
Any(None, 'diagonal', 'symmetric', ...)
to something like Any(None, text('diagonal'), text('symmetric'), ...)
where text(value)
checks for equality with value
and coerces to unicode.Ok, I see your point.
So, I think that Any(None, 'diagonal', 'symmetric', ...)
actually has unicode for 'diagonal'
etc already, because we ARE importing unicode_literals
. It's just that voluptuous allows for 'diagonal'
to match u'diagonal'
.
I think that all we really need to do is to modify ObjectWithSchema
to coerce all text-like entries (keys, values, etc) to be unicode as they enter the library, as you suggest for a fix to Issue 1. I actually think this won't be so bad - I'll see if I can put together a PR this evening.
@ChristopherChudzicki pointed out an issue with
Any(None, 'upper', 'lower')
as a schema in #222 with regards to unicode. I'm not sure what it is, but here's a ticket for it anyway.