Enforce Unicode for all strings and results

pittcsc / PittAPI

An API to easily get data from the University of Pittsburgh

https://pittapi.pittcsc.org

GNU General Public License v2.0


Enforce Unicode for all strings and results #17

Closed RitwikGupta closed 7 years ago

RitwikGupta commented 7 years ago

Right now the string handling is very hacky, so you'll see Unicode characters being expanded to \u0blah in the responses. Please ensure that proper Unicode encoding is enforced throughout so we don't have this issue. UTF-8 is what we should stick with.
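For illustration, one common way escapes like these end up in output is JSON serialization with default settings. This is a minimal Python 3 sketch, assuming the responses pass through the standard `json` module (the thread does not say how they are produced). The sample location name is made up.

```python
import json

# Hypothetical sample data, written with escapes so the code points are unambiguous:
# U+00E9 is "é" and U+2013 is an en dash.
location = {"name": "Caf\u00e9 \u2013 Market"}

# Default behaviour: every non-ASCII character is escaped as \uXXXX.
escaped = json.dumps(location)

# ensure_ascii=False keeps the real characters in the output string.
readable = json.dumps(location, ensure_ascii=False)

# Encode as UTF-8 only at the boundary, when the text is written out as bytes.
utf8_bytes = readable.encode("utf-8")

print(escaped)
print(readable)
```

Both strings decode back to the same data; only the second one shows the accented character and the en dash as themselves.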

Rahi374 commented 7 years ago

I only see you using \u once, replacing \u0026 with &. I did a replacement once too, of \u2013 to -. I personally think it's easier to work with & and - as ASCII characters.

I convert between é and e when I'm translating dining location names to dict keys and vice versa, but I feel like that's more convenient since you don't really want to have to type special Unicode characters into a dict key.
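As a sketch of the é-to-e conversion described here (not the code actually used in the repository), accents can be folded generically with the standard `unicodedata` module instead of replacing each character by hand. The helper name `to_ascii_key` is hypothetical.

```python
import unicodedata


def to_ascii_key(name: str) -> str:
    """Hypothetical helper: fold accented letters to their base letters."""
    # NFKD splits an accented letter into the base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_ascii_key("Café"))  # Cafe
```

This only removes combining marks. Characters such as an en dash have no decomposition and pass through unchanged.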

RitwikGupta commented 7 years ago

Right, I'm using Unicode characters once in a while myself, but what I mean is for you to ensure that all strings are being handled as UTF-8 in the code. Right now they're mostly being handled as ASCII and that's wrong.

RitwikGupta commented 7 years ago

Encode every string as UTF-8 from start to end

Rahi374 commented 7 years ago

I see. So then the string replacements (like \u2013 to -) can still stay but we treat all strings as Unicode?

RitwikGupta commented 7 years ago

Once you change all strings to proper Unicode, the \u2013 will actually go away and we'd be able to use the proper en dash character to replace instead.
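A small sketch of what that looks like, assuming Python 3, where every `str` is already Unicode. The sample hours string is made up.

```python
# The escape sequence and the literal en dash denote the same character.
raw = "11 AM \u2013 2 PM"
same = raw == "11 AM – 2 PM"

# The replacement can name the real en dash directly.
hours = raw.replace("–", "-")

# UTF-8 only matters at the boundary, when text becomes bytes and back.
encoded = raw.encode("utf-8")
decoded = encoded.decode("utf-8")

print(hours)  # 11 AM - 2 PM
```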

RitwikGupta commented 7 years ago

You wouldn't have to do anything with the é either, then. It'll properly be represented with an accent automatically.

Rahi374 commented 7 years ago

I see. But wouldn't it be difficult to type it in when one is trying to get_dining_location_by_name()?

RitwikGupta commented 7 years ago

Yes, but that function may not be the best idea in the first place, especially because it's easier to get all the data anyways.

Rahi374 commented 7 years ago

Oh okay. So it's fine to have Unicode characters in dict keys?

RitwikGupta commented 7 years ago

Since the main purpose of any of these return values is display anyways, yep!
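Unicode dict keys work like any other string keys in Python 3. One thing to watch is that the same visible character can be stored as different code point sequences, so normalizing on both sides of a lookup helps. A minimal sketch; the data and the `lookup` helper are hypothetical.

```python
import unicodedata

# Hypothetical data; "Caf\u00e9" is "Café" with é as a single code point.
raw_locations = {"Caf\u00e9": "open", "Market": "closed"}

# Normalize keys to NFC once, when the dict is built.
locations = {unicodedata.normalize("NFC", k): v for k, v in raw_locations.items()}


def lookup(name: str):
    """Hypothetical lookup: normalize the input to NFC before matching."""
    return locations.get(unicodedata.normalize("NFC", name))


composed = "Caf\u00e9"     # é as one code point
decomposed = "Cafe\u0301"  # e followed by a combining acute accent

print(lookup(composed), lookup(decomposed))  # open open
```

The two inputs compare unequal as raw strings, yet both find the entry after normalization. A plain "Cafe" without the accent does not match.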

Rahi374 commented 7 years ago

I'll wait until the PeopleAPI asynchronization is finished before doing this.

RitwikGupta commented 7 years ago

Just start working on it in a branch and we can deal with merge conflicts!

Rahi374 commented 7 years ago

Oh okay got it.

RitwikGupta commented 7 years ago

Dropped Py2.7 support, everything is Unicode now.