openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Nepali characters in SQLite DB do not render in JSON, CSV or display in the app #883

Open equivalentideas opened 9 years ago

equivalentideas commented 9 years ago

See the discussion at https://help.morph.io/t/nepali-text-not-being-saved/82/2 and the scraper at https://morph.io/tmtmtmtm/nepal-ca-members.

When the SQLite DB is downloaded, the characters are there, but they are not rendered in the table display on the scraper page or in the CSV and JSON formats.

Data table on morph.io

[Screenshot, 2015-08-03 at 5:19:27 pm]

SQLite DB downloaded

[Screenshot, 2015-08-03 at 5:20:41 pm]

henare commented 9 years ago

Fixed in https://github.com/tmtmtmtm/nepal-ca-members/pull/1

mlandauer commented 9 years ago

An underlying reason for the confusion is the difference in the way that encoding is handled in the console output versus the data in the database. It might be worth revisiting that.

I think the console just forces the encoding to UTF-8 and hopes for the best.

Reading data from the database is, I think, handled slightly differently: it tries to convert the data to UTF-8 and then removes anything it can't convert, which leads to different behaviour.

I might be slightly wrong on the details (I haven't checked the code), but the main point is that there are differences, and that can be confusing for the user.
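To make the difference described above concrete, here is a minimal Python sketch of the two strategies. It is not morph's actual code (which has not been checked here either); the function names `force_utf8` and `convert_and_strip` are hypothetical, and the assumed source encoding of `ascii` is purely an illustration of how a wrong guess about the stored encoding could make non-ASCII text vanish.

```python
# Bytes as they might sit in the SQLite file: valid UTF-8 containing Nepali text.
raw = "Name: नेपाल".encode("utf-8")


def force_utf8(data: bytes) -> str:
    """Console-style handling: treat the bytes as UTF-8 and hope for the best.

    Bytes that are not valid UTF-8 become U+FFFD instead of raising an error.
    """
    return data.decode("utf-8", errors="replace")


def convert_and_strip(data: bytes, assumed_encoding: str = "ascii") -> str:
    """Database-style handling (assumed): convert from a guessed encoding
    and silently drop every byte that cannot be converted.
    """
    return data.decode(assumed_encoding, errors="ignore")


print(repr(force_utf8(raw)))         # Nepali characters survive
print(repr(convert_and_strip(raw)))  # Nepali characters are silently removed
```

With the same stored bytes, the first path shows the Nepali text while the second returns only the ASCII prefix, which matches the symptom reported in this issue: the data is intact in the downloaded database but missing from the table, CSV and JSON views.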