nyphilarchive / PerformanceHistory

New York Philharmonic Performance History Metadata
Creative Commons Zero v1.0 Universal

Bad Encoding #3

Closed: archy-bold closed this issue 9 years ago.

archy-bold commented 9 years ago

I've noticed that the encoding seems to have garbled the accented characters.

For example, Frédéric Chopin appears in the data as Chopin, FrÃ©dÃ©ric.
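The thread doesn't say exactly how the names were garbled. One common pattern, assumed here only for illustration, is UTF-8 bytes being decoded as Latin-1, which turns "é" into "Ã©" and "à" into "Ã" followed by a non-breaking space. The sketch below reproduces that failure and undoes one round of it. The function names `garble` and `repair` are hypothetical and not part of this repository.

```python
def garble(text: str) -> str:
    """Simulate the assumed bug: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 garbling.

    If the text isn't representable in Latin-1, or the resulting bytes aren't
    valid UTF-8 (i.e. it was probably never garbled), return it unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


name = "Chopin, Frédéric"
bad = garble(name)
print(bad)          # Chopin, FrÃ©dÃ©ric
print(repair(bad))  # Chopin, Frédéric
```

Because `repair` falls back to the input when decoding fails, running it over already-correct text such as "Chopin, Frédéric" leaves that text alone. It only reverses this specific misreading, though. Text garbled via Windows-1252 or garbled twice would need a different treatment.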

Great repository, otherwise!

mjbrodsky commented 9 years ago

Thanks! All diacritics are coming through fine for me... Can you be more specific about where you're seeing this (file and line number) and how you're viewing the data?

hamlet82 commented 9 years ago

I know my facility with Python encoding is abysmal, so the problem (once located) is almost certainly on my end.

(Replying by email to Mitch's comment of Apr 30, 2015, 3:23 PM.)

mjbrodsky commented 9 years ago

I just found an example: line 16610 of complete.xml. I'll check what's in our Solr index, since that's where this data originates. It's possible that I need to re-index some records.

archy-bold commented 9 years ago

I should have been more specific there; I didn't realise some of them were encoded OK.

There are other examples in the data. Searching for the character Ã is always a surefire way of locating them.
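The search described above can be automated. A minimal sketch, assuming the garbling is UTF-8 read as Latin-1: accented Latin letters then show up as "Ã" or "Â" (the misread lead byte) followed by a character in U+0080 to U+00BF (the misread continuation byte), a pair that rarely occurs in correctly encoded text. The helper names `find_suspect_lines` and `scan_file` are hypothetical, and the pattern is a heuristic that can miss Windows-1252 variants.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic for UTF-8-read-as-Latin-1 mojibake: "Â" or "Ã" followed by a
# character in U+0080..U+00BF. Correct text like "João" or "SÃO" won't match.
MOJIBAKE = re.compile("[\u00c2\u00c3][\u0080-\u00bf]")


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line without trailing newline) for suspect lines."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MOJIBAKE.search(line)
    ]


def scan_file(path: str) -> None:
    """Print path:line: text for every suspect line in a UTF-8 text file."""
    # Open explicitly as UTF-8 so the platform's default encoding can't
    # introduce (or hide) garbling of its own.
    with open(path, encoding="utf-8") as fh:
        for number, line in find_suspect_lines(fh):
            print(f"{path}:{number}: {line.strip()}")
```

With a local copy of the data, `scan_file("complete.xml")` would list each flagged line with its line number, in the same file-and-line form requested earlier in the thread.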

mjbrodsky commented 9 years ago

The issue was with the migration from our database of record into the Alfresco repo. I fixed the migration config and re-migrated the metadata, so this gremlin is now removed.