kieranjol / IFIscripts

Detailed documentation is available here: http://ifiscripts.readthedocs.io/en/latest/index.html
MIT License

as11fixity.py - Special characters (ú, é, í etc.) and lxml parsing #90

Open ghost opened 8 years ago

ghost commented 8 years ago

It seems to be throwing up errors with Irish-language titles, but there also seem to be a lot of queries on the topic out there, so I'm sure there's a solution. Just to be aware.

feirm

ghost commented 8 years ago

@AnjaMahler @kieranjol

kieranjol commented 8 years ago

You'll probably have to do something to the filename string, such as declaring that it's Unicode. There's something similar in my dcpsubs2srt script, where those characters broke it. You can see what I did in there.

On 14 Sep 2016 5:33 p.m., "ecodonohoe" notifications@github.com wrote:

@AnjaMahler https://github.com/AnjaMahler @kieranjol https://github.com/kieranjol


kieranjol commented 8 years ago

Actually, I never pushed this back to that separate script, but I have it in dcpaccess.py. https://github.com/kieranjol/IFIscripts/blob/master/dcpaccess.py#L226 This line has the fix. Looking at your terminal output, it looks like it's the append_csv function that's causing the issue, but the character issue might just pop up somewhere else once you fix that. Keep me posted on how you get on. Maybe it's best to just alter whatever variable is causing the issue, assuming it isn't a destructive process.

Here's a relevant Stack Overflow question: http://stackoverflow.com/questions/19833440/unicodeencodeerror-ascii-codec-cant-encode-character-u-xe9-in-position-7
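
For illustration only, here is a rough sketch of one way to keep accented titles intact when appending to a CSV. It assumes Python 3, where strings are already Unicode and the file just needs an explicit UTF-8 encoding. The `append_csv` and `read_csv` functions below are hypothetical stand-ins and are not the functions in as11fixity.py or dcpaccess.py. Under Python 2, which is where the `'ascii' codec can't encode character` error in the linked question typically comes from, the usual equivalent is to encode the unicode string to UTF-8 bytes before handing it to the csv writer.

```python
import csv


def append_csv(csv_path, row):
    """Append one row of text fields to a CSV file as UTF-8.

    Hypothetical helper for illustration; not the project's own function.
    """
    # newline='' lets the csv module manage line endings itself.
    with open(csv_path, 'a', encoding='utf-8', newline='') as handle:
        csv.writer(handle).writerow(row)


def read_csv(csv_path):
    """Read every row back, decoding the file as UTF-8."""
    with open(csv_path, 'r', encoding='utf-8', newline='') as handle:
        return list(csv.reader(handle))
```

Because the title is written unchanged, this is not a destructive process: the fadas in the file name survive the round trip.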

kieranjol commented 8 years ago

This is an interesting problem though, and it might inform the instructions we give to depositors about file names. If we created that file name ourselves, we'd never put a fada in it and we probably wouldn't use a capital letter. You guys know more about the specs of AS11: does the file naming follow a particular structure? Is there room to ask for all-lowercase file names with no characters other than letters, numbers or underscores?
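
If a rule like that were adopted, a check along these lines could flag non-conforming names at deposit time. This is only a sketch under stated assumptions: the rule (lowercase letters, numbers and underscores, plus an optional extension) is the one floated in this comment, not anything taken from the AS11 specification, and both function names are hypothetical.

```python
import re
import unicodedata

# Assumed rule: lowercase letters, digits, underscores, optional extension.
SAFE_NAME = re.compile(r'[a-z0-9_]+(\.[a-z0-9]+)?')


def is_safe_filename(name):
    """Return True if the whole name follows the assumed naming rule."""
    return SAFE_NAME.fullmatch(name) is not None


def suggest_safe_filename(name):
    """Suggest a conforming name: strip accents, lowercase, use underscores."""
    # NFKD splits an accented letter into base letter + combining mark,
    # and the ASCII round trip then drops the combining mark.
    decomposed = unicodedata.normalize('NFKD', name)
    ascii_only = decomposed.encode('ascii', 'ignore').decode('ascii').lower()
    stem, dot, extension = ascii_only.rpartition('.')
    if not dot:
        # No extension present: rpartition puts the whole name in the last slot.
        stem, extension = extension, ''
    stem = re.sub(r'[^a-z0-9_]+', '_', stem).strip('_')
    extension = re.sub(r'[^a-z0-9]+', '', extension)
    return stem + '.' + extension if extension else stem
```

Note that dropping the fada changes the spelling of the Irish title, so a suggestion like this would be for the file name only; the original title would still need to be recorded somewhere else.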

kieranjol commented 8 years ago

^^ I think Raelene posted the previous comment cos I was logged in on ES1^^