perkl / sv-sv

Swedish-swedish dictionary for Kindle

Unicode Issue with the transform file. #3

Open weironiottan opened 2 years ago

weironiottan commented 2 years ago

Hey I got this error when I tried to run your script:

    python transform.py > svsv.html
    Traceback (most recent call last):
      File "transform.py", line 47, in <module>
        if(definition != None):sys.stdout.write(definition.text)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
    make: *** [svsv.html] Error 1

Then I made the following change, which seems to have fixed the issue:

```python
import xml.etree.ElementTree as ET
import sys

tree = ET.parse('lexin_utf8.xml')
root = tree.getroot()

sys.stdout.write("""<?xml version="1.0" encoding="utf-8"?>

""")
for lemma in root.iter('lemma-entry'):
    form = lemma.find('form')
    pronunciation = lemma.find('pronunciation')
    inflection = lemma.find('inflection')
    pos = lemma.find('pos')
    sys.stdout.write("")
    sys.stdout.write("")
    # Encode explicitly so Python 2 does not fall back to the ASCII codec.
    sys.stdout.write((""+form.text.replace('~','')+" ").encode('utf-8'))
    if(inflection != None and inflection.text != None and len(inflection.text)!=0):
        sys.stdout.write("")
        for s in inflection.text.split(' '):
            sys.stdout.write("")
        sys.stdout.write("")
    lexemes = lemma.findall('lexeme')
    makelist = len(lexemes)>1
    if(makelist):
        sys.stdout.write("")
    for lexeme in lexemes:
        lexnr = lexeme.find('lexnr')
        definition = lexeme.find('definition')
        usage = lexeme.find('usage')
        comment = lexeme.find('comment')
        valency = lexeme.find('valency')
        grammat_comm = lexeme.find('grammat_comm')
        definition_comm = lexeme.find('definition_comm')
        examples = lexeme.findall('example')
        idioms = lexeme.findall('idiom')
        compounds = lexeme.findall('compound')
        if(makelist):
            sys.stdout.write("")
        # Same explicit encoding for the definition text (the line from the traceback).
        if(definition != None):sys.stdout.write((definition.text).encode('utf-8'))
        if(makelist):
            sys.stdout.write("")
    if(makelist):
        sys.stdout.write("")
    else:
        sys.stdout.write("")
    sys.stdout.write("")
    sys.stdout.write("")
sys.stdout.write(""" """)
sys.stdout.write("\n")
```

I tried to push a PR, but your repo does not allow that access. Let me know if this was helpful!

perkl commented 2 years ago

Hi! I think the problem is a consequence of running the script with Python 2, where writing Unicode text to a redirected stdout falls back to the ASCII codec. The script was tested on Python 3. If you have Python 3 available, could you please try running it with that? Any suggestions on how to make it work on both Python 2 and 3 are welcome.
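
One possible direction, as a minimal sketch only: wrap stdout once in a UTF-8 writer and send all text through it, rather than calling `.encode('utf-8')` at each write (which would break on Python 3, where `sys.stdout.write` expects text). The helper name `utf8_writer` is hypothetical and not part of `transform.py`; the sketch assumes the script only ever needs UTF-8 output and uses the standard library's `codecs.getwriter`.

```python
import codecs
import sys


def utf8_writer(stream):
    """Return a writer that encodes text as UTF-8 before writing it.

    Hypothetical helper, not part of transform.py.
    On Python 3, sys.stdout exposes its byte stream as `.buffer`;
    on Python 2, sys.stdout is already a byte stream, so it is used directly.
    """
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter("utf-8")(raw)


# Intended use in the script (an assumption about how it would be wired in):
#   out = utf8_writer(sys.stdout)
#   out.write(definition.text)   # pass text, not pre-encoded bytes
```

With this approach the script would call `out.write(...)` everywhere it currently calls `sys.stdout.write(...)`, and the text from ElementTree would be passed through unencoded.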

I don't know why you weren't able to create a pull request. It has worked before. If I need to enable some setting in the repo to make it work, I will be happy to do it.

Thanks for your interest in this project!