Closed jg2944 closed 8 years ago
Thanks a lot, Alberto!
By the way, any idea where I can find Spanish or Spanish/French dictionaries that can be used as input files for Penelope?
Thanks again,
Joel
On 23 Feb 2016 at 20:12, "Alberto Pettarin" notifications@github.com wrote:
These are two different issues.
1. "Unable to find vcvarsall.bat" means that pip cannot find the Microsoft C compiler for Python, which it needs to compile one of the dependencies (lxml or marisa-trie). If you do not plan to read or write XML or Kobo format, you can ignore the error. Otherwise, you need to download this: https://www.microsoft.com/en-us/download/details.aspx?id=44266 and run pip from the special command prompt it provides.
2. It looks like your shell and/or input file is not UTF-8. If the former, you can try setting PYTHONIOENCODING before executing Penelope:
> set PYTHONIOENCODING=UTF-8
> python -m penelope -i stardict-spanish-english-2.4.2.zip -j stardict -f es -t en -p bookeen -o output
If the latter, you need to provide the encoding of the input file with the --input-file-encoding flag. For example:
> python -m penelope -i stardict-spanish-english-2.4.2.zip -j stardict -f es -t en -p bookeen -o output --input-file-encoding latin1
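Both cases above come down to which codec turns bytes into text. As a minimal sketch (Python 3, using a made-up sample word rather than anything from the dictionary in question), the same word produces different bytes in UTF-8 and Latin-1, and the ASCII codec rejects the 0xc3 byte that UTF-8 uses for many accented letters:

```python
# Sample word chosen for illustration only; it is not taken from the dictionary.
word = "se\u00f1or"  # "señor"

utf8_bytes = word.encode("utf-8")     # the n-tilde becomes the two bytes 0xc3 0xb1
latin1_bytes = word.encode("latin1")  # the n-tilde becomes the single byte 0xf1


def decode_or_error(data, encoding):
    """Decode data with the given codec; return the error reason on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: %s" % exc.reason


# UTF-8 bytes cannot be read as ASCII, but read fine as UTF-8.
print(decode_or_error(utf8_bytes, "ascii"))
print(decode_or_error(utf8_bytes, "utf-8"))

# Latin-1 bytes read fine as Latin-1, but fail when assumed to be UTF-8.
print(decode_or_error(latin1_bytes, "latin1"))
print(decode_or_error(latin1_bytes, "utf-8"))
```

The last call shows the situation --input-file-encoding is meant for: a Latin-1 file read as UTF-8 fails, so the reader has to be told the real encoding.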
Searching for "stardict spanish dictionary" in Google returns several hits. I have no idea about their quality or copyright status.
Hey mate,
first of all I'd like to thank you for this great software. I've got the same problem as jg2944:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
...
  File "/usr/lib/python2.7/gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.7/gzip.py", line 136, in __init__
    self._write_gzip_header()
  File "/usr/lib/python2.7/gzip.py", line 181, in _write_gzip_header
    self.fileobj.write(fname + '\000')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
I've tried your suggestions but I still get this output. No problem when generating a .csv file.
Edit: the command I'm giving is:
$ python -m penelope -i /mypath/Babylon_English_French/Babylon_English_French.zip -j stardict -f fr -t en -p kobo -o frenchkobo
Did you try:
$ export PYTHONIOENCODING=UTF-8
$ python -m penelope etc. etc.
? (Note: set is for Windows, export is for Linux/OS X. Setting PYTHONIOENCODING will override your shell encoding, just for Python.)
From what I see in the call trace, the error seems to occur when the gzip module attempts to write a file to disk, probably one with non-ASCII characters in its name. That may happen if you are running in a console with a non-UTF-8 encoding.
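To make that failure mode concrete: the traceback shows Python 2 raising the error on fname + '\000', which suggests a Unicode file name was implicitly encoded with the ASCII codec. The sketch below is an assumption-laden illustration in Python 3, where the encoding step has to be written explicitly; the file name is hypothetical and merely places a non-ASCII character at position 1, as in the reported error.

```python
# Hypothetical file name with a non-ASCII character (e-acute) at index 1.
fname = "d\u00e9mo.idx"


def encode_name(name, encoding):
    """Encode a file name plus a trailing NUL byte, as a gzip header would need.

    Returns the resulting bytes, or the UnicodeEncodeError if encoding fails.
    """
    try:
        return name.encode(encoding) + b"\000"
    except UnicodeEncodeError as exc:
        return exc


ascii_result = encode_name(fname, "ascii")  # fails: e-acute is outside ASCII
utf8_result = encode_name(fname, "utf-8")   # succeeds

print(type(ascii_result).__name__, "at position", ascii_result.start)
print(utf8_result)
```

If this is indeed what happens, the error depends on the characters in the file name rather than on the console streams, so PYTHONIOENCODING alone may not make it go away.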
BTW, to find out what encoding Python is currently using, you can run:
$ python
>>> import sys
>>> sys.stdin.encoding
'UTF-8' (or something else)
>>> sys.stdout.encoding
'UTF-8' (or something else)
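The same check can be run as a small script, which is handy when comparing machines. This is only a sketch: besides the two stream encodings shown above, it also reports the filesystem and locale encodings, on the assumption that they may matter when the failing step writes a file rather than printing to the console.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python is currently using, keyed by name."""
    return {
        # getattr guards against streams that were replaced or are missing.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


for name, encoding in sorted(report_encodings().items()):
    print("%-10s %s" % (name, encoding))
```

Running it once with and once without PYTHONIOENCODING set shows which of these values the variable actually changes on a given system.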
Thank you for your super fast reply :)
You're right, I'm on Linux. I did the export, but nothing changed. Yes, it's just the last step that doesn't work, mate :/
The Python output is:
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.stdout.encoding
'UTF-8'
I've also tried the options --input-file-encoding latin and --input-file-encoding ascii, without success. Should I edit the script perhaps?
I cannot say what is going wrong unless I can reproduce the issue.
Can you mail me the input dictionary (or a link to Dropbox/Drive/Box to it)?
You're right, here it goes: DELETED LINK
@dan3000 thank you. I deleted your link, as I am not sure the dictionary is 100% copyright free.
Nevertheless, on my laptop I do not get your error:
$ python -m penelope -i bef.zip -j stardict -f en -t fr -p kobo -o dicthtml-en-fr.zip
[INFO] Reading input file(s)...
[INFO] Reading input file(s)... done
[INFO] Writing output file(s)...
[INFO] Writing output file(s)... done
[INFO] The following file(s) have been created:
[INFO] dicthtml-en-fr.zip
Please send me an email, I will send you the file for you to test on your Kobo.
Closing this issue, to avoid polluting it. Feel free to open another issue.
Hello, I got the "Unable to find vcvarsall.bat" error message during "pip install penelope" under Windows 7;
nevertheless, the "pip list" command shows penelope (3.1.2.0) in the list.
Is it serious, doctor? ;-) Or can this error message produced during the Penelope installation be ignored?
Thanks in advance.
I have done a quick test converting an ES-to-EN StarDict dictionary into Bookeen format and got some errors (see below). I just wanted to know whether the previous installation error message could be related to them.
C:\dictio>python -m penelope -i stardict-spanish-english-2.4.2.zip -j stardict -f es -t en -p bookeen -o output
[INFO] Reading input file(s)...
[INFO] Reading input file(s)... done
[INFO] Writing output file(s)...
Traceback (most recent call last):
  File "C:\PYTHON27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\PYTHON27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\PYTHON27\lib\site-packages\penelope\__main__.py", line 146, in <module>
    main()
  File "C:\PYTHON27\lib\site-packages\penelope\__main__.py", line 133, in main
    output_paths = write_dictionary(dictionary, arguments)
  File "C:\PYTHON27\lib\site-packages\penelope\dictionary.py", line 103, in write_dictionary
    return penelope.format_bookeen.write(dictionary, args, args.output_file)
  File "C:\PYTHON27\lib\site-packages\penelope\format_bookeen.py", line 227, in write
    sql_cursor.execute("insert into T_DictIndex values (?,?,?,?,?)", sql_tuple)
  File "C:\PYTHON27\lib\site-packages\penelope\collation_default.py", line 28, in collate_function
    b2 = string2.encode("utf-8").lower()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)
thanks in advance