crashracer closed this issue 4 years ago
Hi @crashracer,
Could you provide the full traceback by adding --debug to your job download command?
The part responsible for downloading the files is: https://github.com/unofficial-memsource/memsource-cli-client/blob/d0c1c4741c92743c296b8238df2714266ba28000/memsource_cli/api_client.py#L531-L540
My suspicion is the encoding. Maybe try adding encoding='utf-8' to open():
if content_disposition:
    filename = re.search(r'filename\*=[\w\-]+[\']+([^\'"\s]+);',
                         content_disposition).group(1)
    path = os.path.join(os.path.dirname(path), filename)
try:
    # open() rejects encoding= in binary mode (ValueError),
    # so the bare except below always takes over.
    with open(path, "wb", encoding="utf-8") as f:
        f.write(response.data)
except:
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.data)
This is likely not the final solution, since we should avoid hardcoding the encoding in the code, but it would at least let us verify whether the encoding is the cause.
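A side note on the suggestion above: Python's open() raises ValueError when encoding= is combined with a binary mode such as "wb", so only a text-mode open can actually carry the encoding. Below is a minimal sketch of a more explicit variant. write_payload and binary_mode_rejects_encoding are hypothetical helpers, not part of memsource-cli, and the sketch assumes response.data may arrive either as bytes or as str.

```python
def write_payload(path, data, encoding="utf-8"):
    """Write bytes as-is, or str through an explicit text encoding."""
    if isinstance(data, (bytes, bytearray)):
        # Binary mode: no encoding argument is allowed here.
        with open(path, "wb") as f:
            f.write(data)
    else:
        # Text mode: name the encoding instead of relying on the locale
        # default (cp1252 in the traceback). newline="" keeps line endings as-is.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(data)


def binary_mode_rejects_encoding(path):
    """Return True if open() refuses encoding= together with mode 'wb'."""
    try:
        with open(path, "wb", encoding="utf-8"):
            pass
    except ValueError:
        return True
    return False
```

Branching on the type of the data avoids the bare except, which would otherwise also hide unrelated errors such as a missing directory.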
traceback
'charmap' codec can't encode characters in position 775-778: character maps to <undefined>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Python3.6sci\Lib\site-packages\cliff\app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "C:\Python3.6sci\Lib\site-packages\cliff\display.py", line 116, in run
    column_names, data = self.take_action(parsed_args)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\job\v1\job.py", line 272, in take_action
    format=parsed_args.bilingual_format)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api\job_api.py", line 1159, in get_bilingual_file
    (data) = self.get_bilingual_file_with_http_info(project_uid, **kwargs) # noqa: E501
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api\job_api.py", line 1245, in get_bilingual_file_with_http_info
    collection_formats=collection_formats)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api_client.py", line 330, in call_api
    _preload_content, _request_timeout)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api_client.py", line 169, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api_client.py", line 233, in deserialize
    return self.__deserialize_file(response)
  File "C:\Python3.6sci\Lib\site-packages\memsource_cli\api_client.py", line 540, in __deserialize_file
    f.write(response.data)
  File "C:\Python3.6sci\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 775-778: character maps to <undefined>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Python3.6sci\Scripts\memsource-script.py", line 11, in <module>
Hi @crashracer
Thank you for the traceback!
Here is the relevant error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 775-778: character maps to <undefined>
I wonder which character breaks it. If it is `, I have tried reproducing it but couldn't.
It might also be specific to Windows, given that the encoding goes through C:\Python3.6sci\Lib\encodings\cp1252.py.
Could you change lines 535-540 of memsource-cli-client/memsource_cli/api_client.py from
try:
    with open(path, "wb") as f:
        f.write(response.data)
except:
    with open(path, "w") as f:
        f.write(response.data)
to
try:
    with open(path, "wb", encoding="utf-8") as f:
        f.write(response.data)
except:
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.data)
Here is my attempt at a reproducer:
$ cat testfile.md
Foo`Bar
$ memsource project list
+------------------------+-------------+----------+---------------------+--------------------------+--------+------------+---------------------------+-------------+--------------+------------+-----------+
| uid | internal_id | id | name | date_created | domain | sub_domain | owner | source_lang | target_langs | references | user_role |
+------------------------+-------------+----------+---------------------+--------------------------+--------+------------+---------------------------+-------------+--------------+------------+-----------+
| xjwi5RW1EEgKja5HGsZdD0 | 255 | 15274680 | Memsource Project 1 | 2019-11-06 | None | None | {"first_name": "Robin", | en | ['ja'] | [] | ADMIN |
| | | | | 11:17:03+00:00 | | | "last_name": "Cernin", | | | | |
| | | | | | | | "user_name": | | | | |
| | | | | | | | "robincernin", "email": " | | | | |
| | | | | | | | r9n.developer@gmail.com", | | | | |
| | | | | | | | "role": "ADMIN", "id": | | | | |
| | | | | | | | "380294", "uid": | | | | |
| | | | | | | | "i0joEXVYjvh6821clw6Qm5"} | | | | |
+------------------------+-------------+----------+---------------------+--------------------------+--------+------------+---------------------------+-------------+--------------+------------+-----------+
$ memsource job create --file testfile.md --project-id xjwi5RW1EEgKja5HGsZdD0 --target-langs ja
+------------------------+--------+--------------------------+-------------+--------------+
| id                     | status | date_created             | filename    | target_langs |
+------------------------+--------+--------------------------+-------------+--------------+
| 2Q5to3F8DNpQ12rdmQbGD5 | NEW    | 2019-11-27T01:22:32+0000 | testfile.md | ja           |
+------------------------+--------+--------------------------+-------------+--------------+
$ memsource job download --type bilingual --project-id xjwi5RW1EEgKja5HGsZdD0 --job-id 2Q5to3F8DNpQ12rdmQbGD5 --bilingual-format XLIFF
+--------+-----------------------------------------------------------------+
| Field  | Value                                                           |
+--------+-----------------------------------------------------------------+
| type   | bilingual                                                       |
| format | XLIFF                                                           |
| path   | /var/home/rcernin/git/memsource-cli-client/testfile-en-ja-T.xlf |
+--------+-----------------------------------------------------------------+
$ cat /var/home/rcernin/git/memsource-cli-client/testfile-en-ja-T.xlf
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" xmlns:slr="urn:oasis:names:tc:xliff:sizerestriction:2.0" xmlns:memsource="http://www.memsource.com/xliff2.0/1.0" version="2.0" memsource:wfLevel="1" srcLang="en" trgLang="ja">
<file id="O9qvSAkcSBjSfTtn_dc4:0-0" memsource:taskId="O9qvSAkcSBjSfTtn_dc4" canResegment="no" original="testfile.md">
<slr:profiles generalProfile="xliff:codepoints"/>
<unit id="0" memsource:tGroupBegin="0" memsource:tGroupEnd="0">
<segment id="0" state="initial">
<source>Foo`Bar</source>
<target></target>
</segment>
</unit>
</file>
</xliff>
This traceback is thrown from the Windows Python 3.6sci library:
  File "C:\Python3.6sci\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 775-778: character maps to <undefined>
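To illustrate what the cp1252 codec is complaining about, here is a small sketch that is independent of memsource-cli. try_encode is a hypothetical helper and the sample strings are assumptions, not the contents of the reporter's file. The range in the message ("position 775-778") is a start and end index, so it points at a run of four consecutive characters that have no cp1252 mapping.

```python
def try_encode(text, encoding="cp1252"):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# The backtick is plain ASCII, so cp1252 encodes it without trouble.
print(try_encode("Foo`Bar"))

# Three characters outside cp1252 produce the same kind of message as in
# the traceback, with the position range covering indexes 0 through 2.
print(try_encode("\u65e5\u672c\u8a9e"))
```

If this reasoning holds, a file containing only a backtick would not be expected to reproduce the error on any platform.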
@ludekjanda We need to figure out a way to make this work on Windows.
@crashracer I don't think you would have the same problem on Linux or macOS. Would it be possible for you to try the same reproducer that I posted in https://github.com/unofficial-memsource/memsource-cli-client/issues/15#issuecomment-558887873 ? I could also try to reproduce this on Windows, but I would need to know where you got Python3.6sci from and how it is installed on Windows.
@crashracer
I couldn't reproduce this on Windows 10 with Python 3.6.0 from https://www.python.org/downloads/release/python-360/
I wonder whether the issue could be in the Python distribution itself or in the way memsource-cli was installed.
To install it on Windows, I ran:
python -m pip install memsource-cli
Then I changed directory to C:\Users\admin\AppData\Local\Programs\Python\Python36\Scripts
and ran the following:
C:\Users\admin\AppData\Local\Programs\Python\Python36\Scripts>memsource.exe job list --project-id xjwi5RW1EEgKja5HGsZdD0 -f value -c uid
OwMWJLd5Pm9V7BdVYgqTf1
2Q5to3F8DNpQ12rdmQbGD5
C:\Users\admin\AppData\Local\Programs\Python\Python36\Scripts>memsource.exe job download --type bilingual --project-id xjwi5RW1EEgKja5HGsZdD0 --job-id 2Q5to3F8DNpQ12rdmQbGD5 --bilingual-format XLIFF
+--------+-----------------------------------------------------------------+
| Field  | Value                                                           |
+--------+-----------------------------------------------------------------+
| type   | bilingual                                                       |
| format | XLIFF                                                           |
| path   | C:\Users\admin\AppData\Local\Programs\Python\Python36\Scripts\t |
|        | estfile-en-ja-T.xlf                                             |
+--------+-----------------------------------------------------------------+
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" xmlns:slr="urn:oasis:names:tc:xliff:sizerestriction:2.0" xmlns:memsource="http://www.memsource.com/xliff2.0/1.0" version="2.0" memsource:wfLevel="1" srcLang="en" trgLang="ja">
<file id="O9qvSAkcSBjSfTtn_dc4:0-0" memsource:taskId="O9qvSAkcSBjSfTtn_dc4" canResegment="no" original="testfile.md">
<slr:profiles generalProfile="xliff:codepoints"/>
<unit id="0" memsource:tGroupBegin="0" memsource:tGroupEnd="0">
<segment id="0" state="final">
<source>Foo`Bar</source>
<target>Foo`Bar</target>
</segment>
</unit>
</file>
</xliff>
I think there is a character in the file that is breaking it. My guess was `, but I was unable to reproduce the error with that character, even on Windows. However, I am using different Python libraries, which may also be the cause.
To find out what is going on, I will need your help.
I still can't find the offending character. I have just run a test file containing the whole Windows-1252 character set from https://en.wikipedia.org/wiki/Windows-1252
It worked on both Fedora 31 with Python 3.7.4 and Windows 10 with Python 3.6.0.
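One possible way to locate the offending characters is to scan the text for code points that cp1252 cannot represent. The sketch below assumes the downloaded content is available as a Python str; find_unencodable is a hypothetical helper, not part of memsource-cli. Note that a test file built only from Windows-1252 characters would not be expected to trigger the error, since each of those characters is encodable by the cp1252 codec by definition.

```python
import unicodedata


def find_unencodable(text, encoding="cp1252"):
    """List (index, code point, name) for characters the codec rejects."""
    problems = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            code_point = "U+{:04X}".format(ord(ch))
            name = unicodedata.name(ch, "<unnamed>")
            problems.append((index, code_point, name))
    return problems


# A snowman is outside cp1252; the backtick is not.
print(find_unencodable("Foo`Bar"))       # []
print(find_unencodable("a\u2603b"))      # [(1, 'U+2603', 'SNOWMAN')]
```

Running this over the text around positions 775-778 of the failing payload should name the four characters involved.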
This fixed it:
try:
    with open(path, "wb", encoding="utf-8") as f:
        f.write(response.data)
except:
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.data)
Thank you very much!
@crashracer This will be included in version 0.3.1, which should be released in about 10 days. Until then it is available only on GitHub: https://github.com/unofficial-memsource/memsource-cli-client/commit/449c79bc612363a14237ef5cfdd6f0c7587c4602
Running the bilingual download command generates a 'charmap' error:
$ memsource job download --type bilingual --project-id v013uZqRPSF1aFNPpWilvx --job-id VRLp9v1BAVgZgaUngvoHW1 --bilingual-format XLIFF
'charmap' codec can't encode characters in position 775-778: character maps to <undefined>
Using Python3.6
ENV has: PYTHONIOENCODING=UTF8, chcp 65001
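A possible reason these settings do not help: as far as I know, PYTHONIOENCODING only overrides the encoding of stdin, stdout and stderr, and chcp 65001 only changes the console code page. Neither changes the default that open() uses for file contents in text mode, which in Python 3.6 comes from locale.getpreferredencoding(False). The sketch below prints these values for comparison; encoding_report is a hypothetical helper.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for console output and for files."""
    return {
        # Affected by PYTHONIOENCODING (may be None if stdout is replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default for open() in text mode when encoding= is omitted.
        "open_default": locale.getpreferredencoding(False),
        # Used for file names, not for file contents.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in encoding_report().items():
    print("{:>12}: {}".format(name, value))
```

On a Windows machine with a Western European locale, open_default would typically be cp1252 even when stdout reports UTF-8, which would match the traceback.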