ilyachch / md_docs-trans-app

Application for translation documentation in MD format
MIT License
50 stars 14 forks

Issue with emoji #43

Open senhan07 opened 1 year ago

senhan07 commented 1 year ago

It's not working when there is an emoji inside Markdown files. It should be able to skip/ignore that character.

PS C:\Users\Lenovo\Desktop\stable-diffusion-wiki> md-translate C:\Users\Lenovo\Desktop\stable-diffusion-wiki\language\en\berkontribusi.md -F id -T en -P deepl -D  -v
ERROR:md_translate.application:Error processing file: C:\Users\Lenovo\Desktop\stable-diffusion-wiki\language\en\berkontribusi.md
ERROR:md_translate.application:'charmap' codec can't decode byte 0x9d in position 61: character maps to <undefined>
Traceback (most recent call last):
  File "C:\Users\Lenovo\.local\pipx\venvs\md-translate\lib\site-packages\md_translate\application.py", line 106, in process_file
    document = MarkdownDocument.from_file(
  File "C:\Users\Lenovo\.local\pipx\venvs\md-translate\lib\site-packages\md_translate\document\document.py", line 102, in from_file
    file_content = target_file.read_text()
  File "C:\Program Files\Python310\lib\pathlib.py", line 1135, in read_text
    return f.read()
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 61: character maps to <undefined>
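
Reading the traceback above: the failure happens in `read_text()`, which is called without an encoding, so Python falls back to cp1252 on this machine, and byte 0x9d has no mapping there. The 🤝 emoji encodes to `F0 9F A4 9D` in UTF-8, which would explain that byte. Below is a minimal sketch of the failure and of reading with an explicit encoding, assuming the file is saved as UTF-8. The file name and contents are made up for illustration and are not taken from the reported document.

```python
import tempfile
from pathlib import Path

# Hypothetical sample content; U+1F91D (handshake) is F0 9F A4 9D in UTF-8.
sample = "# Contributing \U0001F91D\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.md"
    path.write_text(sample, encoding="utf-8")

    # Decoding as cp1252 fails because byte 0x9d is undefined in that code page.
    try:
        path.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Passing the encoding explicitly avoids relying on the platform default.
    text = path.read_text(encoding="utf-8")
```

If this is indeed the cause, any non-cp1252 character could trigger the same error, not only emojis.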

ilyachch commented 1 year ago

Please attach the part of the document being translated that causes the problem.

senhan07 commented 1 year ago

This is the file: berkontribusi.md

ilyachch commented 1 year ago

Thank you. I'll try to solve it. The obvious solution is to use the emoji library, whose demojize function converts 🤝 to :handshake:. But that introduces a new problem: if the target language is not English, translation services may also translate :handshake: to :handschlag: (in German, for example). So I have to think about how to make it work robustly.

Anyway, for now, I highly recommend that you remove the emojis, translate the document, and then put the emojis back.
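
The remove, translate, restore workaround could be scripted roughly as follows. This is only a sketch and not how md-translate works: the helper names `protect_emojis`, `restore_emojis`, and `fake_translate` are hypothetical, the regular expression is a rough approximation of emoji code points, and it assumes the translation service leaves numbered tokens such as `[[E0]]` untouched, which would need checking for each provider.

```python
import re

# Rough approximation of emoji code points; not a complete Unicode emoji definition.
EMOJI_RE = re.compile(
    "[\U0001F000-\U0001FAFF\u2600-\u27BF]"
    "(?:\uFE0F|\u200D[\U0001F000-\U0001FAFF\u2600-\u27BF])*"
)


def protect_emojis(text):
    """Replace each emoji run with a numbered token; return (text, originals)."""
    originals = []

    def _swap(match):
        originals.append(match.group(0))
        return f"[[E{len(originals) - 1}]]"

    return EMOJI_RE.sub(_swap, text), originals


def restore_emojis(text, originals):
    """Put the saved emojis back in place of their tokens."""
    return re.sub(r"\[\[E(\d+)\]\]", lambda m: originals[int(m.group(1))], text)


def fake_translate(text):
    # Stand-in for a real translation call; it only changes one word.
    return text.replace("Thanks", "Danke")


protected, saved = protect_emojis("Thanks for contributing \U0001F91D!")
result = restore_emojis(fake_translate(protected), saved)
```

Numbered tokens sidestep the :handshake: to :handschlag: problem because they contain no translatable word, though a service could still alter or drop them.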