papis / papis-zotero

Zotero compatibility layer for papis
GNU General Public License v3.0

Non-ascii characters encoding for papis-zotero import --from-sql #9

Closed. stasvlasov closed this issue 1 year ago.

stasvlasov commented 5 years ago

Hi! I am testing papis-zotero import --from-sql. Everything works OK, but instead of Unicode characters it imports only UTF-16 escape codes (\uXXXX), at least for Cyrillic characters. See the example below:

author: "\u0422\u043E\u043B\u0441\u0442\u043E\u0439, \u041B\u0435\u0432 \u041D\u0438\
  \u043A\u043E\u043B\u0430\u0435\u0432\u0438\u0447"
author_list:
- given_name: "\u041B\u0435\u0432 \u041D\u0438\u043A\u043E\u043B\u0430\u0435\u0432\
    \u0438\u0447"
  surname: "\u0422\u043E\u043B\u0441\u0442\u043E\u0439"
created: '2017-12-10 16:38:44'
date: 1955-00-00 1955
extra: '00000'
files:
- GNDSVGX4..pdf
modified: '2017-12-10 16:38:44'
modified.client: '2018-08-02 19:52:47'
numberOfVolumes: '90'
place: "\u041C\u043E\u0441\u043A\u0432\u0430"
project:
- Russian Classics
publisher: "\u0413\u043E\u0441\u0443\u0434\u0430\u0440\u0441\u0442\u0432\u0435\u043D\
  \u043D\u043E\u0435 \u0438\u0437\u0434\u0430\u0442\u0435\u043B\u044C\u0441\u0442\u0432\
  \u043E \u0445\u0443\u0434\u043E\u0436\u0435\u0441\u0442\u0432\u0435\u043D\u043D\u043E\
  \u0439 \u043B\u0438\u0442\u0435\u0440\u0430\u0442\u0443\u0440\u044B"
ref: SQWMYIQG
series: "\u0421\u0435\u0440\u0438\u044F 1. \u041F\u0440\u043E\u0438\u0437\u0432\u0435\
  \u0434\u0435\u043D\u0438\u044F"
seriesNumber: '1'
tags: ''
title: "\u041F\u043E\u043B\u043D\u043E\u0435 \u0441\u043E\u0431\u0440\u0430\u043D\u0438\
  \u0435 \u0441\u043E\u0447\u0438\u043D\u0435\u043D\u0438\u0439. \u041D\u0435\u0441\
  \u043A\u043E\u043B\u044C\u043A\u043E \u0441\u043B\u043E\u0432 \u043F\u043E \u043F\
  \u043E\u0432\u043E\u0434\u0443 \u043A\u043D\u0438\u0433\u0438 ''\u0412\u043E\u0439\
  \u043D\u0430 \u0438 \u043C\u0438\u0440''"
type: book
volume: '16'

The above should be imported as follows (to decode it, I used https://convertcodes.com/utf16-encode-decode-convert-string):

author: "Толстой, Лев Ни\
  колаевич"
author_list:
- given_name: "Лев Николаев\
    ич"
  surname: "Толстой"
created: '2017-12-10 16:38:44'
date: 1955-00-00 1955
extra: '00000'
files:
- GNDSVGX4..pdf
modified: '2017-12-10 16:38:44'
modified.client: '2018-08-02 19:52:47'
numberOfVolumes: '90'
place: "Москва"
project:
- Russian Classics
publisher: "Государствен\
  ное издательств\
  о художественно\
  й литературы"
ref: SQWMYIQG
series: "Серия 1. Произве\
  дения"
seriesNumber: '1'
tags: ''
title: "Полное собрани\
  е сочинений. Нес\
  колько слов по п\
  оводу книги ''Вой\
  на и мир''"
type: book
volume: '16'
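The decoding done above with an online converter can also be reproduced locally. A minimal sketch using only the Python standard library; the function name decode_escapes is hypothetical, and the sample input is the place field from the exported record:

```python
# Minimal sketch: decode \uXXXX escapes locally with the standard library,
# as an alternative to the online converter mentioned above.
# decode_escapes is a hypothetical helper name, not part of papis.


def decode_escapes(escaped: str) -> str:
    """Turn an ASCII string containing \\uXXXX escapes into real characters."""
    # Encoding to ASCII first makes non-ASCII input fail loudly; the
    # "unicode_escape" codec then interprets the backslash escapes.
    return escaped.encode("ascii").decode("unicode_escape")


# Value of the `place` field from the exported record above.
print(decode_escapes(r"\u041C\u043E\u0441\u043A\u0432\u0430"))  # prints Москва
```

This only suits single-line values: the codec also interprets other backslash escapes and does not understand YAML's escaped line breaks, so whole fields are better read with a YAML parser.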

Also note that some strange line breaks appear in the middle of words. Ideally, I would like to get this (edited manually from the above):

author: "Толстой, Лев Николаевич"
author_list:
- given_name: "Лев Николаевич"
  surname: "Толстой"
created: '2017-12-10 16:38:44'
date: 1955-00-00 1955
extra: '00000'
files:
- GNDSVGX4..pdf
modified: '2017-12-10 16:38:44'
modified.client: '2018-08-02 19:52:47'
numberOfVolumes: '90'
place: "Москва"
project:
- Russian Classics
publisher: "Государственное издательство художественной литературы"
ref: SQWMYIQG
series: "Серия 1. Произведения"
seriesNumber: '1'
tags: ''
title: "Полное собрание сочинений. Несколько слов по поводу книги ''Война и мир''"
type: book
volume: '16'

Any suggestions on how to fix this? (papis 0.9, papis-zotero v0.0.2, Ubuntu 18)
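On the line breaks in the middle of words: assuming the info file is written with PyYAML (an assumption, not confirmed in this thread), its emitter wraps long double-quoted strings at its default width of about 80 columns and ends each wrapped line with a backslash. In YAML that backslash is an escaped line break, which is removed together with the next line's indentation when the file is loaded, so the stored value itself is intact. A minimal sketch with a shortened, illustrative record (requires PyYAML):

```python
# Sketch, assuming the info file is produced by PyYAML (pip install pyyaml).
import yaml

record = {"publisher": "Государственное издательство художественной литературы"}

# With PyYAML's defaults the Cyrillic text is escaped as \uXXXX and the
# resulting long double-quoted string is wrapped at the default width,
# each break marked by a trailing backslash.
wrapped = yaml.dump(record, default_flow_style=False)

# The trailing backslash is a YAML escaped line break: it and the next
# line's indentation are dropped on load, so the value round-trips intact.
assert yaml.safe_load(wrapped) == record

# A very large width keeps the (still escaped) value on a single line.
one_line = yaml.dump(record, default_flow_style=False, width=10**6)

print(wrapped)
print(one_line)
```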

alejandrogallo commented 5 years ago

Hi @stasvlasov, I'm reviewing this for the papis 0.9 release. I have some papers in Russian in my library.

It is important to know what you have in your info-allow-unicode setting. Can you tell me its value? You can check it and compare it with this asciinema recording.

Bye!

stasvlasov commented 5 years ago

Hi @alejandrogallo! I have all the default settings:

papis config info-allow-unicode
# True

Sorry, I could not open the last link :(
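Since the escapes appear even with info-allow-unicode set to True, it may help to see what a flag of this kind controls in isolation. The sketch below assumes PyYAML and shows its allow_unicode option; that papis passes info-allow-unicode through to this option, and that papis-zotero uses the same path, are assumptions not confirmed here:

```python
# Sketch, assuming PyYAML; the link to info-allow-unicode is an assumption.
import yaml

record = {"place": "Москва"}

# allow_unicode=False (PyYAML's default): non-ASCII text is written as
# \uXXXX escapes inside a double-quoted string, as in the report above.
escaped = yaml.dump(record, default_flow_style=False, allow_unicode=False)

# allow_unicode=True: the Cyrillic text is written as-is.
readable = yaml.dump(record, default_flow_style=False, allow_unicode=True)

print(escaped, end="")   # place: "\u041C\u043E\u0441\u043A\u0432\u0430"
print(readable, end="")  # place: Москва

# Both forms load back to the same Python string.
assert yaml.safe_load(escaped) == yaml.safe_load(readable) == record
```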

alexfikl commented 1 year ago

There are now some tests with Unicode strings, and everything seems to be saved and loaded correctly. Feel free to reopen if this is still an issue!