odkr / pandoc-zotxt.lua

Pandoc filter that looks up bibliographic data for citations in Zotero.
MIT License

zotxt response not encoded in UTF-8. #12

Open aaronkhq opened 4 months ago

aaronkhq commented 4 months ago

hi everyone,

When I use zotxt to convert a LaTeX file (.tex) to Word (.docx), the following warnings appear:

pandoc-zotxt.lua: acemogluE2022: zotxt response not encoded in UTF-8.

pandoc-zotxt.lua: schumpeter1942: zotxt response not encoded in UTF-8.

pandoc-zotxt.lua: graetzRES2018: zotxt response not encoded in UTF-8.

I have tried to find a solution, but I am stuck. Has anyone run into the same problem? Thanks!

odkr commented 4 months ago

Hi 👋!

I’m afraid I need more information to help. Can you post the command line that you use to invoke Pandoc and a CSL JSON or CSL YAML export of the Zotero items that give you trouble? Does this happen for every item? Or only some? What system do you use?

I didn't think that this error would be possible. All it says is that zotxt did not declare its response to your query as being encoded in UTF-8 (pandoc-zotxt.lua has no way of detecting encodings). Handling encodings other than ASCII and UTF-8 is tricky with Lua, so pandoc-zotxt.lua doesn't even try and aborts. Do you know what encoding Zotero uses on your system?

You can try to remove the encoding check. But test it with a file that you don’t mind losing or that you have another copy of. The most likely outcome is just another type of failure.

If you still want to try, search for the function connectors.Zotxt.fetch and look for this segment:

if not mt or mt == '' or not str or str == '' then
    if t == 'betterbibtexkey'
        then err = ckey .. ': no matches.'
        else err = ckey .. ': zotxt response is empty.'
    end
elseif not mt:match ';%s*charset="?utf%-?8"?%s*$' then
    err = ckey .. ': zotxt response not encoded in UTF-8.'
else

The check for UTF-8 consists of these two lines:

elseif not mt:match ';%s*charset="?utf%-?8"?%s*$' then
    err = ckey .. ': zotxt response not encoded in UTF-8.'

Remove them. The segment should now read:

if not mt or mt == '' or not str or str == '' then
    if t == 'betterbibtexkey'
        then err = ckey .. ': no matches.'
        else err = ckey .. ': zotxt response is empty.'
    end
else
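
To get a feel for which responses pass the check, the pattern can be tried against a few media type strings. This is only a minimal sketch: the helper name `declares_utf8` and the sample values are made up for illustration and were not captured from zotxt, and it assumes `mt` holds the media type string exactly as the pattern sees it (whether the filter normalizes case beforehand is not shown in the segment above).

```lua
-- Pattern copied from the check quoted above.
local UTF8_PATTERN = ';%s*charset="?utf%-?8"?%s*$'

-- Hypothetical helper: true if the media type string ends in a
-- charset parameter that the pattern accepts as UTF-8.
local function declares_utf8(mt)
    return mt:match(UTF8_PATTERN) ~= nil
end

-- Hypothetical sample values, not real zotxt output.
local samples = {
    'application/json; charset=utf-8',   -- accepted
    'application/json; charset="utf8"',  -- accepted (quotes and hyphen are optional)
    'application/json; charset=UTF-8',   -- rejected (Lua patterns are case-sensitive)
    'application/json',                  -- rejected (no charset parameter at all)
    'text/plain; charset=iso-8859-1',    -- rejected (different encoding)
}

for _, mt in ipairs(samples) do
    print(mt, declares_utf8(mt))
end
```

If these assumptions hold, a response with no charset parameter at all would trigger the warning even when its body is in fact valid UTF-8, which would be one possible explanation for the message.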

Let me know how that went!

aaronkhq commented 4 months ago

Thank you for your detailed reply!

I use the following command in Terminal on macOS Sonoma 14.1.1:

pandoc -L pandoc-zotxt.lua -C -s main.tex -o output.docx

The warning appears for every item.

Following your suggestion, I removed the encoding check in pandoc-zotxt.lua. Now the warnings have disappeared!! Thanks!!

However, a new problem has come up 😭. zotxt doesn't support Zotero 7 yet, so I can't use zotxt to convert my LaTeX file to Word.

Any other solution would be appreciated!! I use LaTeX for academic writing with a lot of citations (I manage my literature with the Zotero add-on Better BibTeX). However, some journals don't accept LaTeX files, so I have to convert them to Word (.docx). Is there a good way to automatically insert the Zotero references in macro form during the conversion (so that I can easily change the reference format in Microsoft Word)?

Thank you!!

odkr commented 4 months ago

I still don’t quite get what you are trying to do. Why do you need Zotero to convert a LaTeX file to a Word file? I presume you use Zotero and BetterBibTeX to manage your bibliographies? If so, you can cut Zotero out of the process. You can use BetterBibTeX to export your whole Zotero database to one (huge) CSL JSON file.

Supposing that this file is called bib.json, you should then be able to convert the file with:

pandoc -M bibliography=bib.json -C -s main.tex -o output.docx
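
If you would rather not pass the metadata on the command line every time, the same default could be set from a small Lua filter. This is only a sketch under assumptions: the file name set-bib.lua is hypothetical, bib.json is the export mentioned above, and the filter has to be listed before -C so that it runs before citation processing.

```lua
-- set-bib.lua (hypothetical file name)
-- Sets the bibliography metadata field to bib.json unless the
-- document already declares a bibliography of its own.
function Meta(meta)
    if meta.bibliography == nil then
        meta.bibliography = 'bib.json'
    end
    return meta
end
```

It would then be invoked as: pandoc -L set-bib.lua -C -s main.tex -o output.docx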

I'm still confused why zotxt doesn't return items as encoded in UTF-8 though.

aaronkhq commented 4 months ago

One of the biggest reasons is that pandoc's --citeproc does not support CSL-M, which my CSL style file uses to handle more than two languages (see: https://github.com/jgm/citeproc/issues/120).

And I'm sorry that I can't give more information on why zotxt doesn't return items as encoded in UTF-8; I am also very confused by it.

Thanks!!

odkr commented 4 months ago

Oh, I see. The last time I checked, Zotero 7 could safely be downgraded to Zotero 6 (but check that again and make a copy of your database). I’m afraid using Zotero 6 is your only option. That may also address the UTF-8 issue. I’m surprised you got as far as you did with Zotero 7.

aaronkhq commented 4 months ago

Thanks a lot!!