retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

`unexpected '\65533'` error, referring to the � character from citation key generation of Arabic text #2823

Closed (majdal closed this issue 5 months ago)

majdal commented 5 months ago

Debug log ID

JH35ULCH-refs-euc/6.7.172-6

What happened?

I am trying to use the bibliography generated by BBT in pandoc with citeproc.

When my library includes items with Arabic-language title or author, I get the error:

Error reading bibliography file /Users/me/PhD/zotero_library_2.bib:
(line 19, column 14):
unexpected '\65533'

Where line 19 is the item in the debug log. It looks like this:

@article{aabd�lkrymMZ�hrMn�ltnZym1401,
  title = {مظاهر من التنظيم الحرفي في بلاد الشام: في العهد العثماني},
  shorttitle = {مظاهر من التنظيم الحرفي في بلاد الشام},
  author = {عبدالكريم, رافق ،},
  date = {1401},
  journaltitle = {دراسات تاريخية : مجلة علمية فصلية محكمة . -},
  keywords = {الحرفيون,العصر العثماني,بلاد الشام,تاريخ}
}

The problem seems to be the � character in the citation key, which has come up before in https://github.com/retorquere/zotero-better-bibtex/issues/895 and https://github.com/retorquere/zotero-better-bibtex/issues/2413. Manually removing the character fixes the issue.

Many thanks for the great plugin!
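To find every affected entry in an exported file rather than only the first one pandoc stops at, a scan along the following lines may help. This is a minimal sketch and not part of BBT or pandoc: the function name `find_bad_keys` and the regular expression are illustrative assumptions, and the pattern only recognises entries whose `@type{key,` opening sits on a single line.

```python
import re

REPLACEMENT_CHAR = "\ufffd"  # decimal 65533, the character pandoc complains about

# Matches the start of a BibTeX/BibLaTeX entry such as "@article{somekey,".
# Assumption: the entry type, brace, key and comma are all on one line.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]*)\s*,")


def find_bad_keys(bib_text):
    """Return (line_number, key) pairs for entries whose key contains U+FFFD."""
    bad = []
    for line_number, line in enumerate(bib_text.splitlines(), start=1):
        match = ENTRY_START.match(line)
        if match and REPLACEMENT_CHAR in match.group(2):
            bad.append((line_number, match.group(2)))
    return bad


# Small in-memory sample: one key with replacement characters, one clean key.
sample = (
    "@article{aabd\ufffdlkrymMZ\ufffdhrMn\ufffdltnZym1401,\n"
    "  date = {1401}\n"
    "}\n"
    "@book{cleankey2020,\n"
    "  date = {2020}\n"
    "}\n"
)
print(find_bad_keys(sample))
```

To check a real export, read it with `open(path, encoding="utf-8").read()` and pass the text to `find_bad_keys`. The reported keys are the ones to refresh in Zotero.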

retorquere commented 5 months ago

With the settings in JH35ULCH-refs-euc/6.7.172-6 I don't get that citation key, I get aabdlkrymMZhrMnltnZym1401.

retorquere commented 5 months ago

\65533 is \uFFFD, the Unicode replacement character, but BBT unconditionally removes those, so I don't know how I would replicate this. I'm also on a Mac (albeit on Sonoma), so it's unlikely to be a platform issue.
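The relationship between the decimal value in the pandoc error and \uFFFD can be checked directly. The sketch below also shows what removing the character does to the key from the report. `strip_replacement_chars` is a hypothetical helper written for illustration; it is not BBT's actual code, which is not shown in this thread.

```python
REPLACEMENT_CHAR = "\ufffd"

# pandoc reports the code point in decimal; 65533 is 0xFFFD in hexadecimal.
assert ord(REPLACEMENT_CHAR) == 65533 == 0xFFFD

# A decoder typically emits U+FFFD when it meets bytes it cannot decode.
assert b"\xff".decode("utf-8", errors="replace") == REPLACEMENT_CHAR


def strip_replacement_chars(key):
    """Hypothetical helper, not BBT's code: drop every U+FFFD from a key."""
    return key.replace(REPLACEMENT_CHAR, "")


# The key from the original report, with its three replacement characters.
old_key = "aabd\ufffdlkrymMZ\ufffdhrMn\ufffdltnZym1401"
print(strip_replacement_chars(old_key))
```

With the three replacement characters removed, the reported key becomes aabdlkrymMZhrMnltnZym1401, which is the key generated from the settings in the debug log.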

majdal commented 5 months ago

Strange. I updated to the latest version of BBT and when I refresh the citation key, the � characters are removed. I generated the keys originally back in December, so not so long ago. I don't know what caused the original keys to include these characters.

I guess I'll refresh all citation keys and that should remove the error character. I guess the bug is still there, but more complicated to narrow down, so I'm not sure if it is still relevant.

Thank you for the quick response!

retorquere commented 5 months ago

I fixed the bug that let \uFFFD through on 2023-02-13. Since then it should be impossible to generate new keys that contain that character.

> I guess the bug is still there

I don't see how you would come to that conclusion, given that a refresh removed them.

The only scenario I can think of is that you had entries that were last changed before the fix date. BBT does not refresh keys on upgrades, so any \uFFFD generated before that date would remain in place until the key is refreshed. The cached keys in JH35ULCH-refs-euc/6.7.172-6 do still have them; they are just not regenerated, which is why I couldn't recreate the problem. The item in that log was last changed on 2020-12-01, so that would fit.

majdal commented 5 months ago

If I recall correctly, I generated the keys in early January, which is why I think the bug might still be there. But maybe my memory is wrong. I think we can safely close the ticket. Thanks for the help!

retorquere commented 5 months ago

> If I recall correctly, I generated the keys in early January,

Not for the item you sent a log for. That item was last changed on 2020-12-01.

> that's why I think the bug might be there.

And I think there's no evidence the bug is still there. If you still think the bug is present, and you can provide a reason for thinking that, I'd want to fix the bug.

Can you select all items and send a debug log?