org-roam / org-roam-bibtex

Org Roam integration with bibliography management software
GNU General Public License v3.0

PDF-Scrapper does not recognize Turkish authors in "BibTeX mode" #149

Status: Closed (closed by j-steinbach 3 years ago)

j-steinbach commented 3 years ago

I have a reference in my PDF

Çakıroğlu, Ü., Başıbüyük, B., Güler, M., Atabay, M., & Yılmaz Memiş, B. (2017).
Gamifying an ICT course: Influences on engagement and academic performance.
Computers in Human Behavior, 69, 98–107. https://doi.org/10.1016/j.chb.2016.12.
018.

It gets correctly detected and extracted with AnyStyle

Çakıroğlu, Ü., Başıbüyük, B., Güler, M., Atabay, M., & Yılmaz Memiş, B. (2017). Gamifying an ICT course: Influences on engagement and academic performance. Computers in Human Behavior, 69, 98–107. https://doi.org/10.1016/j.chb.2016.12. 018.

but in the "BibTeX mode" buffer the author field is empty.

@article{N/A2017gamifying,
  author = {},
  date = {2017},
  title = {Gamifying an ICT course: Influences on engagement and academic performance},
  volume = {69},
  pages = {98–107},
  url = {https://doi.org/10.1016/j.chb.2016.12.},
  doi = {10.1016/j.chb.2016.12.},
  journal = {Computers in Human Behavior}
}

I expected the author field to become either author = {Çakıroğlu} or author = {Cakiroglu}.


Neither my Emacs nor my system has problems displaying those special characters.

It would also be interesting to see what happens with Greek or Russian characters (as they can also easily be transliterated into Latin characters).
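To make the second expected form (author = {Cakiroglu}) concrete, here is a minimal Python sketch of that kind of diacritic stripping. It is not what ORB or AnyStyle does; the function name strip_diacritics and the EXTRA table are made up for illustration, and the table is deliberately incomplete. Greek or Cyrillic text would pass through unchanged, since real transliteration needs a proper mapping table.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping.
# Illustrative only; this table is far from complete.
EXTRA = {"ı": "i", "ø": "o", "ł": "l", "ß": "ss"}


def strip_diacritics(text):
    """Return text with Latin diacritics removed (hypothetical helper)."""
    # Replace the special cases first, because NFKD leaves them untouched.
    mapped = "".join(EXTRA.get(ch, ch) for ch in text)
    # Split each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", mapped)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Çakıroğlu"))
```

Note that the dotless ı has to go through the explicit table: normalization alone would leave it as it is.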

myshevchuk commented 3 years ago

This is an AnyStyle problem, because the parsing of text into BibTeX is done by AnyStyle. The same happens on the command line with anystyle -f bib parse reference.txt, where reference.txt contains the reference you provided.

I found the offending character: it is the letter ı (U+0131, LATIN SMALL LETTER DOTLESS I). I have never encountered any problems with French or German diacritics, or with Cyrillic characters. I have just checked Greek and it seems to work too.
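For anyone who wants to hunt for such a character in their own references, a small Python sketch along these lines lists every non-ASCII character in a string together with its code point and Unicode name. The helper name describe_non_ascii is invented here. The last column shows whether the character has a canonical decomposition; ı does not, unlike Ç or ğ, though whether that is related to the failure is only a guess.

```python
import unicodedata


def describe_non_ascii(text):
    """List (char, code point, Unicode name, decomposes?) for non-ASCII chars."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "UNKNOWN"),
            # True if canonical decomposition (NFD) changes the character.
            unicodedata.normalize("NFD", ch) != ch,
        )
        for ch in seen
    ]


for row in describe_non_ascii("Çakıroğlu"):
    print(row)
```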

Moreover, AnyStyle correctly parses the reference into other formats such as XML or JSON. Output of anystyle -f xml parse reference.txt:

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
  <sequence>
    <author>Çakıroğlu, Ü., Başıbüyük, B., Güler, M., Atabay, M., Yılmaz Memiş, B.</author>
    <date>(2017).</date>
    <title>Gamifying an ICT course: Influences on engagement and academic performance.</title>
    <journal>Computers in Human Behavior,</journal>
    <volume>69,</volume>
    <pages>98–107.</pages>
  </sequence>
</dataset>

So this looks like a glitch in AnyStyle's BibTeX conversion routines or in an external library they use. In principle, ORB could provide a workaround, but ideally the issue should be fixed upstream.
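To illustrate what such a workaround could look like, here is a rough Python sketch that reads the XML output shown above and builds BibTeX-style fields from it, bypassing the BibTeX converter entirely. This is not how ORB works (ORB is written in Emacs Lisp), and the helpers clean, bibtex_authors and sequence_to_fields are hypothetical. The author splitting assumes an APA-style "Family, Initials" list, which holds for this reference but not in general.

```python
import xml.etree.ElementTree as ET

# XML output copied from the anystyle -f xml run above.
ANYSTYLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<dataset>
  <sequence>
    <author>Çakıroğlu, Ü., Başıbüyük, B., Güler, M., Atabay, M., Yılmaz Memiş, B.</author>
    <date>(2017).</date>
    <title>Gamifying an ICT course: Influences on engagement and academic performance.</title>
    <journal>Computers in Human Behavior,</journal>
    <volume>69,</volume>
    <pages>98–107.</pages>
  </sequence>
</dataset>
"""


def clean(text):
    """Strip surrounding whitespace, parentheses and trailing punctuation."""
    return text.strip().strip("().,").strip()


def bibtex_authors(raw):
    """Turn 'Family, I., Family, I.' into 'Family, I. and Family, I.'.

    Assumes names alternate strictly between family name and initials.
    """
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    pairs = zip(parts[0::2], parts[1::2])
    return " and ".join(f"{family}, {given}" for family, given in pairs)


def sequence_to_fields(xml_text):
    """Return a dict of BibTeX-style fields for the first parsed reference."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = {}
    for child in root.find("sequence"):
        text = child.text or ""
        fields[child.tag] = bibtex_authors(text) if child.tag == "author" else clean(text)
    return fields


for key, value in sequence_to_fields(ANYSTYLE_XML).items():
    print(f"  {key} = {{{value}}},")
```

The point is only that the author data survives in the XML output, so the names could be recovered from there if the BibTeX path fails.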

myshevchuk commented 3 years ago

Hi, the issue has been fixed upstream. Run [sudo] gem update to update your installed Ruby gems, and make sure that the namae gem is updated to version 1.0.2. See also https://github.com/inukshuk/anystyle/issues/156

j-steinbach commented 3 years ago

Not sure if I should open a new issue (or ask in AnyStyle), but

Todor, V., & Pitică, D. (2013). The gamification of the study of electronics in dedicated elearning platforms. In Proceedings of the 36th International Spring Seminar on Electronics Technology (pp. 428–431). IEEE. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6648287

gets turned into

@inproceedings{N/A2013,
  author = {},
  date = {2013},
  title = {The gamification of the study of electronics in dedicated elearning platforms},
  pages = {428–431},
  publisher = {IEEE},
  note = {Retrieved from},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6648287},
  booktitle = {Proceedings of the 36th International Spring Seminar on Electronics Technology}
}

This seems very similar to the original issue, as Pitică is also "uncommon".

myshevchuk commented 3 years ago

I cannot confirm this:

[Two screenshots attached, taken 2021-01-24 at 10:56:23 and 10:56:53]

Are you sure you have upgraded the namae Gem? What is the output of gem list namae?
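If you would rather check this programmatically, the following Python sketch parses text in the shape that gem list prints and tests whether any installed namae version is at least 1.0.2. The sample string is illustrative rather than a captured log, and the "name (version, version)" layout is an assumption about the gem list output; namae_versions and has_fix are made-up names.

```python
import re

FIXED_VERSION = (1, 0, 2)  # namae release mentioned above as containing the fix


def namae_versions(gem_list_output):
    """Extract installed namae versions as tuples of ints."""
    match = re.search(r"^namae \(([^)]*)\)", gem_list_output, re.MULTILINE)
    if not match:
        return []
    # Pull out every dotted number, e.g. "1.0.2, 1.0.1" -> ["1.0.2", "1.0.1"].
    found = re.findall(r"\d+(?:\.\d+)*", match.group(1))
    return [tuple(int(part) for part in version.split(".")) for version in found]


def has_fix(gem_list_output):
    """True if at least one installed namae version is >= 1.0.2."""
    return any(v >= FIXED_VERSION for v in namae_versions(gem_list_output))


# Illustrative input only, not real output from anyone's machine.
sample = "\n*** LOCAL GEMS ***\n\nnamae (1.0.1)\n"
print(namae_versions(sample), has_fix(sample))
```

Note that more than one version can be installed side by side, so seeing 1.0.2 in the list does not by itself tell you which one a given Ruby picks up.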

j-steinbach commented 3 years ago

You are absolutely correct, namae is indeed back to version 1.0.1.

The strange thing is that I know I updated it to 1.0.2 (I reacted with emojis and added the first offending paper to my Zettelkasten without any trouble).

Is it possible for Gems to revert themselves? I find that unlikely, but maybe an update messed something up?

myshevchuk commented 3 years ago

I really don't know; I don't use Ruby for anything else. I have the version shipped with macOS and just run sudo gem update. I remember playing with virtual Ruby (and Python, for that matter) environments some time ago, and it was always a mess. Maybe you have several versions of Ruby installed?
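One quick way to look for duplicate installations is to scan PATH for every executable called ruby or gem. Below is a small Python sketch of that idea; find_on_path is a made-up helper, and it only inspects PATH, so Rubies managed by a version manager outside PATH would not show up.

```python
import os


def find_on_path(name, path=None):
    """Return every distinct executable called `name` found on PATH, in order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # Resolve symlinks so the same binary is not counted twice.
            real = os.path.realpath(candidate)
            if real not in hits:
                hits.append(real)
    return hits


for tool in ("ruby", "gem"):
    print(tool, find_on_path(tool))
```

More than one entry per tool would suggest that gem update and AnyStyle may be using different Ruby installations.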

j-steinbach commented 3 years ago

That is possible. Or something broke during a system upgrade (I use Manjaro Linux). I also suddenly have problems with Python, so I think some path variable is messed up, if that makes sense.