jacksongoode / NIME-proceedings-analyzer

A tool for the bibliographic analysis of the NIME proceedings archive
GNU General Public License v3.0

Integrate download links from 2021 onwards (PubPub) #1

Closed jacksongoode closed 2 years ago

jacksongoode commented 2 years ago

Now that NIME has moved to PubPub, we need to parse the source of the PubPub URLs to find the documents for the papers. However, this may also be an opportunity, since PubPub already provides XML files. It might therefore be possible to skip Grobid for these new papers.
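
As a rough sketch of the parsing step, the snippet below scans a page's HTML for anchors whose target ends in `.pdf` or `.xml`. It assumes the download links appear as plain `<a href>` elements with those extensions, which has not been checked against real PubPub pages. The names `DownloadLinkParser` and `find_download_links` and the `example.org` URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DownloadLinkParser(HTMLParser):
    """Collect anchor targets whose URL path ends in a wanted extension."""

    def __init__(self, base_url, extensions=(".pdf", ".xml")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(url).path.lower().endswith(self.extensions):
            self.links.append(url)


def find_download_links(html, base_url):
    """Return absolute URLs of candidate document links found in html."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical markup; real PubPub pages may be structured differently.
sample_html = (
    '<p><a href="/files/paper.pdf">PDF</a>'
    '<a href="https://example.org/files/paper.xml?x=1">XML</a>'
    '<a href="/about">About</a></p>'
)
links = find_download_links(sample_html, "https://example.org/pub/demo")
```

If PubPub builds its download links with JavaScript or serves them from paths without a file extension, this approach would need to be adapted.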

stefanofasciani commented 2 years ago

The XML files provided by PubPub are likely to give more accurate data with fewer errors. However, the proceedings-analyzer code may have to be updated or fixed every time PubPub changes something in its XML files (which may happen frequently, since PubPub is still a fairly new platform). Sticking to PDF files may give the proceedings-analyzer better longevity and compatibility.

jacksongoode commented 2 years ago

Good point. One oddity is that the PDFs generated by PubPub are not always well formed. In fact, our paper 20 NIMES is malformed, and the PDF parser from pdfminer (a good one) isn't able to accept it. It might be possible to fix such PDFs with a library like pikepdf (which uses qpdf).
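
A minimal sketch of that idea is to try the parser first and, only if it fails, rewrite the file through pikepdf and retry once. The helper names `repair_with_pikepdf` and `extract_with_repair` are hypothetical. Whether a plain open-and-save round trip actually repairs the malformed file mentioned above is an assumption that would need testing.

```python
import os
import tempfile


def repair_with_pikepdf(src_path, dst_path):
    """Rewrite a PDF through pikepdf (qpdf), which may fix structural damage."""
    import pikepdf  # third-party; imported lazily so callers can swap it out

    with pikepdf.open(src_path) as pdf:
        pdf.save(dst_path)


def extract_with_repair(path, extract, repair=repair_with_pikepdf):
    """Run extract(path); on any failure, repair into a temp file and retry once."""
    try:
        return extract(path)
    except Exception:
        # Write the repaired copy to a temporary file, leaving the original untouched.
        fd, fixed_path = tempfile.mkstemp(suffix=".pdf")
        os.close(fd)
        try:
            repair(path, fixed_path)
            return extract(fixed_path)
        finally:
            os.remove(fixed_path)
```

Assuming the pdfminer.six package is the parser in use, a call could look like `extract_with_repair("paper.pdf", extract_text)`, with `extract_text` imported from `pdfminer.high_level`. If the retry also fails, the exception propagates so that the paper can be flagged for manual inspection.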

stefanofasciani commented 2 years ago

pikepdf can be a good (temporary) workaround.