Add support for ODT files

nchachereau commented 3 years ago

First of all, thank you for this tool and for the documentation on the wiki pages. A few months ago, I wanted to select the Zotero items I had cited in a document I wrote. As a Linux user, I use only LibreOffice, and because saving a document as .docx breaks the links, the Reference Extractor did not work for me. However, thanks to your documentation and your code, I was able to write a small Python script tailored specifically to my needs (you can read it as a gist).

Back then I wrote it in Python because I was unfamiliar with many of the JavaScript features used in your code (async functions, then, arrow functions…). Having learnt about these in the meantime, I thought I would try to port my code to the official Reference Extractor, as a way of thanking you and making this tool even more useful.

Take a look and tell me if you think this would be of interest.

One last thing: for some CSL styles, the Zotero plugin for LibreOffice lets the user choose between "Reference marks" and "Bookmarks". The former is recommended, while the latter allows a document to be shared between LibreOffice and Microsoft Word (official documentation). As it stands, the code only works with Reference marks. From the looks of it, supporting Bookmarks introduces a bit more complexity but seems feasible. I wanted to get your feedback on the current state first, but I will try to add Bookmarks if you are open to the general idea of adding ODT support.
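For readers unfamiliar with the Reference marks option: Zotero wraps each citation in a reference mark in the document's content.xml, and the mark's name carries the citation data as JSON, in the form ZOTERO_ITEM CSL_CITATION {json} RNDxxxx. A minimal JavaScript sketch of scanning for those marks is below. It assumes content.xml has already been unzipped from the .odt into a string, and that this naming convention holds for the Zotero version in use. The function names are hypothetical, this is not the code merged in this PR, and neither Bookmarks nor Mendeley citations are handled.

```javascript
// Assumption: Zotero names each reference mark "ZOTERO_ITEM CSL_CITATION {json} RND<random>".
const ZOTERO_PREFIX = "ZOTERO_ITEM CSL_CITATION ";

// Decode the five predefined XML entities plus numeric character references,
// in a single pass so that "&amp;quot;" correctly becomes the literal "&quot;".
function decodeXmlEntities(s) {
  return s.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|quot|apos|lt|gt|amp);/g, (_, e) => {
    if (e[0] === "#") {
      const code = e[1] === "x" ? parseInt(e.slice(2), 16) : parseInt(e.slice(1), 10);
      return String.fromCodePoint(code);
    }
    return { quot: '"', apos: "'", lt: "<", gt: ">", amp: "&" }[e];
  });
}

// Hypothetical helper: return the parsed CSL_CITATION JSON of every Zotero reference mark.
// Only <text:reference-mark-start> and point <text:reference-mark> elements are matched;
// the "\s" after the element name excludes <text:reference-mark-end>.
function extractZoteroCitations(contentXml) {
  const markPattern = /<text:reference-mark(?:-start)?\s[^>]*?\btext:name="([^"]*)"/g;
  const citations = [];
  for (const match of contentXml.matchAll(markPattern)) {
    const markName = decodeXmlEntities(match[1]);
    if (!markName.startsWith(ZOTERO_PREFIX)) continue; // not a Zotero citation
    // The JSON payload runs from after the prefix to the last "}" (before " RND...").
    const json = markName.slice(ZOTERO_PREFIX.length, markName.lastIndexOf("}") + 1);
    try {
      citations.push(JSON.parse(json));
    } catch (err) {
      // Skip marks whose payload is not valid JSON.
    }
  }
  return citations;
}

// Hypothetical helper: collect the unique Zotero item URIs cited across all citations.
function citedItemUris(citations) {
  const uris = new Set();
  for (const citation of citations) {
    for (const item of citation.citationItems || []) {
      for (const uri of item.uris || []) uris.add(uri);
    }
  }
  return [...uris];
}

// Usage with a made-up, already XML-escaped reference mark name.
const markName =
  "ZOTERO_ITEM CSL_CITATION {&quot;citationID&quot;:&quot;a1&quot;," +
  "&quot;citationItems&quot;:[{&quot;id&quot;:1,&quot;uris&quot;:" +
  "[&quot;http://zotero.org/users/1/items/ABCD1234&quot;]}]} RNDx1";
const sample =
  `<text:p>See <text:reference-mark-start text:name="${markName}"/>(Doe 2020)` +
  `<text:reference-mark-end text:name="${markName}"/>.</text:p>`;
console.log(citedItemUris(extractZoteroCitations(sample)));
// → [ 'http://zotero.org/users/1/items/ABCD1234' ]
```

Because the sketch scans all of content.xml rather than only body paragraphs, it should also pick up marks that a note style places inside footnotes (text:note elements). A real implementation would more likely use a proper XML parser than a regular expression.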

rmzelle commented 3 years ago

@nchachereau, looks awesome, thanks! Do you happen to have an ODT file you can attach here with some Zotero or Mendeley citations for testing?

nchachereau commented 3 years ago

Sure, here are two documents for testing purposes. While preparing them I noticed my changes for CSL style name extraction weren't actually working at all, hence the new commit.

Let me know if things work for you.

Attached: ODT-Zotero-reference-marks.odt, ODT-Mendeley.odt

rmzelle commented 3 years ago

(Sorry for the wait! I don't have a lot of free time, and it will take me a bit longer to fully review the code, hopefully within a week or so.)

nchachereau commented 3 years ago

No worries; there is no rush as far as I am concerned.

Regarding the code review: I mostly kept changes to the existing code base to a minimum. It might be possible, or even better, to refactor the code. This may matter more if you think it would be good to add support for references stored as bookmarks in OpenDocument files, since that would add further branches.

Anyhow, looking forward to your thoughts, whenever you are ready!

rmzelle commented 3 years ago

Looks great. I'm definitely in favor of any refactoring to clean up the code. I'm very much an amateur programmer, so I'm sure it can be improved upon.

In the code, I replaced "Word" with "OfficeOpenXML", which seems to be the official name for the .docx/.docm file format (https://en.wikipedia.org/wiki/Office_Open_XML_file_formats). It is also more specific, since Reference Extractor (currently) doesn't work with .doc files, for example.

Before I merge this, did you test your code with an ODT file that uses a CSL note style like https://www.zotero.org/styles?q=id%3Achicago-fullnote-bibliography as well? At least for Word (Office Open XML!) documents, note references are stored differently from in-text references (https://github.com/rmzelle/ref-extractor/issues/17).

nchachereau commented 3 years ago

Sorry about the delay! Very much an amateur as well…

Notes were my initial use case; just to make sure, I made an additional test file, and everything works.

Attached: ODT-Zotero-reference-marks-notes.odt

rmzelle commented 3 years ago

Perfect, thanks! I just merged the PR and announced the new feature on Twitter: https://twitter.com/rintzezelle/status/1427837338546282501

nchachereau commented 3 years ago

Thank you for creating and documenting the extractor in the first place; I'm more than happy to share something I wrote to scratch my own itch!

rmzelle / ref-extractor

Add support for ODT files #38