jbaiter / zotero-cli

Command-line interface for Zotero
MIT License

Searching in PDF text body and abstract #20

Closed: joelostblom closed this issue 7 years ago

joelostblom commented 7 years ago

Would it be possible to search not only the title and authors with zotcli query, but also the abstract and possibly even the PDF body?

I know Zotero has a field for the abstract, but I am not sure if this is easily queried via its database. For the PDF search, the option should only be available if the user has enabled PDF indexing within Zotero.

If these searches would cause significant slowdowns, maybe they could be added via an optional flag?

jbaiter commented 7 years ago

The abstract shouldn't be a problem; it would just be another field in the local search index. PDF indexing could get pretty gnarly, since we would have to hook into the index that Zotero keeps of the PDF body text (unless we want to duplicate it, which is probably not a good idea). Do you happen to know what format that index is kept in?
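To illustrate what "another field in the local search index" could look like, here is a minimal sketch. It assumes the index is an SQLite FTS5 table; the table layout and the names `build_index`, `search`, and `include_abstract` are made up for this example and are not taken from zotero-cli's code. The `include_abstract` switch mirrors the optional flag suggested above.

```python
import sqlite3


def build_index(items):
    """Create an in-memory full-text index from (key, title, creators, abstract) tuples.

    Hypothetical layout: one FTS5 table with the abstract as an extra column.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE items USING fts5("
        "key UNINDEXED, title, creators, abstract)"
    )
    conn.executemany(
        "INSERT INTO items (key, title, creators, abstract) VALUES (?, ?, ?, ?)",
        items,
    )
    return conn


def search(conn, term, include_abstract=True):
    """Return the keys of items matching `term`, sorted by key.

    With include_abstract=False the match is restricted to title and creators
    through an FTS5 column filter.
    """
    # Quote the term so that user input is treated as a plain phrase.
    phrase = '"' + term.replace('"', '""') + '"'
    match = phrase if include_abstract else "{title creators} : " + phrase
    rows = conn.execute(
        "SELECT key FROM items WHERE items MATCH ? ORDER BY key", (match,)
    )
    return [row[0] for row in rows]
```

Because the abstract is just one more indexed column, searching it costs a larger index rather than a separate lookup, which would be consistent with not seeing a noticeable slowdown.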

joelostblom commented 7 years ago

Thanks for getting back on all the issues and pull requests!

Unfortunately, I don't know what format the full text index is in. However, I think adding the capability to search the abstract is definitely the more important of these two suggestions. If full text search requires a lot of work, it might not be worth adding it at all.

I was trying to implement the abstract searching myself, but I am not very familiar with databases, so I could not find where it is stored in my local Zotero database. I got this reply over at the Zotero forums, which might be helpful:

... the fieldID for abstractNote is 90. Every abstract for every item is then assigned a valueID in itemData and that value ID is mapped to a value in ItemDataValues.
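The lookup described in that reply can be sketched as a two-table join. This is only an illustration on a toy in-memory database: the column names (`itemID`, `fieldID`, `valueID`, `value`) are assumptions based on the wording of the reply, the sample rows are invented, and the value 90 is simply the one quoted above, so it may differ between Zotero versions.

```python
import sqlite3

# fieldID for abstractNote as quoted in the forum reply (not verified here).
ABSTRACT_FIELD_ID = 90


def make_toy_db():
    """Build a tiny stand-in for the two tables mentioned in the reply."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE itemDataValues (valueID INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE itemData (itemID INTEGER, fieldID INTEGER, valueID INTEGER);
        -- Invented sample data: item 10 has a title (made-up fieldID 1)
        -- and an abstract (fieldID 90).
        INSERT INTO itemDataValues VALUES (1, 'A title');
        INSERT INTO itemDataValues VALUES (2, 'An abstract about PDF search.');
        INSERT INTO itemData VALUES (10, 1, 1);
        INSERT INTO itemData VALUES (10, 90, 2);
        """
    )
    return conn


def abstracts(conn, field_id=ABSTRACT_FIELD_ID):
    """Map itemID -> abstract text by following itemData.valueID into itemDataValues."""
    rows = conn.execute(
        "SELECT d.itemID, v.value "
        "FROM itemData AS d "
        "JOIN itemDataValues AS v ON v.valueID = d.valueID "
        "WHERE d.fieldID = ? "
        "ORDER BY d.itemID",
        (field_id,),
    )
    return dict(rows.fetchall())
```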

joelostblom commented 7 years ago

Ignore what I said regarding the Zotero SQLite database; I misunderstood how zotero-cli works. I thought it was querying the Zotero SQLite database, but it is actually storing information in its own database (the local search index, as you pointed out) and interfacing with Zotero through pyzotero.

I added abstract searching on my local branch, and it seems to be working fine. I also have not noticed any speed differences. I will test a bit more before making a pull request.

jbaiter commented 7 years ago

Yes, I didn't want to tie the application to Zotero's SQLite schema, since this might subtly change between releases. The HTTP API is clearly documented and officially supported for third-party applications, so zotero-cli sticks to that, at the expense of some double book-keeping.
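As a rough sketch of that double book-keeping: each item fetched over the HTTP API gets flattened into a row for the local index. The example below assumes items arrive as dicts in the Zotero web API JSON shape (for example, as returned by pyzotero's `Zotero.items()`), with the fields under `data`. The helper names `creator_name` and `to_index_row` and the row layout are hypothetical, not zotero-cli's actual code.

```python
def creator_name(creator):
    """Format one creator entry; the API uses either `name` or firstName/lastName."""
    if "name" in creator:
        return creator["name"]
    parts = (creator.get("firstName", ""), creator.get("lastName", ""))
    return " ".join(part for part in parts if part)


def to_index_row(item):
    """Flatten one API item dict into the fields a local search index might store."""
    data = item["data"]
    return {
        "key": data.get("key", ""),
        "title": data.get("title", ""),
        "creators": "; ".join(creator_name(c) for c in data.get("creators", [])),
        # Not every item type carries an abstract, so default to an empty string.
        "abstract": data.get("abstractNote", ""),
    }
```

The local copy has to be refreshed whenever the library changes, which is the cost of not reading Zotero's own database directly.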