whiskyechobravo / kerko

A web application component that provides a faceted search interface for bibliographies managed with Zotero.
https://whiskyechobravo.github.io/kerko/
GNU General Public License v3.0

Database implementation #10

Closed pankus closed 1 year ago

pankus commented 1 year ago

First of all, thank you for this amazing piece of software. This is not an issue, just a question. If I understand correctly, Kerko does not need a database instance to collect Zotero data. Yet sometimes one could be useful. Is there a parameter that tells Kerko to store (and update) a Zotero citation key in a database model?

Thank you in advance.

davidlesieur commented 1 year ago

Kerko has its own database that's optimized for search, but it is designed to source all its data from Zotero. It contains no additional user data. Thus, the expected place for citation keys would be Zotero itself. For example, the Better BibTeX extension for Zotero can save citation keys in Zotero's extra field, which Kerko can display and search.

I hope this helps!
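To illustrate the citation key suggestion, here is a minimal sketch of reading a key back out of an item's Extra field. It assumes the key sits on its own line in the form `Citation Key: <key>`, which is how Better BibTeX commonly writes it. The `extract_citation_key` helper and the sample item are hypothetical and are not part of Kerko or Zotero.

```python
import re

# Assumption: the key is on its own line as "Citation Key: <key>".
# Adjust the pattern if your Extra fields use a different label.
CITATION_KEY_RE = re.compile(
    r"^\s*citation key\s*:\s*(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_citation_key(extra):
    """Return the citation key found in an Extra field value, or None."""
    if not extra:
        return None
    match = CITATION_KEY_RE.search(extra)
    return match.group(1) if match else None


# Hypothetical item, shaped like the objects returned by the Zotero API.
item = {
    "key": "ABCD1234",
    "data": {
        "title": "Example",
        "extra": "Citation Key: doe2020example\nPMID: 12345",
    },
}
print(extract_citation_key(item["data"].get("extra")))  # prints doe2020example
```

Other lines in the Extra field (such as the PMID above) are ignored, so the helper can be applied to items that carry additional metadata there.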

pankus commented 1 year ago

Thank you very much for the answer. I'll try to explain myself better (sorry, I don't have a programming background, and some things might be trivial). I would like to use Kerko in the context of a Flask-Admin application. In this application, a user must specify a bibliographic reference from an existing Zotero database whenever they add a new record. Since Flask-Admin only lets you create a form based on an existing model, I would like to intercept Kerko records and use them to populate one of the fields in the model managed by Flask-Admin. Kerko is very useful in this context because it lets me keep the list of bibliographic references up to date at all times. So my question about Kerko's database is better phrased as: where does Kerko store the data downloaded from Zotero?

davidlesieur commented 1 year ago

Oh, I see. Kerko uses the Whoosh search engine, so its data is stored in Whoosh's format, which can be thought of as a NoSQL document database. Kerko actually has two databases. The first one is the 'cache', which stores items in a format that is very close to that returned by the Zotero API. The second one is the 'index', which Kerko builds from the cache, and where items are restructured for search and faceting purposes. Either of these databases can be accessed using the Whoosh API.

However, I did not really envision Kerko being used in the way you are suggesting. Was I shortsighted? Maybe! But two consequences are that Kerko's own database querying functions are tailored to its very specific needs only, and that future changes to the database structure could break your app.

Could Kerko's design be changed to accommodate this new usage? I'm not sure, because ultimately it would lead to rebuilding the Zotero API, only with a different backend. That does not sound very productive. Have you considered using the Zotero API directly? The PyZotero library can help a lot.

pankus commented 1 year ago

Thank you very much, David. Now everything is much clearer.

Edit: From what I can see, the key to letting Kerko interact with a database other than the 'cache' and the 'index' is in cache.py, specifically in the sync_cache() function. When pyzotero retrieves the Zotero data (https://github.com/whiskyechobravo/kerko/blob/6d50117465082553951af258faf3e7f87f033a65/src/kerko/sync/cache.py#L62), each item can also be stored in a database, possibly in JSON format, and then managed as needed. This is only a rough description and the approach should be generalized; however, it works.

Of course, this does not detract from David's extraordinary work. Thank you.
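For readers who want to try the idea of storing each retrieved item in a separate database as JSON, here is a minimal sketch using only Python's standard library. It assumes items are dictionaries shaped like those returned by the Zotero API, with a top-level `key` and `version`. The `zotero_items` table and the `open_store`, `store_items`, and `load_item` helpers are hypothetical names; none of this is Kerko code, and it does not show how to hook into `sync_cache()`.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one row per Zotero item."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS zotero_items ("
        "key TEXT PRIMARY KEY, version INTEGER, data TEXT NOT NULL)"
    )
    return conn


def store_items(conn, items):
    """Insert or overwrite items, keeping the full item as JSON text."""
    rows = [
        (item["key"], item.get("version", 0), json.dumps(item))
        for item in items
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO zotero_items (key, version, data) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def load_item(conn, key):
    """Return the stored item as a dict, or None if the key is unknown."""
    row = conn.execute(
        "SELECT data FROM zotero_items WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Hypothetical items, standing in for what a sync step would retrieve.
conn = open_store()
store_items(conn, [{"key": "ABCD1234", "version": 1, "data": {"title": "Example"}}])
print(load_item(conn, "ABCD1234")["data"]["title"])  # prints Example
```

Because the item key is the primary key, storing an item again replaces the earlier row, so repeated synchronizations keep a single, current copy of each item. A Flask-Admin model could then refer to items by that key.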