whiskyechobravo / kerko

A web application component that provides a faceted search interface for bibliographies managed with Zotero.
https://whiskyechobravo.github.io/kerko/
GNU General Public License v3.0
302 stars, 36 forks

Exclude attachments? #25

Closed (remillc closed this 7 months ago)

remillc commented 7 months ago

We'd like to keep attachments (PDFs) out of a Kerko instance entirely. That means:

  1. Do not download and store attachments from a Zotero library;
  2. Do not index them, so that their content is not searchable;
  3. Do not display any download links in the search results or on the notice page, that is, the Read document button and the Document field on the notice page.

Download and store attachments

I couldn't find any fetch- or index-related setting that prevents attachments from being downloaded from Zotero. Right now, the PDFs are stored in the instance/kerko/attachments folder and can therefore be fetched by anyone who knows how to construct the corresponding URL.

Index attachments content

As I understand it, point (2) can be achieved with this setting:

[kerko.search_fields.core.optional.documents]
enabled = false

Display links

I also couldn't find any setting that prevents the attachment links from being displayed in the search results and on the notice page. If we could tell Kerko not to download the attachments from Zotero, such a setting might be redundant.

So am I missing something, or is there an opportunity for improvement?

davidlesieur commented 7 months ago

I suggest the following settings:

kerko.zotero.child_include_re = "^_publish$"
kerko.zotero.child_exclude_re = ""
kerko.search.fulltext = false
kerko.scopes.fulltext.enabled = false
kerko.scopes.metadata.enabled = false

The first two lines will exclude all attachments and child notes, except for those that were assigned the tag _publish in Zotero. Attachments will thus be excluded by default, but in the exceptional cases where you still want one to be available through Kerko, you can make it so by tagging it.

Excluded attachments will no longer be downloaded by Kerko when synchronizing from Zotero, and there will be no buttons or links to the documents.

The 3rd line will make Kerko not extract the full-text of PDFs, and thus the full-text won't be searchable (even for those exceptional PDFs that may still be included).

The 4th and 5th lines will remove options "In documents" and "In all fields" from the search interface. With full-text indexing disabled, "In documents" would have nothing to search, and "In all fields" would do the same as the "Everywhere" option.

With the above settings, it is redundant to set kerko.search_fields.core.optional.documents.enabled = false, but it wouldn't hurt to configure it as well.
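For reference, here is a sketch of the same settings written with TOML table headers, the form used for kerko.search_fields.core.optional.documents earlier in this thread. It assumes your Kerko configuration is a TOML file and that none of these tables are already defined elsewhere in it. The values are exactly those suggested above; only the layout differs.

```toml
# Sketch only: merge into your existing configuration file rather than
# pasting blindly, since TOML does not allow defining a table twice.

[kerko.zotero]
# Include only child items (attachments, notes) tagged "_publish" in Zotero.
child_include_re = "^_publish$"
# Empty exclusion pattern, as suggested above.
child_exclude_re = ""

[kerko.search]
# Do not extract the full text of PDFs, so it cannot be searched.
fulltext = false

[kerko.scopes.fulltext]
# Remove the "In documents" option from the search interface.
enabled = false

[kerko.scopes.metadata]
# Remove the "In all fields" option from the search interface.
enabled = false

[kerko.search_fields.core.optional.documents]
# Redundant with the settings above, but harmless.
enabled = false
```

One thing to watch when mixing the two notations: in TOML, a dotted key placed after a table header is read relative to that table. If your file already contains headers such as [kerko.search_fields.core.optional.documents], put the dotted-key lines before the first header, or use the table form shown here.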

If you re-sync your library after changing the configuration as above, your 3 issues should be solved.

Hmm... Something on this whole topic could be a nice addition to the Configuration guides!

remillc commented 7 months ago

Thanks for the settings, it worked!

Yep, it would be a nice addition to the config guides.