unt-libraries / catalog-api

A Django project that lets you build and expose a REST API for your library's Sierra ILS-based catalog
BSD 3-Clause "New" or "Revised" License

Rewrite MARC export process to remove SolrMarc dependency #2

Closed: jthomale closed this issue 9 months ago

jthomale commented 8 years ago

The current MARC export process goes like this: Sierra models --> pymarc --> raw MARC21 file --> SolrMarc --> Solr. Instead, we could simply go: Sierra models --> pymarc --> Solr (using pysolr), or even Sierra models --> Solr. This would mean reimplementing in Python the transformations that are currently defined in SolrMarc, but I don't think it would be too difficult. It would have several benefits:

  1. I suspect that it would make the bib record indexing process much faster, as it wouldn't have to save files to disk and then call an external Java program to load them into Solr.
  2. It would give us more flexibility in how the MARC data is translated to Solr. Yeah, we can create custom indexing methods in SolrMarc, but none of us are Java coders, so that's a barrier. And the index.properties files are limited in what they can do.
  3. We can more easily insert non-MARC-derived data into the Solr index; currently, we have to put things like the III record number, material type codes, and even a list of attached item record IDs into 9XX fields in the MARC record (using pymarc) before they get saved to disk and loaded via SolrMarc. Although this works, we sometimes bump up against MARC record length limits when dealing with bib records for serials that have hundreds of items attached.
  4. Simplicity. Gets rid of a dependency.
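
To make the proposed Sierra models --> pymarc --> Solr path concrete, here is a rough sketch of what the Python side might look like. It is not the project's actual code. The Solr field names (`id`, `title`, `subjects`, `material_type`, `item_ids`), the helper names, and the batch size are all hypothetical. The record argument is assumed to behave like a pymarc `Record` (`get_fields`, with fields offering `get_subfields`), and the `solr` argument like a `pysolr.Solr` client (`add`). Note that the record number, material type, and item IDs go straight into the Solr document rather than through 9XX fields.

```python
def subfield_values(record, tag, *codes):
    """Collect subfield values for one MARC tag from a pymarc-like record."""
    values = []
    for field in record.get_fields(tag):
        values.extend(field.get_subfields(*codes))
    return values


def bib_to_solr_doc(record, record_number, material_type, item_ids):
    """Build one Solr document; every field name here is a made-up example."""
    title_parts = subfield_values(record, "245", "a", "b")
    return {
        # Non-MARC data is added directly, with no 9XX round trip.
        "id": record_number,
        "material_type": material_type,
        "item_ids": list(item_ids),
        # MARC-derived data, replacing what index.properties would define.
        "title": " ".join(part.strip() for part in title_parts),
        "subjects": subfield_values(record, "650", "a"),
    }


def index_bibs(solr, bibs, batch_size=500):
    """Send documents to Solr in batches; returns the number indexed.

    `bibs` yields (record, record_number, material_type, item_ids) tuples.
    """
    batch, count = [], 0
    for record, record_number, material_type, item_ids in bibs:
        batch.append(
            bib_to_solr_doc(record, record_number, material_type, item_ids)
        )
        if len(batch) >= batch_size:
            solr.add(batch)
            count += len(batch)
            batch = []
    if batch:  # flush whatever is left over
        solr.add(batch)
        count += len(batch)
    return count
```

Because nothing is written to disk and no MARC record is reserialized, a serial with hundreds of attached items would simply produce a long `item_ids` list instead of an oversized MARC record.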

jthomale commented 9 months ago

This is resolved now.

Later commits scrub mention of SolrMarc from settings and from the README.