monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Solr field types, dynamic fields and copy fields configuration #405

Closed by kevinschaper 1 year ago

kevinschaper commented 1 year ago

Closes #296

This is more of a short-term solution than a long-term one, but I didn't feel confident about solving this generically in linkml-solr without first building a more concrete solution.

The ultimate goal here is to give us two new tokenized fields for search and autocomplete matching against the Solr index.
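
For illustration only, here is a rough sketch of what a pair of field types, dynamic fields, and copy fields could look like as a Solr Schema API payload. This is not the configuration in this PR (that one comes from ZFIN): the type names `text` and `autocomplete`, the `*_t` / `*_ac` suffixes, the `name` source field, and the gram sizes are all assumptions picked for the example.

```python
import json


def build_schema_commands(source_field="name"):
    """Build a Solr Schema API payload (hypothetical names throughout)."""
    # Tokenized free-text type: standard tokenizer plus lowercasing.
    text_type = {
        "name": "text",  # assumed type name
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
    # Autocomplete type: edge n-grams at index time only, so a typed
    # prefix such as "hea" can match an indexed token such as "heart".
    autocomplete_type = {
        "name": "autocomplete",  # assumed type name
        "class": "solr.TextField",
        "indexAnalyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [
                {"class": "solr.LowerCaseFilterFactory"},
                {
                    "class": "solr.EdgeNGramFilterFactory",
                    "minGramSize": "1",   # assumed value
                    "maxGramSize": "25",  # assumed value
                },
            ],
        },
        "queryAnalyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
    return {
        "add-field-type": [text_type, autocomplete_type],
        # Any field ending in _t or _ac picks up the matching type.
        "add-dynamic-field": [
            {"name": "*_t", "type": "text", "indexed": True, "stored": False},
            {"name": "*_ac", "type": "autocomplete", "indexed": True, "stored": False},
        ],
        # Copy the source field into both tokenized variants at index time.
        "add-copy-field": [
            {"source": source_field, "dest": [f"{source_field}_t", f"{source_field}_ac"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_schema_commands(), indent=2))
```

A payload like this would normally be POSTed to the collection's `/schema` endpoint; the same definitions can also be written as XML in a managed schema file.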

I brought over the field type configuration for both fields from ZFIN. I think the autocomplete field config should be pretty straightforward. The text configuration may need some tweaking, depending on tokenization edge cases. (Do we want to tokenize on word/number boundaries? On which punctuation? Etc.)
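
To make the edge cases a bit more concrete, here is a small sketch comparing three tokenization strategies on the kinds of strings we index. The regexes are only a rough stand-in for what Solr's analyzers do (they ignore non-ASCII characters, and the real tokenizers follow their own rules), and the function names are made up for this example.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only; punctuation stays inside tokens."""
    return text.lower().split()


def punctuation_tokens(text):
    """Split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def letter_digit_tokens(text):
    """Split on punctuation and also at letter/number boundaries."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


if __name__ == "__main__":
    for sample in ["GO:0008150", "tbx5a", "N-acetyl-D-glucosamine"]:
        print(sample)
        print("  whitespace:  ", whitespace_tokens(sample))
        print("  punctuation: ", punctuation_tokens(sample))
        print("  letter/digit:", letter_digit_tokens(sample))
```

Splitting at letter/number boundaries turns `tbx5a` into `tbx`, `5`, `a`, which helps partial matches but can make exact symbol searches noisier; splitting on punctuation separates `GO` from `0008150`, which matters if people expect to search by the full CURIE.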

Ultimately, it's probably low stakes until/unless we get genotype names, or find that people have very specific expectations when searching for GO or CHEBI terms with lots of punctuation.