jamesaoverton closed this issue 7 years ago.
The two legitimate choices here seem to be
Clucy seems to win a priori, because the native Clojure implementation is in-memory only, whereas Lucene has on-disk indexing options that I think we'd want for taxonomies of the size you describe in the issue.
Sure, let's try clucy first.
The work is in https://github.com/knocean/knode/pull/48. We now have a complete full-text search, with an interface at /search.
(Note that you need to run (knode.text-search/populate-index!) the first time to populate the search index; otherwise you won't get any results back.)
We need to be able to search for terms by their annotations. The two most important annotations for the first implementation are label and alternative term, but we should plan to support annotations with a few paragraphs of text. It should be fast and do a good job of ordering search results. It needs to be able to handle the 2 million labels and alternative terms in the NCBI Taxonomy. Results must include the term IRI and the IRI of the matched annotation, along with match information. We should build on an existing system, not reinvent the wheel. It should be pure-JVM, but beyond that I don't have a preference.