Implement text search

jamesaoverton commented 7 years ago

We need to be able to search for terms by their annotations. The two most important annotations for the first implementation are label and alternative term, but we should plan to support annotations with a few paragraphs of text. It should be fast and do a good job of ordering search results. It needs to be able to handle the 2 million labels and alternative terms in the NCBI Taxonomy. Results must include the term IRI and the IRI of the matched annotation, along with match information.

We should build on an existing system, not reinvent the wheel. It should be pure-JVM, but beyond that I don't have a preference.

inaimathi commented 7 years ago

The two legitimate choices here seem to be:

Caponia (a Clojure-native text search engine)
clucy (a Clojure-based wrapper around the Java Lucene engine)

Clucy seems to win a priori, because Caponia, the native Clojure implementation, is in-memory only, whereas Lucene has on-disk indexing options that I think we'd want for the kind of taxonomies you describe in the issue.

jamesaoverton commented 7 years ago

Sure, let's try clucy first.

inaimathi commented 7 years ago

Work is at https://github.com/knocean/knode/pull/48. We've got complete full-text search, with an interface at /search. Note that you need to run (knode.text-search/populate-index!) the first time to populate the search index; otherwise you won't get any results back.
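For readers unfamiliar with clucy, here is a minimal sketch of the populate-then-search flow described above. It is not the code from the pull request: the namespace, the `populate!` and `search-annotations` helpers, the field names (`:iri`, `:predicate`, `:value`), and the two sample annotations are all assumptions made for illustration. Only `memory-index`, `disk-index`, `add`, and `search` come from clucy itself.

```clojure
(ns example.text-search
  (:require [clucy.core :as clucy]))

;; An in-memory index keeps the sketch self-contained. For a large
;; taxonomy, (clucy/disk-index "path/to/index-dir") gives an on-disk
;; Lucene index instead (the directory path here is hypothetical).
(def index (clucy/memory-index))

;; Hypothetical sample data: one map per annotation, carrying the term
;; IRI, the IRI of the annotation property, and the annotation text.
(def annotations
  [{:iri       "http://purl.obolibrary.org/obo/NCBITaxon_9606"
    :predicate "http://www.w3.org/2000/01/rdf-schema#label"
    :value     "Homo sapiens"}
   {:iri       "http://purl.obolibrary.org/obo/NCBITaxon_9606"
    :predicate "http://purl.obolibrary.org/obo/IAO_0000118"
    :value     "human"}])

(defn populate!
  "Adds every annotation map to the index. Must run before searching,
  otherwise the index is empty and searches return nothing."
  []
  (apply clucy/add index annotations))

(defn search-annotations
  "Returns up to 10 matching annotation maps for a Lucene query string."
  [query]
  (clucy/search index query 10))

(populate!)
(def hits (search-annotations "sapiens"))
```

Each hit is a map of the stored fields, so a caller can read the term IRI and the matched annotation's IRI directly from the result, which is the shape of result the issue asks for.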

knocean / knode

Implement text search #44