greenelab / word-lapse

Explore how a word changes over time
https://greenelab.github.io/word-lapse/

Implements "search by tag name" in API #42

Closed falquaddoomi closed 2 years ago

falquaddoomi commented 2 years ago

This PR addresses issue #40: it lets the API search the models by the label associated with a concept ID rather than by the concept ID itself.

To allow for fast lookups over a fairly large table (~1GB, with 39,446,070 rows), the concept labels and IDs are inserted into a pygtrie CharTrie, which is then exposed as a global so that the /autocomplete endpoint can use it, much like it currently uses the vocabulary trie. Building this trie can take a long time, so the trie-loading code first looks for an existing pickled version of the trie at data_folder / concept_trie.pkl and, if it can't find it, generates and then pickles the trie to that location. On my M1 Mac Pro, generating the trie takes ~45 minutes, whereas loading it takes ~12 minutes.
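
The load-or-build caching described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `load_or_build` and the `build` callable are hypothetical names, and the object being cached here can be any picklable value, standing in for the pygtrie CharTrie.

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(cache_path: Path, build: Callable[[], T]) -> T:
    """Return the pickled object at cache_path, or build and pickle it on a miss.

    `build` is a hypothetical zero-argument callable that constructs the
    expensive object (in the PR, the concept trie).
    """
    if cache_path.exists():
        # Fast path: reuse the previously pickled object.
        with cache_path.open("rb") as fp:
            return pickle.load(fp)

    # Slow path: build from scratch.
    obj = build()

    # Write to a temporary file and rename it afterwards, so an interrupted
    # run does not leave a truncated pickle at the final location.
    tmp_path = cache_path.with_suffix(".tmp")
    with tmp_path.open("wb") as fp:
        pickle.dump(obj, fp, protocol=pickle.HIGHEST_PROTOCOL)
    tmp_path.replace(cache_path)
    return obj
```

With something like this, a call such as `load_or_build(data_folder / "concept_trie.pkl", build_concept_trie)` would only pay the build cost on the first run, where `build_concept_trie` is again an assumed name. Deleting the pickle forces a rebuild.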

Since the /autocomplete endpoint now returns two types of results, vocabulary entries and concept map entries, the returned format has been changed from a list of matching terms to a dict with the following form:

{
  'vocab': [<term:str>, ... ],
  'concept': [ [<term:str>, <concept_id:str> ], ... ]
}
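
As a rough illustration of that response shape, the sketch below assembles it from two in-memory collections. Everything here is an assumption made for the example: the function name, the `limit` parameter, the concept IDs, and the plain `startswith` filtering, which stands in for the trie prefix lookups the endpoint actually uses.

```python
from typing import Dict, List


def autocomplete(
    prefix: str,
    vocab: List[str],
    concepts: Dict[str, str],
    limit: int = 20,
) -> dict:
    """Build a response with separate 'vocab' and 'concept' result lists.

    `vocab` is a list of vocabulary terms; `concepts` maps a concept label
    to its concept ID. Both are hypothetical stand-ins for the two tries.
    """
    vocab_hits = sorted(term for term in vocab if term.startswith(prefix))
    concept_hits = sorted(
        [label, concept_id]
        for label, concept_id in concepts.items()
        if label.startswith(prefix)
    )
    return {
        "vocab": vocab_hits[:limit],
        "concept": concept_hits[:limit],
    }


# Illustrative data only; the concept IDs are placeholders.
example = autocomplete(
    "as",
    vocab=["aspirin", "asthma", "brain"],
    concepts={"asthma": "ID:0001", "brain": "ID:0002"},
)
```

A client that previously iterated over a flat list of terms would now read `example["vocab"]` for plain terms and unpack each `[term, concept_id]` pair from `example["concept"]`.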

This PR depends on https://github.com/greenelab/word-lapse-models/pull/6.

Closes #40.

netlify[bot] commented 2 years ago

Deploy Preview for word-lapse canceled.

Latest commit: a258b9b57afa4d5a606409280176aed6ab571d3c
Latest deploy log: https://app.netlify.com/sites/word-lapse/deploys/624c5662e9a4d10008f1a49b