Radix Trees to replace Entity and Question Autosuggest

One big pain point in adding new data to the ODN is updating the autosuggest datasets. The entities autosuggest is somewhat maintainable, but the questions autosuggest is a real mess. It uses a variety of hacks to get Socrata autosuggest to do things that it simply isn't meant to do.

I decided to experiment with using in-memory radix trees for autosuggestion.

Replacing entity suggestion was pretty simple. I built a radix tree containing the names of all entities and then used prefix queries to get all entities matching a given prefix, ranked in descending order by population.
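A rough sketch of the entity lookup, for illustration only. It uses a plain uncompressed trie rather than a true radix tree (a radix tree merges single-child chains to save memory, but prefix queries behave the same). The class names, the `limit` parameter, and the population figures are assumptions, not the code in this PR.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.entities = []   # (name, population) pairs whose name ends here


class EntityTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, population):
        # Index by lowercased name so that lookups are case-insensitive.
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.entities.append((name, population))

    def with_prefix(self, prefix, limit=5):
        # Walk down to the node for the prefix, if it exists.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every entity stored in the subtree below that node.
        matches = []
        stack = [node]
        while stack:
            current = stack.pop()
            matches.extend(current.entities)
            stack.extend(current.children.values())
        # Rank in descending order by population.
        matches.sort(key=lambda entity: entity[1], reverse=True)
        return [name for name, _ in matches[:limit]]


# Illustrative data only; the population values are made up for the example.
trie = EntityTrie()
trie.insert("Seattle, WA", 652405)
trie.insert("Seaside, CA", 33025)
trie.insert("Sacramento, CA", 479686)
print(trie.with_prefix("sea"))
```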

The more interesting problem was replacing questions. To do this, I first take the query and find all of the important words:

What is the population of seattle? => ['population', 'seattle']
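
One simple way to get this kind of result is to drop common stop words. The sketch below assumes that approach; the `STOPWORDS` list and the `important_words` name are hypothetical, and the real filter may work differently.

```python
import re

# Hypothetical stop word list; a real one would be longer.
STOPWORDS = {"what", "is", "the", "of", "in", "for", "a", "an", "how", "many", "are"}


def important_words(query):
    # Lowercase, split into alphanumeric words, and drop stop words.
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [word for word in words if word not in STOPWORDS]


print(important_words("What is the population of seattle?"))
# ['population', 'seattle']
```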

I have two radix trees: one for entity names and one for variable names. I perform a prefix query on each tree for each word to get a list of variables and a list of entities related to the query. If a word returns too many results from a tree, I ignore its results, so that short words with many completions do not swamp the rest.
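
The per-word lookup with a result cutoff might look roughly like this. To keep the example short, a linear scan over a list stands in for the radix tree prefix query; `max_results`, the function names, and the sample data are assumptions.

```python
def prefix_matches(names, prefix):
    # Stand-in for a radix tree prefix query.
    prefix = prefix.lower()
    return [name for name in names if name.lower().startswith(prefix)]


def candidates(words, names, max_results=3):
    found = []
    for word in words:
        matches = prefix_matches(names, word)
        # Skip words that match too many names: they carry little signal.
        if len(matches) > max_results:
            continue
        for match in matches:
            if match not in found:
                found.append(match)
    return found


ENTITIES = ["Seattle, WA", "Seaside, CA", "Portland, OR",
            "Pasadena, CA", "Phoenix, AZ", "Pittsburgh, PA"]
VARIABLES = ["Population", "Population Change", "Median Earnings"]

words = ["p", "seattle"]
entities = candidates(words, ENTITIES)    # "p" matches 4 entities, so it is ignored
variables = candidates(words, VARIABLES)  # "p" matches only 2 variables, so it is kept
print(entities, variables)
```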

Then, I take the top n entities and the top n variables and find the combinations that we have data for. This is the most time-consuming part of the process because it requires a SoQL query. Finally, I return each variable-entity combination, which the client can phrase as a question.
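
A sketch of this last step, under the assumption that availability is checked for all candidate pairs in one call. The SoQL query is replaced by an in-memory set (`AVAILABLE`), and `suggest_questions`, `fake_has_data`, and `phrase` are hypothetical names.

```python
from itertools import product


def suggest_questions(entities, variables, has_data, n=2):
    # Pair the top n variables with the top n entities.
    pairs = list(product(variables[:n], entities[:n]))
    # One availability check for all pairs (a SoQL query in the real system).
    available = has_data(pairs)
    return [{"variable": v, "entity": e} for v, e in pairs if (v, e) in available]


# Stand-in for the data availability query; the contents are illustrative.
AVAILABLE = {("Population", "Seattle, WA"), ("Median Earnings", "Seattle, WA")}


def fake_has_data(pairs):
    return {pair for pair in pairs if pair in AVAILABLE}


def phrase(suggestion):
    # Example of how a client could turn a suggestion into a question.
    return f"What is the {suggestion['variable'].lower()} of {suggestion['entity']}?"


results = suggest_questions(
    ["Seattle, WA", "Seaside, CA"],
    ["Population", "Population Change"],
    fake_has_data,
)
print([phrase(result) for result in results])
```

Passing the availability check in as a function keeps the combination logic testable without a live dataset.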

Overall, this approach works very well. It has many advantages over the current system:

No need for an autosuggest index
Automatically updates when new data is added (even works for Michigan)
Includes sparse data
Faster query times (responses from the question autosuggest dataset regularly take more than 1000ms)

The only real disadvantage is that it requires storing the entire entity radix tree in memory, which takes about 250MB. I'm going to create a review app to see how this will affect the server and make some tweaks if necessary.

socrata / odn-backend

Radix Trees to replace Entity and Question Autosuggest #65