socrata / odn-backend

Backend for the Open Data Network.

question autosuggest improvements #68

Open zang0 opened 8 years ago

zang0 commented 8 years ago

_1 guarantee the a priori allocated memory is not violated, w/ logging of dropped questions

_2 king county wa > should not match all the NY results

_3 curious terms that yield no results

male, female, median, earnings, ...

are these vars never added for some reason? or stop worded?

_4 98117

note: only 3 questions, looks like lots are missing, see the grad rates, earnings permutations, etc.
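For _1, a minimal sketch of what "bounded memory with logging of dropped questions" could look like. This is only an illustration under assumptions: `BoundedQuestionIndex`, its `capacity` parameter, and the logger name are hypothetical and not taken from the odn-backend code.

```python
import logging

logger = logging.getLogger("autosuggest")  # hypothetical logger name


class BoundedQuestionIndex:
    """Holds at most `capacity` questions; anything beyond that is dropped and logged."""

    def __init__(self, capacity):
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        self.capacity = capacity
        self.questions = []
        self.dropped = 0

    def add(self, question):
        """Return True if the question was stored, False if it was dropped."""
        if len(self.questions) >= self.capacity:
            # Never grow past the preallocated size; record what was lost.
            self.dropped += 1
            logger.warning("dropped question (index full at %d): %s",
                           self.capacity, question)
            return False
        self.questions.append(question)
        return True


index = BoundedQuestionIndex(capacity=2)
results = [index.add(q) for q in ["q1", "q2", "q3"]]
```

Counting drops alongside the log line would make it easy to check after a build whether any questions were lost.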

aaasen commented 8 years ago

Fixed these issues, demo here: https://opendatanetwork-staging-pr-599.herokuapp.com/

zang0 commented 8 years ago

re: _2 -> curious why "king county wa" is matching "king county va" and "king county tx"

_5 - jobs > matches on population count, term mapping problem?

_5.1 - multinoma > same thing

_6 - obesity > matches on only california obesity rates, should get a bunch of matches

_7 - default georegion matches for things like "graduation rates" - i think we should go w/ states/metros/counties/cities as results and not include the US or divisions, as they're typically odd and the maps are all buggered up on landing

_8 - queries like kalamazoo, questions should be diversified across variables like in prod

_9 - 'crime' returns no questions, but 'crime seattle' does
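For _8, one possible way to diversify questions across variables is a round-robin over the ranked list, sketched below. This is an assumption about the approach, not how prod does it; `diversify` and the `(variable, text)` pair shape are hypothetical.

```python
from collections import OrderedDict, deque


def diversify(questions, limit):
    """Round-robin over variables so no single variable dominates the results.

    `questions` is a list of (variable, text) pairs, already ranked best-first.
    """
    by_variable = OrderedDict()
    for variable, text in questions:
        by_variable.setdefault(variable, deque()).append(text)

    picked = []
    while by_variable and len(picked) < limit:
        # Take the best remaining question from each variable in turn.
        for variable in list(by_variable):
            if len(picked) >= limit:
                break
            queue = by_variable[variable]
            picked.append((variable, queue.popleft()))
            if not queue:
                del by_variable[variable]
    return picked


ranked = [
    ("population", "kalamazoo population"),
    ("population", "kalamazoo population change"),
    ("earnings", "kalamazoo median earnings"),
    ("education", "kalamazoo graduation rate"),
]
top = diversify(ranked, 3)
```

With a limit of 3, the second population question is skipped in favor of one question each from earnings and education.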

zang0 commented 8 years ago

I added a super simple first-pass script to generate a basic ontology w/ synonyms and misspellings for cities. The current file is in staging: https://github.com/socrata/odn-backend/blob/staging/scripts/ontology/place-synonyms.json. Have a look and see if you can incorporate it into the autosuggest. I can improve the ontology over time once it's wired in.
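A rough sketch of how the synonyms might be wired in: build a reverse lookup from each variant to its canonical place name and normalize the query before matching. The ontology shape used here (canonical name mapped to a list of variants) is an assumption for illustration and may not match the actual layout of place-synonyms.json; `build_lookup` and `canonicalize` are hypothetical names.

```python
def build_lookup(ontology):
    """Map every lowercase variant (and the canonical name itself) to the canonical name."""
    lookup = {}
    for canonical, variants in ontology.items():
        lookup[canonical.lower()] = canonical
        for variant in variants:
            # Keep the first mapping if two places share a variant.
            lookup.setdefault(variant.lower(), canonical)
    return lookup


def canonicalize(query, lookup):
    """Return the canonical place name for a query, or the query unchanged if unknown."""
    key = " ".join(query.lower().split())  # lowercase and collapse whitespace
    return lookup.get(key, query)


# Assumed shape, for illustration only.
ontology = {"seattle": ["seatle", "emerald city"]}
lookup = build_lookup(ontology)
fixed = canonicalize("  Seatle ", lookup)
unknown = canonicalize("kalamazoo", lookup)
```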

zang0 commented 8 years ago

Agreed to eliminate the clear misses, e.g. jobs > population returns, and then have a look.

aaasen commented 8 years ago

One problem with this is that we no longer return questions for a single entity, e.g. "seattle".

Also, for the synonyms, check https://github.com/socrata/odn-backend/blob/staging/data/aliases.json