saggu opened this issue 4 years ago
So we want the fuzzy search results to be further filtered by country: only variables that have data for that country should be displayed? (Or actually, if we filter, we can filter by any admin level.)
We will try this out at search time first to see how fast the API is.
Implemented in the `itay/fuzzy-search-admins` branch. Add `?country=` or `?country_id=` arguments to the fuzzy search endpoint. You can add multiple countries, and they will be ORed. A variable is returned from the fuzzy search if it has at least one datapoint for one of the countries in the arguments.
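For illustration, the OR semantics can be sketched in memory. This is only a sketch under assumptions: the real filter runs against a materialized view in the database, and the function name `filter_by_countries` and the sample coverage data below are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def filter_by_countries(variable_countries: Dict[str, Set[str]],
                        countries: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep a variable if it has data for ANY requested country."""
    wanted = set(countries)
    # A non-empty set intersection means at least one requested country is covered.
    return sorted(var_id for var_id, covered in variable_countries.items()
                  if covered & wanted)


# Hypothetical data: variable id -> set of countries it has datapoints for.
coverage = {
    "crop_production": {"Ethiopia", "Kenya"},
    "rainfall": {"The Gambia"},
    "population": {"Sudan"},
}

print(filter_by_countries(coverage, ["Ethiopia", "The Gambia"]))
# ['crop_production', 'rainfall']
```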
Since this is implemented with a materialized view, you need to create it before you can test. Run `python script/create_search_views.py`. If you want to test the search after uploading new data, you will need to run `python script/refresh_search_views.py`. Both can take around 10 minutes to complete.
Only country-level filtering is supported at this point; you cannot filter on admin1, admin2 or admin3.
I tested this: `/metadata/variables?keyword=un&country=Ethiopia` works, and it is very fast.
However, `/metadata/variables?country=Ethiopia` does not work. I get this error:
`{"Error": "A variable query must be provided: keyword"}`
They are going to want to search by country only.
Also, the query `/metadata/variables?keyword=crop&country=Ethiopia&country=Gambia` throws this error:
`{"Error": [["No country Gambia"]]}`
Instead of returning that, it should return the variables that have Ethiopia in them.
If there are no matching variables, it should return an empty table, not an error.
The fuzzy search is limited to 10 responses, and querying variables that have a specific country in their data, without any filtering on the name, is going to return a lot more than that. I can make the limit larger, but I'm not sure it will be helpful. There are 1,300 variables with the US, for example, and hundreds with pretty much every other country.
As for returning an error when specifying a nonexistent country: this is the same behavior we have in the get variable data endpoint. We wanted to distinguish the case where there is no data (or, in our case, no variables) for that country from the case where you used a nonexistent country. I think the behavior should be consistent throughout the system. What do you think?
For a nonexistent country, I think returning an error is reasonable.
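A rough sketch of the behavior agreed on here, assuming the endpoint validates country names before searching: an unknown country raises an error, while a known country with no matching variables returns an empty list. `search_variables`, `UnknownCountryError` and the sample data are hypothetical names for illustration; only the error text mirrors the message quoted above.

```python
from typing import Dict, List, Set


class UnknownCountryError(ValueError):
    """Hypothetical error for a country name that is not in the known-country list."""


def search_variables(variable_countries: Dict[str, Set[str]],
                     known_countries: Set[str],
                     countries: List[str]) -> List[str]:
    """Return variables with data for any requested country (hypothetical helper)."""
    # A misspelled or nonexistent country is treated as a client error...
    missing = [c for c in countries if c not in known_countries]
    if missing:
        raise UnknownCountryError("No country " + ", ".join(missing))
    # ...while a valid country with no matching variables yields an empty list.
    wanted = set(countries)
    return sorted(v for v, covered in variable_countries.items() if covered & wanted)


known = {"Ethiopia", "The Gambia", "Sudan"}
coverage = {"crop_production": {"Ethiopia"}}

print(search_variables(coverage, known, ["Sudan"]))  # []
try:
    search_variables(coverage, known, ["Gambia"])
except UnknownCountryError as err:
    print(err)  # No country Gambia
```

Keeping validation separate from the search makes the two outcomes easy to tell apart, which is the distinction argued for in the previous comment.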
The Wikidata name for Gambia is `The Gambia`. This query works as expected: `/metadata/variables?keyword=crop&country=Ethiopia&country=The Gambia`
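When calling the endpoint from code rather than a browser, the space in `The Gambia` has to be percent-encoded and the repeated `country` key must be preserved. A minimal standard-library sketch (path only; no host is assumed):

```python
from urllib.parse import quote, urlencode

# A list of (key, value) pairs keeps both `country` entries; a dict could not.
params = [("keyword", "crop"), ("country", "Ethiopia"), ("country", "The Gambia")]

# quote_via=quote encodes the space as %20 (the default, quote_plus, would use "+").
query = urlencode(params, quote_via=quote)
print("/metadata/variables?" + query)
# /metadata/variables?keyword=crop&country=Ethiopia&country=The%20Gambia
```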
@saggu I just pushed changes that allow the keyword to be omitted and add an optional result limit. The default limit is 100.
https://github.com/usc-isi-i2/datamart-api/commit/8e48efea9041a53434b2f4f0f618f35176e7fdce
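The new argument handling might look roughly like the sketch below, which parses a raw query string with the standard library. It assumes the limit parameter is named `limit` (the comment does not name it), and `parse_search_args` is a hypothetical helper, not the code in the linked commit.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qs

DEFAULT_LIMIT = 100  # default result limit mentioned in the comment above


def parse_search_args(query_string: str) -> Tuple[Optional[str], List[str], int]:
    """Hypothetical parser: keyword optional, countries may repeat, limit defaults to 100."""
    args = parse_qs(query_string)                 # decodes %20 and collects repeated keys
    keyword = args.get("keyword", [None])[0]      # None when no keyword is given
    countries = args.get("country", [])           # repeated keys -> list (ORed later)
    limit = int(args.get("limit", [DEFAULT_LIMIT])[0])  # ASSUMPTION: name is "limit"
    return keyword, countries, limit


print(parse_search_args("country=Ethiopia"))
# (None, ['Ethiopia'], 100)
print(parse_search_args("keyword=crop&country=Ethiopia&country=The%20Gambia&limit=20"))
# ('crop', ['Ethiopia', 'The Gambia'], 20)
```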
@kyao Tested and works fine. I have deployed it for WM.
What remains now is admin1-, admin2- and admin3-based search.
I'll send a KGTK exploded file with admin3 data to test this.
@saggu, can you please add the file with admin3?
Waiting for tomorrow's meeting (Sep 16) to see if we even need this functionality. Will update. Moving to ToDo.
Here is a dataset with admin3. Both tsv files in the zip file are needed.
Pushed to development. You need to rerun `python script/create_search_views.py` to create the admin fuzzy search views. If you upload new data, you will need to run `python script/refresh_search_views.py`.
OK, here are the steps I followed:
- `/metadata/variables?country=Ethiopia` works (search by country).
- `/metadata/variables?admin=oromia` also works.
- However, `/metadata/variables?admin1=oromia`, `/metadata/variables?admin2=oromia` and `/metadata/variables?admin3=oromia` do not work.
@zmbq, are we supposed to search like this, using `admin` and not `admin1`, `admin2` or `admin3`?
Also, if I do not refresh the views, the API throws an error. We should either handle that error or refresh the views automatically every time new data is ingested. Suggestions, @szeke @kyao?
You need to import `dataset-edges.tsv`, too. Also, I think the admin's name is `oromia region`.
As for having to create the views: this is only until we're finished with this issue. Then I'll create a database backup with the views and the data.
@zmbq, my question is: are we supposed to search with `admin=<some admin>`, or with `admin1`, `admin2` or `admin3`?
This is according to this chat
Currently we do not have this information in the variable metadata, but this feature could become important.
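To make the two options in the question concrete, here is a hypothetical sketch. It assumes each variable's coverage is tagged with an admin level (which, as noted, the variable metadata does not contain yet) and uses an exact, case-insensitive comparison in place of the real fuzzy matching. `match_admin` and the sample data are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: variable id -> list of (admin level, admin name) pairs it has data for.
Coverage = Dict[str, List[Tuple[str, str]]]


def match_admin(coverage: Coverage, name: str, level: Optional[str] = None) -> List[str]:
    """level=None mimics a single `admin=` parameter (match at any level);
    level='admin1'/'admin2'/'admin3' mimics level-specific parameters.
    Exact, case-insensitive comparison stands in for the real fuzzy match."""
    needle = name.lower()
    return sorted(
        var_id
        for var_id, units in coverage.items()
        if any(n.lower() == needle and (level is None or lvl == level)
               for lvl, n in units)
    )


coverage: Coverage = {
    "crop_production": [("admin1", "Oromia Region")],
    "rainfall": [("admin2", "Borena Zone"), ("admin1", "Oromia Region")],
    "population": [("admin2", "Borena Zone")],
}

print(match_admin(coverage, "oromia region"))          # ['crop_production', 'rainfall']
print(match_admin(coverage, "borena zone", "admin1"))  # []
print(match_admin(coverage, "borena zone", "admin2"))  # ['population', 'rainfall']
```

With a single `admin=` parameter, callers don't need to know the level of a place; level-specific parameters let them disambiguate names that occur at more than one level.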