mintproject / MINT-DataCatalog-Public

Public-facing aspects of data catalog, such as documentation, demos, tracking issues, and feature requests
Apache License 2.0

Keyword search #5

Open brandomr opened 5 years ago


Overview

Users need a way to find datasets or variables by keyword search. This issue proposes implementing keyword and fuzzy search across variable names and dataset/variable metadata, as well as fuzzy search across standard variables.

Current State

Currently, DCAT supports search by standard variable name, but only as an exact string match. For example, the following request searches against standard variables:

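import requests

url = "https://<your-dcat-host>"  # placeholder: base URL of your DCAT API
request_headers = {}  # placeholder: add any auth headers your deployment requires

# Search for datasets tagged with this exact standard variable name
q = {
    "standard_variable_names__in": ["ISO-3 Country Code"]
}

resp = requests.post(f"{url}/datasets/find",
                     headers=request_headers,
                     json=q).json()
if resp['result'] == 'success':
    found_resources = resp['resources']
    print(f"Found {len(found_resources)} resources")
    print(found_resources)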

This query returns only resources whose standard variable name exactly matches the string "ISO-3 Country Code". Changing the query string to "ISO3 Country Code" returns zero results.
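The current behavior amounts to exact string equality. The sketch below contrasts that with one possible normalized comparison. The normalize helper and its rule (lowercase, drop punctuation, collapse whitespace) are hypothetical and not part of DCAT; they only illustrate how "ISO3 Country Code" could be made to match "ISO-3 Country Code".

```python
import re


def normalize(term: str) -> str:
    """Hypothetical normalization rule: lowercase, keep only letters,
    digits and whitespace, then collapse runs of whitespace."""
    term = re.sub(r"[^a-z0-9\s]", "", term.lower())
    return " ".join(term.split())


stored = "ISO-3 Country Code"
query = "ISO3 Country Code"

print(stored == query)                        # exact match, as today: False
print(normalize(stored) == normalize(query))  # normalized match: True
```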

Proposed Future State

Ideally, users should be able to run case-insensitive searches not only against standard variables but also against variable names and any description information stored in a variable's or dataset's metadata.
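A minimal sketch of case-insensitive keyword search across several fields, assuming an in-memory list of records. The Variable class, its fields, and the keyword_search function are hypothetical stand-ins, not DCAT's schema or API; a real implementation would more likely push this filter into the database query.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    # Hypothetical record shape; DCAT's actual schema may differ.
    name: str
    description: str = ""
    standard_variable_names: list[str] = field(default_factory=list)


def keyword_search(variables, keyword):
    """Return variables whose name, description, or any standard variable
    name contains `keyword` as a substring, ignoring case."""
    needle = keyword.casefold()
    hits = []
    for var in variables:
        texts = [var.name, var.description, *var.standard_variable_names]
        if any(needle in text.casefold() for text in texts):
            hits.append(var)
    return hits


catalog = [
    Variable("iso3", "Three-letter country code", ["ISO-3 Country Code"]),
    Variable("pop_total", "Total population by country", ["Population"]),
    Variable("rainfall_mm", "Monthly rainfall in millimetres"),
]

print([v.name for v in keyword_search(catalog, "COUNTRY")])  # ['iso3', 'pop_total']
```

Using str.casefold() rather than str.lower() gives more robust caseless matching for non-ASCII text.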

Additionally, users should be able to perform fuzzy search against both standard variables and the name/description fields described above. This could be implemented using a query syntax such as Lucene's, or simply by supporting a wildcard character (*).
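Both options can be sketched with Python's standard library. This is a stand-in, not the proposed design: fnmatch handles the wildcard case (note that it also treats ? and [...] as special characters), and difflib.get_close_matches ranks candidates by a similarity ratio, which is a different measure from the edit distance behind Lucene fuzzy queries such as country~. The sample standard variable list, the function names, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
import fnmatch

# Hypothetical sample of standard variable names.
STANDARD_VARIABLES = [
    "ISO-3 Country Code",
    "Country Name",
    "Population",
    "Precipitation Rate",
]


def wildcard_search(names, pattern):
    """Case-insensitive match where '*' stands for any run of characters.
    fnmatch also interprets '?' and '[...]' as pattern syntax."""
    pat = pattern.casefold()
    return [n for n in names if fnmatch.fnmatchcase(n.casefold(), pat)]


def fuzzy_search(names, query, cutoff=0.8):
    """Case-insensitive approximate match, best matches first, using
    difflib's similarity ratio (0.0 to 1.0) with the given cutoff."""
    lowered = {n.casefold(): n for n in names}
    matches = difflib.get_close_matches(
        query.casefold(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


print(wildcard_search(STANDARD_VARIABLES, "*country*"))
# ['ISO-3 Country Code', 'Country Name']
print(fuzzy_search(STANDARD_VARIABLES, "ISO3 Country Code"))
# ['ISO-3 Country Code']
```

A dedicated search engine such as Lucene would index these fields rather than scanning every name on each request, which matters once the catalog grows.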