usc-isi-i2 / datamart-api


Column names are not being preserved when they are annotated as qualifiers #28

Closed saggu closed 4 years ago

saggu commented 4 years ago

Attachment: Archive.zip

Steps to reproduce:

  1. create a dataset: TESEthMarket

  2. Import this KGTK Edge exploded file: kgtk-edges.tsv (from the attached files)

  3. Notice that one of the qualifier labels is `Attack context`, which is the same as the column name in the attached zip file

  4. `/datasets/TESEthMarket/variables/ingo` returns data (screenshot attached to the original issue)

  5. Notice that Attack context is now Attack_context and Means of attack is Means_of_attack

  6. Run this URL: `/datasets/TESEthMarket/variables?variable=ingo`. This returns an error:

    
    [2020-08-04 14:30:08,723] ERROR in app: Exception on /datasets/TESEthMarket/variables [GET]
    Traceback (most recent call last):
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4411, in get_value
    return libindex.get_value_at(s, key)
    File "pandas/_libs/index.pyx", line 44, in pandas._libs.index.get_value_at
    File "pandas/_libs/index.pyx", line 45, in pandas._libs.index.get_value_at
    File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
    File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
    TypeError: 'str' object cannot be interpreted as an integer

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/flask_restful/__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/flask_restful/__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
    File "/Users/amandeep/Github/datamart-api/api/variable/main.py", line 29, in get
    return g.get(dataset)
    File "/Users/amandeep/Github/datamart-api/api/variable/get_all.py", line 56, in get
    = self.reshape_canonical_data(, generic_qualifiers)
    File "/Users/amandeep/Github/datamart-api/api/variable/get_all.py", line 73, in reshape_canonical_data
    row['{}QUALIFIER{}'.format(row['variable_id'], q)] = row[q]
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/pandas/core/series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4419, in get_value
    raise e1
    File "/Users/amandeep/Github/datamart-api/datamart_env/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
    File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
    File "pandas/_libs/index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
    File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'Attack context'


**This is happening because the code changes the qualifier names, replacing ' ' with '_'. Another issue is that the variable metadata endpoint `/metadata/datasets/TESEthMarket/variables` returns the original names, like so:**

    [
      {
        "name": "INGO",
        "variable_id": "ingo",
        "description": "INGO in TESEthMarket",
        "corresponds_to_property": "PVARIABLE-QTESEthMarket-003",
        "qualifier": [
          { "name": "Location", "identifier": "PQUALIFIER-QTESEthMarket-006" },
          { "name": "Attack context", "identifier": "PQUALIFIER-QTESEthMarket-005" },
          { "name": "Means of attack", "identifier": "PQUALIFIER-QTESEthMarket-004" },
          { "name": "City", "identifier": "PQUALIFIER-QTESEthMarket-002" },
          { "name": "stated in", "identifier": "P248" },
          { "name": "point in time", "identifier": "P585" }
        ]
      }
    ]



I use this response to find out which qualifiers are associated with the variable. The `name` values from the metadata do not match the column names returned by the data endpoint.
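The mismatch can be illustrated with a minimal sketch. It assumes a hypothetical data row whose column names had spaces replaced with underscores; the row values are placeholders, and a plain dict stands in for the pandas row (a pandas Series raises the same `KeyError` for a missing label). The helper names are illustrative and are not part of datamart-api.

```python
# Qualifier entries as returned by the metadata endpoint (names taken from this issue).
metadata_qualifiers = [
    {"name": "Attack context", "identifier": "PQUALIFIER-QTESEthMarket-005"},
    {"name": "Means of attack", "identifier": "PQUALIFIER-QTESEthMarket-004"},
    {"name": "City", "identifier": "PQUALIFIER-QTESEthMarket-002"},
]

# Hypothetical data row: column names had ' ' replaced with '_'.
# The values are placeholders, not data from the attached file.
row = {
    "variable_id": "ingo",
    "Attack_context": "a",
    "Means_of_attack": "b",
    "City": "c",
}


def missing_qualifiers(row, qualifiers):
    """Return the metadata qualifier names that are not keys of the row."""
    return [q["name"] for q in qualifiers if q["name"] not in row]


def lookup_raises_key_error(row, name):
    """Return True if looking up `name` in the row raises KeyError."""
    try:
        row[name]
    except KeyError:
        return True
    return False


print(missing_qualifiers(row, metadata_qualifiers))
# ['Attack context', 'Means of attack']
```

Any consumer that looks up columns by the metadata `name` fails on the first qualifier containing a space, which matches the `KeyError: 'Attack context'` in the traceback.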

This is the cause of the bug. Please work on this as a priority, as it is a blocker.

zmbq commented 4 years ago

There is actually no reason qualifier columns can't have spaces in them; this is indeed a bug. I will fix it so that the query returns column names as the original qualifier names (keeping the case, too, although I think Postgres is case insensitive, and the columns `a` and `A` are identical).
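For context on the case question: Postgres folds unquoted identifiers to lower case, while double-quoted identifiers keep both case and spaces. A minimal sketch of quoting column names when building a query follows. The helper names and the table name are hypothetical and do not reflect how datamart-api actually builds its queries.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(columns, table):
    """Build a SELECT statement whose column and table names are all quoted."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return "SELECT {} FROM {}".format(cols, quote_identifier(table))


# Hypothetical table name; the qualifier names come from this issue.
print(build_select(["Attack context", "Means of attack"], "hypothetical_table"))
# SELECT "Attack context", "Means of attack" FROM "hypothetical_table"
```

With quoted identifiers, the result columns can carry the original qualifier names, so they line up with the `name` values from the metadata endpoint.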

zmbq commented 4 years ago

Qualifiers now appear without _ in the query. Both variable queries work well without failing.

saggu commented 4 years ago

I will update this issue when I have the root cause of the new bug: label:= 'location in administratice....' for P131

saggu commented 4 years ago

Sorry, still no updates on this one.

saggu commented 4 years ago

The problem reported in this issue is fixed. I will open a new issue for any other problem. Closing this issue.