usc-isi-i2 / datamart-api

MIT License

Add CAMEO codes for countries as aliases #82

Open saggu opened 4 years ago

saggu commented 4 years ago

https://github.com/carrillo/Gdelt/blob/master/resources/staticTables/CAMEO.country.txt

  1. Search variable metadata by CAMEO code
  2. Search variable data by CAMEO code
  3. Return CAMEO code in the get data API
  4. Optional: if the users want to insert data by CAMEO code, then we have to update the Wikifier.
kyao commented 4 years ago

Get data with cameo column, and filter by cameo

GET /datasets/AMIS/variables/total_supply?country=BEL&include=cameo
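
A minimal sketch of how a client might build that request URL. The base URL and the helper name `get_data_url` are hypothetical, and only the path and the `country` / `include` parameters come from the request above.

```python
from urllib.parse import urlencode

# Hypothetical host; the thread does not say where the API is deployed.
BASE_URL = "http://datamart.example"

def get_data_url(dataset, variable, **params):
    """Build a get-data URL for one variable, with optional query parameters."""
    path = f"/datasets/{dataset}/variables/{variable}"
    query = urlencode(params)  # percent-encodes values where needed
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"

# Filter by CAMEO code and ask for the cameo column in the result.
url = get_data_url("AMIS", "total_supply", country="BEL", include="cameo")
print(url)
```
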
saggu commented 4 years ago

I have created two files:

  1. country_cameo_exploded.tsv: CAMEO codes for countries, as both alias and cameo code (property: P2010270001)
  2. new_property_cameo_exploded.tsv: label for property P2010270001

Import these files:

python script/import_tsv_postgres.py country_cameo_exploded.tsv
python script/import_tsv_postgres.py new_property_cameo_exploded.tsv

Refresh views

python script/refresh_search_views.py

I have tested searching variable metadata and getting data with these cameo codes, and both work fine.

/metadata/variables?country=ETH

/datasets/WDI/variables/access_to_electricity_of_population?country=GAB

cameo_code_and_property.zip

saggu commented 4 years ago

Added these files to our list of essential files https://docs.google.com/spreadsheets/d/17V3QMUGdm3TW4vQKcM-Zhm9tNUJVLSX02D1SmF_pF3E/edit#gid=0

saggu commented 4 years ago

@zmbq changes are required in the queries to fetch CAMEO codes for countries; the property is P2010270001.

zmbq commented 4 years ago

Many of the cameo codes do not correspond to regions. WSB and ABW, for instance, can't be queried, since their qnode is not associated with any region (they don't have any P31 edge really) - so the API doesn't find them.

zmbq commented 4 years ago

Implemented in the itay/cameo branch.

saggu commented 4 years ago

I added the cameo codes for regions just for completeness. I will test the code in the new branch.

saggu commented 4 years ago

New update from Dan:

They want to search by and receive the full IRI for CAMEO codes.

/metadata/variables?country=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEOeth

/datasets/WDI/variables/access_to_electricity_of_population?country=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEOgab

In the data returned, they want the IRIs as well

saggu commented 4 years ago

@zmbq As per the new requirements, I inserted the alias and cameo code as IRIs. I am trying to search variable metadata with the IRI, but it is not working.

/metadata/variables?country=http%3A%2F%2Fontology.causeex.com%2Fcameo%2FCountryCodeOntology%2FCAMEOeth

also tried

/metadata/variables?country=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEOeth

It is in the database

Updated file with IRIs: cameo_code_and_property.zip

saggu commented 4 years ago

I tested @zmbq's changes in the itay/cameo branch. I have to set include=country_cameo for this to work, which is fine.

However, country ids are not included; we want country ids by default.

zmbq commented 4 years ago

Fixed in itay/cameo .

saggu commented 4 years ago

@zmbq is the problem from my earlier comment also fixed? Searching variable metadata with the CAMEO IRI (encoded or not) still returns nothing, even though the IRI is in the database.

saggu commented 4 years ago

Database backup containing CAMEO URIs: https://drive.google.com/file/d/1buKO4UghTGOY_gVnPeW8rW-t4yKnSWiq/view?usp=sharing

zmbq commented 4 years ago

This is due to the input sanitization. We do not allow non-alphanumeric characters, so the slashes and hash signs are removed.
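
A minimal sketch of this kind of allow-list filtering, to show what it does to an IRI. The function name `sanitize` and the `extra_allowed` parameter are hypothetical, and the relaxed character set used below is an assumption, not the API's actual rule.

```python
import re

def sanitize(value, extra_allowed=""):
    """Drop every character that is not alphanumeric or explicitly allowed."""
    allowed = "A-Za-z0-9" + re.escape(extra_allowed)
    return re.sub(f"[^{allowed}]", "", value)

iri = "http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#eth"

# Strict mode: ':', '/', '.' and '#' are all stripped, so the IRI no longer
# matches anything stored in the database.
strict = sanitize(iri)

# Relaxed mode (assumed character set): the IRI passes through unchanged.
relaxed = sanitize(iri, extra_allowed=":/#.")

print(strict)
print(relaxed)
```

Plain codes such as ETH are unaffected by the strict filter, which is consistent with the short CAMEO codes working while the IRIs did not.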

I have fixed the sanitization to also accept / and #, but this query still fails, since the new cameo codes are not considered aliases. You should add each cameo as two edges - a cameo and an alias edge. Also, note that cameo codes have # in them, so the query above should be for http%3A%2F%2Fontology.causeex.com%2Fcameo%2FCountryCodeOntology%2FCAMEO%23eth

I saw you already fixed the colons in tag names.

I have pushed the sanitization fix to the development branch.
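
Because these IRIs contain a #, they need percent-encoding before they go into a query string; otherwise everything after the # is treated as a URL fragment and is not part of the query. The sketch below illustrates this with the standard library only. It addresses how the value reaches the server, not how the server matches it.

```python
from urllib.parse import quote, urlsplit, parse_qs

iri = "http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#eth"

# Unencoded: the '#' starts the fragment, so the query value is cut short.
parts = urlsplit("/metadata/variables?country=" + iri)
# parts.query    -> "country=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO"
# parts.fragment -> "eth"

# Encoded: safe="" also encodes '/' and ':', and '#' becomes %23.
encoded = quote(iri, safe="")
request_path = "/metadata/variables?country=" + encoded

# Decoding the query on the receiving side recovers the full IRI.
received = parse_qs(urlsplit(request_path).query)["country"][0]
print(request_path)
print(received == iri)
```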

saggu commented 4 years ago

You should add each cameo as two edges - a cameo and an alias edge

I did, but it was silently rejected by the ON CONFLICT DO NOTHING clause. Oh well, I recreated the database backup with CAMEO URIs as aliases as well.
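
A sketch of how a skipped insert can be detected instead of passing silently. It uses an in-memory SQLite database as a stand-in for Postgres, and the `edges` table, its columns, and the conflicting `id` are all hypothetical; the thread does not say which constraint caused the conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per edge, keyed by an edge id.
conn.execute(
    "CREATE TABLE edges (id TEXT PRIMARY KEY, node1 TEXT, label TEXT, node2 TEXT)"
)

def insert_edge(conn, edge_id, node1, label, node2):
    """Insert an edge; return False if ON CONFLICT DO NOTHING skipped it."""
    cur = conn.execute(
        "INSERT INTO edges (id, node1, label, node2) VALUES (?, ?, ?, ?) "
        "ON CONFLICT DO NOTHING",
        (edge_id, node1, label, node2),
    )
    return cur.rowcount == 1  # rowcount is 0 when the row was skipped

iri = "http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#eth"
first = insert_edge(conn, "edge-1", "Q_example", "P2010270001", iri)
# Same (hypothetical) id again: the alias edge is dropped without an error.
second = insert_edge(conn, "edge-1", "Q_example", "alias", iri)
print(first, second)
```

Logging or raising when the function returns False would surface the kind of silent rejection described above.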

Uploaded the database backup to https://drive.google.com/drive/u/1/folders/0AGFpIvVliZecUk9PVA as well

saggu commented 4 years ago

Still can't seem to search with the URIs, though. I will try again in a bit; I'm in a hurry right now.

zmbq commented 4 years ago

If it still doesn't work, send me the new database dump and I'll try it.

saggu commented 4 years ago

@zmbq it's here: https://drive.google.com/drive/u/1/folders/0AGFpIvVliZecUk9PVA

zmbq commented 4 years ago

The "." character had to be allowed by the sanitization as well. Pushed into development.

I'm not crazy about the sanitization relaxations I've added today. I'm not sure whether they introduce a potential SQL injection vulnerability or not. I don't know all the SQL injection tricks. We still don't allow ' and - characters, which are by far the most common in attacks, but still.
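
One common way to sidestep this concern is to pass user input as a bound parameter rather than building SQL strings, so the allowed character set no longer matters for injection. The sketch below is an illustration only: it uses SQLite as a stand-in, and the `aliases` table and `find_country` helper are hypothetical, not the API's actual queries. Postgres drivers such as psycopg2 offer the same mechanism with %s placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table mapping aliases to country names.
conn.execute("CREATE TABLE aliases (alias TEXT, country TEXT)")
conn.execute(
    "INSERT INTO aliases VALUES (?, ?)",
    ("http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#eth", "Ethiopia"),
)

def find_country(conn, alias):
    """Look up a country by alias; the value is bound, never interpolated."""
    row = conn.execute(
        "SELECT country FROM aliases WHERE alias = ?", (alias,)
    ).fetchone()
    return row[0] if row else None

# An IRI with ':', '/', '.' and '#' needs no sanitizing to be safe here.
ok = find_country(
    conn, "http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#eth"
)
# A classic injection attempt is compared as a literal string and matches nothing.
attack = find_country(conn, "x' OR '1'='1")
print(ok, attack)
```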

saggu commented 4 years ago

@zmbq I am sorry, but this is still not working. Did you test it? What would be the magic incantation to make it work?

This is what I am trying

/metadata/variables?country=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#blk

and also

/metadata/variables?country=http%3A%2F%2Fontology.causeex.com%2Fcameo%2FCountryCodeOntology%2FCAMEO%23blk

Neither works, even after I refresh the search views.

saggu commented 4 years ago

This works if I search on alias:

/metadata/variables?alias=http://ontology.causeex.com/cameo/CountryCodeOntology/CAMEO#blk

It should work when searching on country. Am I misremembering something?

saggu commented 4 years ago

Causx doesn't want the hash in the URI.