cmalangone commented 4 years ago

The ticket platform/issues/657 explains how to create the dumps for the lists of diseases and targets.

The commands have to be integrated into the platform-infrastructure script (run.sh).

cmalangone commented 4 years ago

657

cmalangone commented 4 years ago

The new list of commands is:

echo '"ensembl_id","hgnc_approved_symbol","uniprot_accessions","number_of_associations"' > 20.02_target_list.csv
cat 20.02.1-search-data.json | jq -r 'select(.type=="target") | [.id, .approved_symbol, [.uniprot_accessions | join("|")][], .association_counts.total] | @csv' >> 20.02_target_list.csv
cat 20.02.1-search-data.json | jq -r 'select(.type=="target") | {"ensembl_id": .id, "hgnc_approved_symbol": .approved_symbol, "uniprot_accessions": .uniprot_accessions, "number_of_associations": .association_counts.total}' > 20.02_target_list.json

echo '"efo_id","disease_full_name","number_of_associations"' > 20.02_disease_list.csv
cat 20.02.1-search-data.json | jq -r 'select(.type=="disease") | [.id, .full_name, .association_counts.total] | @csv' >> 20.02_disease_list.csv
cat 20.02.1-search-data.json | jq -c 'select(.type=="disease") | {"efo_id": .id, "disease_full_name": .full_name, "number_of_associations": .association_counts.total}' > 20.02_disease_list.json
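Since the commands change only in the release prefix from one run to the next, the integration into run.sh could be sketched as a single function that takes the release as a parameter. This is a minimal sketch and not the actual run.sh: the function name `make_dumps` and its two arguments are hypothetical, and it assumes the search-data file is a stream of JSON objects, as the `jq` calls above imply. The filters and `jq` flags are taken from the commands above.

```shell
# Hypothetical helper (not the real run.sh): make_dumps RELEASE SEARCH_DATA_JSON
# Writes <RELEASE>_{target,disease}_list.{csv,json} into the current directory.
make_dumps() {
  local release
  local input
  release="$1"
  input="$2"

  # Targets: CSV header first, then one row per record of type "target".
  # (.uniprot_accessions | join("|")) is equivalent to the original
  # [.uniprot_accessions | join("|")][] and collapses the accession list
  # into a single pipe-separated field.
  echo '"ensembl_id","hgnc_approved_symbol","uniprot_accessions","number_of_associations"' \
    > "${release}_target_list.csv"
  jq -r 'select(.type=="target")
    | [.id, .approved_symbol, (.uniprot_accessions | join("|")), .association_counts.total]
    | @csv' "$input" >> "${release}_target_list.csv"
  jq -r 'select(.type=="target")
    | {"ensembl_id": .id, "hgnc_approved_symbol": .approved_symbol,
       "uniprot_accessions": .uniprot_accessions,
       "number_of_associations": .association_counts.total}' \
    "$input" > "${release}_target_list.json"

  # Diseases: same pattern, selecting records of type "disease".
  echo '"efo_id","disease_full_name","number_of_associations"' \
    > "${release}_disease_list.csv"
  jq -r 'select(.type=="disease")
    | [.id, .full_name, .association_counts.total]
    | @csv' "$input" >> "${release}_disease_list.csv"
  jq -c 'select(.type=="disease")
    | {"efo_id": .id, "disease_full_name": .full_name,
       "number_of_associations": .association_counts.total}' \
    "$input" > "${release}_disease_list.json"
}
```

Under these assumptions, a release would be generated with a call such as `make_dumps 20.02 20.02.1-search-data.json`.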

cmalangone commented 4 years ago

echo '"ensembl_id","hgnc_approved_symbol","uniprot_accessions","number_of_associations"' > 20.09_target_list.csv
cat 20.09_search-data.json | jq -r 'select(.type=="target") | [.id, .approved_symbol, [.uniprot_accessions | join("|")][], .association_counts.total] | @csv' >> 20.09_target_list.csv
cat 20.09_search-data.json | jq -r 'select(.type=="target") | {"ensembl_id": .id, "hgnc_approved_symbol": .approved_symbol, "uniprot_accessions": .uniprot_accessions, "number_of_associations": .association_counts.total}' > 20.09_target_list.json

echo '"efo_id","disease_full_name","number_of_associations"' > 20.09_disease_list.csv
cat 20.09_search-data.json | jq -r 'select(.type=="disease") | [.id, .full_name, .association_counts.total] | @csv' >> 20.09_disease_list.csv
cat 20.09_search-data.json | jq -c 'select(.type=="disease") | {"efo_id": .id, "disease_full_name": .full_name, "number_of_associations": .association_counts.total}' > 20.09_disease_list.json

Gzip the output files. Copy them to the proper Google Storage (GS) bucket. Change the header of the files in Google Storage.
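The three steps above could be sketched as follows. This is a sketch under stated assumptions and not an existing script: the function names `publish_dumps` and `run` and the `DRY_RUN` switch are hypothetical, and the bucket layout and Content-Type value are assumed to follow the `gs://open-targets-data-releases/<release>/output/` pattern used for the published files in this ticket.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: publish_dumps RELEASE
# Expects the four <RELEASE>_*_list.{csv,json} files in the current directory.
publish_dumps() {
  local release
  local bucket
  local kind
  local f
  release="$1"
  # Assumed bucket layout (see lead-in).
  bucket="gs://open-targets-data-releases/${release}/output"

  for kind in target_list.csv target_list.json disease_list.csv disease_list.json; do
    f="${release}_${kind}"
    # 1. Gzip locally (this runs even in dry-run mode); produces "${f}.gz".
    gzip -f "$f"
    # 2. Copy the compressed file to the bucket.
    run gsutil cp "${f}.gz" "${bucket}/"
    # 3. Set the Content-Type header on the uploaded object.
    run gsutil setmeta -h "Content-Type:application/x-gzip" "${bucket}/${f}.gz"
  done
}
```

Running `DRY_RUN=1 publish_dumps 20.09` first prints the `gsutil` commands without touching the bucket, which makes it easier to check the target paths before uploading.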

cmalangone commented 3 years ago

echo '"ensembl_id","hgnc_approved_symbol","uniprot_accessions","number_of_associations"' > 20.11_target_list.csv
cat 20.11_search-data.json | jq -r 'select(.type=="target") | [.id, .approved_symbol, [.uniprot_accessions | join("|")][], .association_counts.total] | @csv' >> 20.11_target_list.csv
cat 20.11_search-data.json | jq -r 'select(.type=="target") | {"ensembl_id": .id, "hgnc_approved_symbol": .approved_symbol, "uniprot_accessions": .uniprot_accessions, "number_of_associations": .association_counts.total}' > 20.11_target_list.json

echo '"efo_id","disease_full_name","number_of_associations"' > 20.11_disease_list.csv
cat 20.11_search-data.json | jq -r 'select(.type=="disease") | [.id, .full_name, .association_counts.total] | @csv' >> 20.11_disease_list.csv
cat 20.11_search-data.json | jq -c 'select(.type=="disease") | {"efo_id": .id, "disease_full_name": .full_name, "number_of_associations": .association_counts.total}' > 20.11_disease_list.json

Gzip the files and change the header, e.g.:

gsutil setmeta -h "Content-Type:application/x-gzip" gs://open-targets-data-releases/20.11/output/20.11_target_list.json.gz

cmalangone commented 3 years ago

Please keep this ticket open. The new pipeline has to manage the creation of these files.

andrewhercules commented 3 years ago

Tagged for 21.02 release

cmalangone commented 3 years ago

21.02 will still generate these files with this manual process.

d0choa commented 3 years ago

This functionality won't be necessary in the new pipeline, as we will make all ETL outputs accessible. We will do it manually one more time for 21.02.

Closing this issue, as no action is expected on "automatic creations" for the data_pipeline/Angular.

cmalangone commented 3 years ago

The dumps for 21.02 are available here:

https://storage.googleapis.com/open-targets-data-releases/21.02/output/21.02_target_list.json.gz
https://storage.googleapis.com/open-targets-data-releases/21.02/output/21.02_target_list.csv.gz
https://storage.googleapis.com/open-targets-data-releases/21.02/output/21.02_disease_list.json.gz
https://storage.googleapis.com/open-targets-data-releases/21.02/output/21.02_disease_list.csv.gz

ktsirigos commented 3 years ago

Not relevant for the rewrite.


Platform-infrastructure: Automatic creations of the list of target and disease dumps #774
