Closed andrewhercules closed 3 years ago
@mirandaio we have updated the list of datasets and the dataset mapping file - the structure remains the same but we have updated some of the fields (e.g. descriptions). Please use the files below:
Some decisions on the meeting today:
http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/21.02/output/...
1. wget command - This will contain the command wget -r -np -nH --cut-dirs 7 ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/21.02/output/ETL_parquet/diseases
2. Google Cloud Platform (paywalled) - This will contain the command gsutil -m cp -r gs://open-targets-data-releases/21.04/output/etl-parquet/diseases .
Also, please update the URL of the downloads page to /downloads
Looks great @mirandaio! 👍 Can we please make a few changes and then open a PR?
For sample scripts to download and parse datasets using Python or R, please visit our Data Downloads documentation
[x] Standardise the formatting of the chips to JSON and Parquet
[x] Within the drawer, update the options available and the URL based on a common URL pattern of baseURL + / + dataVersion + /etl/output/ + datasetFormat + / + datasetName
Option | baseURL |
---|---|
rsync | rsync -rpltvz --delete rsync.ebi.ac.uk::pub/databases/opentargets/platform/ |
wget | wget --recursive --no-parent --no-host-directories --cut-dirs 7 ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/ |
FTP | ftp.ebi.ac.uk/pub/databases/opentargets/platform/ |
Google Cloud (paywalled) | gsutil -m cp -r gs://open-targets-data-releases/ |
The datasetFormat value will be either json or parquet depending on which chip is selected.
For rsync, wget, and Google Cloud, please add a space and full stop at the end of the URL .
For example, the rsync command to access 21.04 disease data in Parquet would be rsync -rpltvz --delete rsync.ebi.ac.uk::pub/databases/opentargets/platform/21.04/etl/output/parquet/diseases .
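The URL pattern and the table above can be sketched in Python as follows. This is only an illustration, not the front-end implementation: the function name, the dictionary keys, and the constant names are hypothetical, and the sketch assumes each base string already ends with a slash (as in the table), so no extra slash is added before the data version.

```python
# Base strings follow the table above; the dictionary keys are illustrative labels.
BASE_COMMANDS = {
    "rsync": "rsync -rpltvz --delete rsync.ebi.ac.uk::pub/databases/opentargets/platform/",
    "wget": (
        "wget --recursive --no-parent --no-host-directories --cut-dirs 7 "
        "ftp://ftp.ebi.ac.uk/pub/databases/opentargets/platform/"
    ),
    "ftp": "ftp.ebi.ac.uk/pub/databases/opentargets/platform/",
    "gcloud": "gsutil -m cp -r gs://open-targets-data-releases/",
}

# Options that get a trailing " ." (space and full stop) after the URL.
NEEDS_TRAILING_DOT = {"rsync", "wget", "gcloud"}


def build_download_command(option, data_version, dataset_format, dataset_name):
    """Assemble the string shown in the drawer for one option and format (hypothetical helper)."""
    if dataset_format not in ("json", "parquet"):
        raise ValueError(f"unsupported dataset format: {dataset_format}")
    base = BASE_COMMANDS[option]
    command = f"{base}{data_version}/etl/output/{dataset_format}/{dataset_name}"
    if option in NEEDS_TRAILING_DOT:
        command += " ."
    return command


print(build_download_command("rsync", "21.04", "parquet", "diseases"))
```

With these inputs the sketch reproduces the rsync example given above for the 21.04 disease data in Parquet.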
this looks great. I would reorder the last table to do:
As part of #1411, we will implement a new data downloads page that allows users to download a larger list of files. The implementation is based on each data file being included in a JSONlines file that will be produced by the ETL pipelines.
Can we please implement a new data downloads page based on the following specification (v2.4)?
User visits /downloads/data page
At the top of the page, please display the following text:
Please link "Licence documentation" to https://platform-docs.opentargets.org/licence and please link "FTP" to http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/
The data version is available by calling the GraphQL meta endpoint and requesting dataVersion.year and dataVersion.month (e.g. sample query returning 21.02).
The list of datasets is available in a JSONlines file based on the output of the ETL pipeline - list-of-datasets.json. The dataset labels, description, show/hide status, and order are available in a JSON file - dataset-mapping-file.json. Both files share the same id value, so an id from the list-of-datasets.json file corresponds to an entry in the dataset-mapping-file.json file.
json-files-for-data-downloads-page.zip
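As a rough illustration of the data version step, the sketch below shows a query for the meta endpoint and a helper that turns the response into a string such as 21.02. The helper name is hypothetical, and the response shape is an assumption: it follows the usual GraphQL envelope of a top-level "data" key, and year and month are zero-padded in case they arrive as numbers rather than strings.

```python
# Query requesting only the fields named in the specification.
META_QUERY = "query { meta { dataVersion { year month } } }"


def format_data_version(response):
    """Return 'YY.MM' from a decoded GraphQL response (hypothetical helper).

    Assumes the response looks like:
    {"data": {"meta": {"dataVersion": {"year": ..., "month": ...}}}}
    """
    version = response["data"]["meta"]["dataVersion"]
    year = str(version["year"]).zfill(2)
    month = str(version["month"]).zfill(2)
    return f"{year}.{month}"


sample_response = {"data": {"meta": {"dataVersion": {"year": "21", "month": "02"}}}}
print(format_data_version(sample_response))
```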
Please integrate both JSON files and show in a data table (include search and show more rows functionality).
Within the dataset-mapping-file.json file, please use the include_in_fe boolean to determine if the file should be shown in the data table and use the order to determine the order the files should be shown.
nice_name from dataset-mapping-file.json
description from dataset-mapping-file.json
resource.format from list-of-datasets.json
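A minimal sketch of how the two files might be combined into table rows is shown below. Several details are assumptions rather than facts from the attached files: that list-of-datasets.json holds one JSON object per line with a nested resource object, that a dataset with several formats appears as several lines sharing the same id, and that dataset-mapping-file.json is a JSON array. The function names are hypothetical.

```python
import json


def load_json_lines(text):
    """Parse JSONlines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_table_rows(datasets, mapping):
    """Join dataset records to mapping entries by id, filter, and sort (hypothetical helper)."""
    mapping_by_id = {entry["id"]: entry for entry in mapping}

    # Assumption: each format of a dataset is a separate record with the same id.
    formats_by_id = {}
    for record in datasets:
        formats = formats_by_id.setdefault(record["id"], [])
        dataset_format = record["resource"]["format"]
        if dataset_format not in formats:
            formats.append(dataset_format)

    rows = []
    for dataset_id, formats in formats_by_id.items():
        entry = mapping_by_id.get(dataset_id)
        # Skip datasets with no mapping entry or with include_in_fe set to false.
        if entry is None or not entry.get("include_in_fe", False):
            continue
        rows.append({
            "id": dataset_id,
            "nice_name": entry["nice_name"],
            "description": entry["description"],
            "formats": formats,
            "order": entry["order"],
        })

    rows.sort(key=lambda row: row["order"])
    return rows
```

Rows built this way carry the list of formats needed to render one chip per format in the "Format(s)" column.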
User clicks on any of the chips in the "Format(s)" column
Please use the drawer component to open up a view with tabs for each format. Within each tab, please provide the relevant URLs for FTP and Google Cloud Access and the Font-Awesome copy icon that copies the URL to the user's clipboard.
FTP
baseUrl: ftp.ebi.ac.uk/pub/databases/opentargets/platform/
dataVersion.year: use GraphQL API meta endpoint to retrieve dataVersion.year
dataVersion.month: use GraphQL API meta endpoint to retrieve dataVersion.month
filePath: use resource.path value
The format of the URL should be baseUrl + dataVersion.year + . + dataVersion.month + /output/ETL/ + filePath
For example, ftp.ebi.ac.uk/pub/databases/opentargets/platform/21.04/output/ETL/associationByOverallDirect
Google Cloud
baseUrl: gs://open-targets-data-releases/
dataVersion.year: use GraphQL API meta endpoint to retrieve dataVersion.year
dataVersion.month: use GraphQL API meta endpoint to retrieve dataVersion.month
filePath: use resource.path value
The format of the URL should be gsutil ls + baseUrl + dataVersion.year + . + dataVersion.month + /output/ETL/ + filePath
For example, gsutil ls gs://open-targets-data-releases/21.04/output/ETL/associationByOverallDirect
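The FTP and Google Cloud patterns in this specification could be assembled along the following lines. This is a sketch under stated assumptions: the function and constant names are hypothetical, year and month are taken as already formatted strings, and any leading slash on resource.path is stripped so that the joined path does not contain a double slash.

```python
FTP_BASE_URL = "ftp.ebi.ac.uk/pub/databases/opentargets/platform/"
GCS_BASE_URL = "gs://open-targets-data-releases/"


def build_release_path(year, month, file_path):
    """Shared suffix: dataVersion.year + . + dataVersion.month + /output/ETL/ + filePath."""
    # Assumption: tolerate a leading slash on resource.path.
    return f"{year}.{month}/output/ETL/{file_path.lstrip('/')}"


def build_ftp_url(year, month, file_path):
    """FTP location for one dataset (hypothetical helper)."""
    return FTP_BASE_URL + build_release_path(year, month, file_path)


def build_gsutil_command(year, month, file_path):
    """gsutil ls command for one dataset (hypothetical helper)."""
    return "gsutil ls " + GCS_BASE_URL + build_release_path(year, month, file_path)


print(build_ftp_url("21", "04", "associationByOverallDirect"))
print(build_gsutil_command("21", "04", "associationByOverallDirect"))
```

For the 21.04 release and the associationByOverallDirect path, the two calls produce the FTP and Google Cloud examples given in the specification.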