opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Change how BigQuery instance is created in `platform-output-support` #1754

Closed: andrewhercules closed this issue 2 years ago

andrewhercules commented 3 years ago

Currently, our BigQuery open-targets-prod project has a versioned dataset for each release (e.g. platform_21_04).

However, for the Google Marketplace public dataset listing programme, our tables should stay consistent from release to release, and archives should be stored in GCP buckets.

Can we please update platform-output-support so that our open-targets-prod project no longer includes the data version in its dataset names?

mkarmona commented 3 years ago

@cmalangone, translating this into a set of actions so you can act on it:

  1. After each data release, create the versioned Open Targets dataset in BQ as we currently do, but in a separate project, as the Google Marketplace policy requires.
  2. Clean the open-targets-prod BQ project and always recreate the same datasets without the version in their names, so it always holds one set of datasets with the same tables in them.
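A minimal sketch of the naming scheme these two actions imply, assuming the archive keeps the existing platform_21_04 pattern. The archive project name open-targets-archive and the helper release_targets are hypothetical; the thread only says "another project".

```python
from dataclasses import dataclass

# Hypothetical: the thread does not name the separate archive project.
ARCHIVE_PROJECT = "open-targets-archive"
PROD_PROJECT = "open-targets-prod"
PROD_DATASET = "platform"  # unversioned, recreated every release


@dataclass(frozen=True)
class DatasetTarget:
    project: str
    dataset: str

    def fully_qualified(self) -> str:
        return f"{self.project}.{self.dataset}"


def release_targets(release: str) -> tuple[DatasetTarget, DatasetTarget]:
    """Return (versioned archive target, unversioned prod target) for e.g. '21.11'."""
    # BigQuery dataset IDs may only contain letters, digits and underscores,
    # so the dot in the release number becomes an underscore.
    suffix = release.replace(".", "_")
    versioned = DatasetTarget(ARCHIVE_PROJECT, f"platform_{suffix}")
    unversioned = DatasetTarget(PROD_PROJECT, PROD_DATASET)
    return versioned, unversioned
```

For release 21.11 this yields open-targets-archive.platform_21_11 and open-targets-prod.platform. The prod target is identical for every release, which is what keeps the Marketplace listing stable.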

Any questions?

cmalangone commented 3 years ago

@andrewhercules has provided the list of tables that must be listed: https://github.com/opentargets/platform-app/blob/main/src/pages/DownloadsPage/dataset-mappings.json

andrewhercules commented 3 years ago

BigQuery updated, ticket closed

cmalangone commented 3 years ago

@andrewhercules, I am just finishing adding the code to POS :)

andrewhercules commented 3 years ago

No worries - I was too quick with closing! 😅

If possible, can you please also delete the platform_21_06, platform_21_09, and targval_quicksearches datasets?

I'm not sure how the Google public dataset pipeline works, but I figure it would be best if open-targets-prod only contains the one platform dataset.

That said, I would also understand if we want to use BigQuery to keep a copy of the two most recent releases; just let me know so I can liaise with Google.

cmalangone commented 3 years ago

I deleted the old BigQuery datasets. We will discuss how to shape the POS output better for the next release. I am changing the POS code to generate just the platform dataset, with an ot_release table containing information about the release.

andrewhercules commented 2 years ago

A user has contacted the helpdesk about the knownDrugsAggregated data from our 21.11 release. The data is not available in Google BigQuery.

@cmalangone can you please check if POS is also adding that dataset to our open-targets-prod.platform BigQuery instance? If not, is this something we can manually do for this release?

cmalangone commented 2 years ago

I've created the dataset manually. The issue was that dataset-mappings.json marks this dataset with include_in_bq == false, so it was filtered out. @andrewhercules, can you please create a ticket?
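A minimal sketch of how an include_in_bq filter like the one described would drop knownDrugsAggregated. It assumes the mapping file is a JSON array of objects with a hypothetical "id" key alongside include_in_bq; the real schema of dataset-mappings.json and the actual POS code may differ.

```python
import json


def tables_for_bigquery(mappings_json: str) -> list[str]:
    """Return the ids of mapping entries flagged for loading into BigQuery."""
    # Assumption: top-level JSON array; "id" is a hypothetical key name.
    entries = json.loads(mappings_json)
    # Entries with include_in_bq false (or missing) are skipped entirely.
    return [e["id"] for e in entries if e.get("include_in_bq", False)]


sample = """[
  {"id": "targets", "include_in_bq": true},
  {"id": "knownDrugsAggregated", "include_in_bq": false}
]"""
```

With this sample, tables_for_bigquery(sample) returns only ["targets"]; flipping the knownDrugsAggregated flag to true in the mapping file is enough for it to be loaded in the next release.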

andrewhercules commented 2 years ago

Thank you @cmalangone! 😄

I have updated #1790 and will adjust the mapping file for the 22.02 release.

andrewhercules commented 2 years ago

Ticket closed, as the datasets have been generated for BigQuery.

Additional work will be captured in #1790.