naobservatory / mgs-pipeline

Automatically add sample metadata when importing new bioprojects #45

Closed lennijusten closed 7 months ago

lennijusten commented 7 months ago

This PR adds a few features:

  1. Automatically create the /bioproject/{bioprojects}/metadata directory and populate metadata.tsv with sample IDs when fetching the sequencing data.
  2. Fetch the SRA metadata from NCBI using their E-utilities API and save it to /bioproject/{bioprojects}/metadata/metadata-ncbi.tsv.
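
As a rough illustration of the two steps above (not the code in this PR), the sketch below creates the metadata directory, writes one sample ID per row to metadata.tsv, and builds an E-utilities `esearch` URL for a bioproject's SRA records. The function names, the `bioprojects_root` parameter, and the choice not to overwrite an existing metadata.tsv are assumptions made for the example. Retrieving the actual run table would take a follow-up request (for example `efetch`), which is left out here.

```python
import csv
from pathlib import Path
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def init_metadata(bioprojects_root, bioproject, sample_ids):
    """Create <root>/<bioproject>/metadata and write one sample ID per row.

    Hypothetical helper: the names and layout are assumptions for this sketch.
    """
    metadata_dir = Path(bioprojects_root) / bioproject / "metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)
    metadata_tsv = metadata_dir / "metadata.tsv"
    # Assumption: never clobber a file that may already hold hand-added columns.
    if not metadata_tsv.exists():
        with metadata_tsv.open("w", newline="") as f:
            writer = csv.writer(f, delimiter="\t", lineterminator="\n")
            for sample_id in sample_ids:
                writer.writerow([sample_id])
    return metadata_tsv


def sra_search_url(bioproject):
    """Build an esearch URL that looks up a bioproject's records in the SRA database."""
    params = {"db": "sra", "term": bioproject, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"
```

Calling `init_metadata("bioproject", "PRJNA000000", ["SRR1", "SRR2"])` (placeholder accession and IDs) would leave a two-line metadata.tsv under bioproject/PRJNA000000/metadata/.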

The "Run" column of the NCBI metadata should match the first column of metadata.tsv, but this may not always be the case (since the data is being downloaded from www.ebi.ac.uk?), so we still use the metadata.tsv file.
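
One way to check that assumption is to diff the two ID sets. The sketch below is illustrative only: it assumes metadata.tsv has no header row and that metadata-ncbi.tsv is tab-separated with a header containing a "Run" column. The function name is hypothetical.

```python
import csv
import io


def compare_run_ids(metadata_tsv_text, ncbi_tsv_text):
    """Return (IDs only in metadata.tsv, IDs only in the NCBI "Run" column)."""
    # metadata.tsv: assumed headerless, sample ID in the first column.
    local_ids = {
        row[0]
        for row in csv.reader(io.StringIO(metadata_tsv_text), delimiter="\t")
        if row
    }
    # metadata-ncbi.tsv: assumed to have a header row with a "Run" column.
    ncbi_ids = {
        row["Run"]
        for row in csv.DictReader(io.StringIO(ncbi_tsv_text), delimiter="\t")
    }
    return sorted(local_ids - ncbi_ids), sorted(ncbi_ids - local_ids)
```

Two empty lists would mean the files agree. Anything else lists the samples to look at by hand.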

Note - I would also like to add a column name row to metadata.csv to better document additional metadata we might add. I haven't done this yet as it will require modifying prepare-dashboard.py (and maybe other files?).
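
For what it's worth, a reader that tolerates both the current headerless file and a future one with a column name row could look roughly like the sketch below. This is not how prepare-dashboard.py works today. The function name and the convention that a header row starts with the cell "sample" are assumptions made purely for illustration.

```python
import csv
import io


def read_metadata(text, header_first_cell="sample"):
    """Split metadata text into (header or None, data rows).

    Assumption: a header row, if present, has `header_first_cell` as its
    first cell; any other first row is treated as data.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if rows and rows[0][0] == header_first_cell:
        return rows[0], rows[1:]
    return None, rows
```

With this shape, existing files keep working unchanged and callers can look up columns by name only when a header is present.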

lennijusten commented 7 months ago

An example metadata-ncbi.tsv is attached here (in CSV form): metadata-ncbi.csv

jeffkaufman commented 7 months ago

Can you move this out of draft so it will let me review it?