Closed by corneliusroemer 7 months ago
Hi corneliusroemer,
Thank you for your suggestions. We are currently reviewing your metadata requests in collaboration with the NCBI Virus team. We will resolve any issues on our end. However, some metadata requests might require coordination with the NCBI Virus team. I will update you once we start working on this.
All the best,
Nuala
Nuala A. O'Leary, PhD Product Owner, NCBI Datasets National Center for Biotechnology Information, NLM, NIH, DHHS
Hi corneliusroemer,
I discussed your request with the NCBI Virus group. There are no current plans to pull data from the /note section of the GenBank record but they will look into it. Any updates they make will be picked up by NCBI Datasets. You can contact the NCBI Virus group through the general NCBI feedback form https://support.nlm.nih.gov/support/create-case/.
Thanks, Nuala
Any news on the integration of /mol_type --> "molType"? Or are there other ways to infer this from taxonomy data? I'd hate to be forced to download GenBank format as well in the future...
Hi dandaman,
We don't have moltype in the virus report yet, but you can get it from the taxonomy data report for any taxid.
Here is the command, using dataformat, to get the taxid from the virus report:
datasets summary virus genome accession U28077.1 --as-json-lines | dataformat tsv virus-genome --fields virus-tax-id --elide-header
186538
Here is the command to get the moltype from the taxonomy report using jq:
datasets summary taxonomy taxon 186538 | jq -r '.reports[].taxonomy.genomic_moltype'
ssRNA(-)
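For anyone who needs this for more than one accession, the two commands above could be chained roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes `datasets`, `dataformat` and `jq` are installed and that the virus report returns a single taxid for the accession.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical, not part of the
# NCBI Datasets CLI. Requires datasets, dataformat and jq on PATH.

# Print the taxid recorded in the virus report for one accession.
taxid_for_accession() {
    datasets summary virus genome accession "$1" --as-json-lines |
        dataformat tsv virus-genome --fields virus-tax-id --elide-header
}

# Print the genomic moltype from the taxonomy report for one taxid.
moltype_for_taxid() {
    datasets summary taxonomy taxon "$1" |
        jq -r '.reports[].taxonomy.genomic_moltype'
}

# Chain both lookups; give up if no taxid came back.
moltype_for_accession() {
    taxid=$(taxid_for_accession "$1") || return 1
    if [ -z "$taxid" ]; then
        echo "no taxid found for $1" >&2
        return 1
    fi
    moltype_for_taxid "$taxid"
}
```

Calling `moltype_for_accession U28077.1` runs the same two lookups shown above, so it should print the same moltype as the manual steps.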
Let me know if you have any questions.
Nuala
Dear @olearyna,
that is perfect, thank you :-)
Best, Daniel
Quite frequently, valuable metadata is contained in the '/note' field of the GenBank file.
Unfortunately, this field seems to get lost on the way to 'datasets download virus genome'.
Compare the metadata available in the GenBank file under SOURCE with what ends up in the output of 'datasets download virus genome taxon'. Valuable information is lost:
/note="subtype: Zaire"
/strain="Mayinga 1976"
/mol_type="genomic RNA"
This is probably not even a very good example. I can think of more important notes, but I couldn't find one just now.
It would be nice if all this metadata were passed through.
In fact, it might be a bug that molType is missing, as that is a field that should already be output per the schema here: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/data-reports/virus/
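Until these qualifiers are passed through, one fallback is to read them straight from a downloaded GenBank flat file. The helper below is a rough sketch, not an official tool: the function name is hypothetical, and it assumes each qualifier fits on a single line (as in `/note="subtype: Zaire"`), so values wrapped across lines would be skipped.

```shell
#!/bin/sh
# Hypothetical helper: print /note, /strain and /mol_type qualifiers from a
# GenBank flat file as key<TAB>value. Assumes single-line qualifiers only.
genbank_qualifiers() {
    awk '
        /^[ \t]+\/(note|strain|mol_type)="/ {
            line = $0
            sub(/^[ \t]+\//, "", line)            # drop indentation and slash
            key = line; sub(/=.*/, "", key)       # text before the first "="
            val = line; sub(/^[^=]*="/, "", val)  # text after the opening quote
            sub(/"[ \t]*$/, "", val)              # drop the closing quote
            print key "\t" val
        }
    ' "$1"
}
```

Usage would be `genbank_qualifiers record.gb`, where `record.gb` is a placeholder file name. Note that it prints matching qualifiers from every feature in the file, not only from the source feature, since `/note` can also appear on genes and CDS entries.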