ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

BioSample record is missing an attribute value #406

Closed: muffato closed this issue 1 day ago

muffato commented 1 day ago

Describe the bug

I'm querying assemblies and getting BioSample information from the datasets output. I've found one assembly for which the BioSample section seems to be missing an attribute value.

To Reproduce

$ datasets summary genome accession GCA_018245035.1 --as-json-lines | jq '.assembly_info.biosample.attributes'
[
  {
    "name": "isolate",
    "value": "not applicable"
  },
  {
    "name": "breed",
    "value": "not applicable"
  },
  {
    "name": "host",
    "value": "not applicable"
  },
  {
    "name": "isolation_source",
    "value": "not applicable"
  },
  {
    "name": "collection_date"
  },
  {
    "name": "geo_loc_name",
    "value": "Cameroon: Barombi Station"
  },
  {
    "name": "tissue",
    "value": "missing"
  },
  {
    "name": "collected_by",
    "value": "Paul Preuss"
  },
  {
    "name": "dev_stage",
    "value": "adult"
  },
  {
    "name": "sex",
    "value": "male"
  },
  {
    "name": "Type",
    "value": "ST"
  }
]

Note that the collection_date entry has no "value" key at all, unlike every other attribute in this record and unlike every other assembly I've queried so far.
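For anyone who wants to scan many records for the same gap, here is a minimal Python sketch. It assumes only the JSON layout shown in the output above (`assembly_info.biosample.attributes`, each entry having a `name` and an optional `value`). The function names `biosample_attributes` and `missing_values` are hypothetical helpers, not part of the datasets tooling, and the sample record is a trimmed copy of the output above.

```python
import json


def biosample_attributes(record):
    """Map each BioSample attribute name to its value, or None if absent."""
    attributes = (
        record.get("assembly_info", {})
        .get("biosample", {})
        .get("attributes", [])
    )
    # dict.get returns None when an entry has no "value" key
    return {attr["name"]: attr.get("value") for attr in attributes}


def missing_values(record):
    """List the attribute names that carry no value in this record."""
    return [
        name
        for name, value in biosample_attributes(record).items()
        if value is None
    ]


# One JSON line, trimmed from the output for GCA_018245035.1 shown above
sample_line = json.dumps(
    {
        "assembly_info": {
            "biosample": {
                "attributes": [
                    {"name": "isolate", "value": "not applicable"},
                    {"name": "collection_date"},
                    {"name": "geo_loc_name", "value": "Cameroon: Barombi Station"},
                ]
            }
        }
    }
)

record = json.loads(sample_line)
print(missing_values(record))  # prints ['collection_date']
```

The same two functions could be applied to each line of the `--as-json-lines` output, since every line is a standalone JSON object.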

Best regards, Matthieu

olearyna commented 1 day ago

Hi muffato,

Thanks for opening this issue.

It appears that a collection date was not submitted for this particular BioSample. The submission was made before this field became a required part of the submission process, which is why the data is missing.

Please note that all required fields are displayed by default in the current datasets output.

Let me know if you have any other questions.

Nuala

muffato commented 1 day ago

I see. Thank you!