hubmapconsortium / metadata-consistency

Datasets missing `assay_type` and `assay_category` #3

Open icaoberg opened 1 year ago

icaoberg commented 1 year ago

The following datasets are missing values in the `assay_type` and `assay_category` fields.

Note: these fields are also missing on the datasets that lack the metadata field, so they may be meant to be populated during ingestion.

+------+-----------------+-----------+----------------+--------------+-------------+--------------+------------------+
|      | hubmap_id       | status    | is_protected   | is_primary   | data_type   |   assay_type |   assay_category |
+======+=================+===========+================+==============+=============+==============+==================+
|  645 | HBM347.RFGL.437 | Published | True           | True         | SNAREseq    |          nan |              nan |
+------+-----------------+-----------+----------------+--------------+-------------+--------------+------------------+
| 1436 | HBM773.WCXC.264 | Published | True           | True         | snRNAseq    |          nan |              nan |
+------+-----------------+-----------+----------------+--------------+-------------+--------------+------------------+
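
For anyone who wants to reproduce this kind of check, here is a minimal sketch using pandas. It assumes the dataset report is already in a DataFrame with the columns shown in the table above. The `find_missing_assay_fields` helper and the third sample row are hypothetical and only there for illustration; this is not the script that produced the table.

```python
import pandas as pd

# Two rows copied from the table above, plus one hypothetical placeholder row
# (not a real dataset) so the filter has something to exclude.
datasets = pd.DataFrame(
    {
        "hubmap_id": ["HBM347.RFGL.437", "HBM773.WCXC.264", "HBM000.XXXX.000"],
        "status": ["Published", "Published", "Published"],
        "data_type": ["SNAREseq", "snRNAseq", "placeholder"],
        "assay_type": [None, None, "placeholder"],
        "assay_category": [None, None, "placeholder"],
    }
)


def find_missing_assay_fields(df, fields=("assay_type", "assay_category")):
    """Return the rows where at least one of the given fields is missing (NaN/None)."""
    mask = df[list(fields)].isna().any(axis=1)
    return df[mask]


missing = find_missing_assay_fields(datasets)
print(missing[["hubmap_id", "data_type", "assay_type", "assay_category"]])
```

Using `any(axis=1)` flags a dataset if either field is empty; switching to `all(axis=1)` would flag only datasets where both are empty.
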
j-uranic commented 1 year ago

@icaoberg It looks like this would also be impacted by the data ingest not currently running (if it needs to be added during ingest). If there is a way to manually update this that anyone can share with me, I can look into it.

Are these the only datasets you've found where this error is occurring? I am wondering whether it is a one-off problem with its own unique cause, or whether something needs to be added to the ingest to enforce that these fields are not empty.
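
If it turns out to be the latter, the enforcement could look roughly like the sketch below. This is not the actual ingest code: `REQUIRED_FIELDS`, `missing_required_fields`, and the dict-shaped record are all assumptions made for illustration, since I don't know how the ingest represents a dataset.

```python
import math

# Hypothetical list of fields the ingest would refuse to leave empty.
REQUIRED_FIELDS = ("assay_type", "assay_category")


def missing_required_fields(record, required=REQUIRED_FIELDS):
    """Return the names of required fields that are absent, None, NaN, or blank."""
    missing = []
    for name in required:
        value = record.get(name)
        is_nan = isinstance(value, float) and math.isnan(value)
        is_blank = isinstance(value, str) and not value.strip()
        if value is None or is_nan or is_blank:
            missing.append(name)
    return missing


# Example record shaped like a row of the table above (assumed structure).
record = {"hubmap_id": "HBM347.RFGL.437", "assay_type": float("nan")}
problems = missing_required_fields(record)
if problems:
    print(f"{record['hubmap_id']}: missing {', '.join(problems)}")
```

An ingest step could call a check like this and reject or flag the dataset whenever the returned list is non-empty.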

icaoberg commented 1 year ago

@icaoberg Agreed. These datasets need to be ingested again. I don't know the procedure, but the process needs to be driven by @sunset666 or @bhonick. However, this re-ingestion/processing needs to happen ASAP.