Closed dhimmel closed 1 month ago
Hi @dhimmel, this issue has been resolved. The changes will be released in the upcoming 24.09 platform release with the following drugType values:
drugType |
---|
Antibody |
Antibody drug conjugate |
Cell |
Enzyme |
Gene |
Oligonucleotide |
Oligosaccharide |
Protein |
Small molecule |
Unknown |
@related-sciences appreciates all the great work by the OT team and noticed something small when upgrading to 24.06.
Running the following on BigQuery, which currently is based on the 24.06 release:
Produces the following table of drugType counts:
Notice the mixed casing for the "Unknown" / "unknown" value. This issue also exists in 24.03, although earlier releases consistently used only "Unknown".
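For anyone wanting to check other columns or releases for the same problem, here is a minimal sketch of detecting values that differ only by casing. The function name `find_case_variants` and the sample list are hypothetical illustrations, not real release data or counts.

```python
from collections import defaultdict


def find_case_variants(values):
    """Group distinct values that differ only by letter casing."""
    groups = defaultdict(set)
    for value in values:
        # casefold() gives a case-insensitive key for comparison.
        groups[value.casefold()].add(value)
    # Keep only keys that map to more than one distinct spelling.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical sample values, not taken from the actual dataset.
sample = ["Small molecule", "Unknown", "unknown", "Antibody", "Unknown"]
variants = find_case_variants(sample)
print(variants)
```

An empty result would mean no case-only duplicates were found in the values passed in.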
Versioned GCS path: gs://open-targets-data-releases/24.06/output/etl/parquet/molecule

There's the narrow fix of normalizing the casing, and then possibly a broader fix of selecting allowed values from an enum or applying a schema, which would prevent an issue like this from ever occurring.
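As a rough sketch of what the enum-based approach could look like, the snippet below validates raw values against a fixed set. It assumes the allowed values are the ten drugType values listed in the 24.09 reply; the names `DrugType` and `normalize_drug_type` are hypothetical and not part of the Open Targets pipeline.

```python
from enum import Enum


class DrugType(Enum):
    """Allowed drugType values (assumed from the list in this thread)."""

    ANTIBODY = "Antibody"
    ANTIBODY_DRUG_CONJUGATE = "Antibody drug conjugate"
    CELL = "Cell"
    ENZYME = "Enzyme"
    GENE = "Gene"
    OLIGONUCLEOTIDE = "Oligonucleotide"
    OLIGOSACCHARIDE = "Oligosaccharide"
    PROTEIN = "Protein"
    SMALL_MOLECULE = "Small molecule"
    UNKNOWN = "Unknown"


# Case-insensitive lookup table from folded value to enum member.
_BY_FOLDED = {member.value.casefold(): member for member in DrugType}


def normalize_drug_type(raw: str) -> DrugType:
    """Map a raw string to a DrugType, ignoring case and outer whitespace.

    Raises ValueError for anything outside the allowed set, so a new or
    misspelled value fails loudly instead of reaching the output.
    """
    key = raw.strip().casefold()
    try:
        return _BY_FOLDED[key]
    except KeyError:
        raise ValueError(f"Unexpected drugType: {raw!r}") from None


print(normalize_drug_type("unknown").value)
```

Whether to normalize casing silently, as this sketch does, or to reject non-canonical spellings outright is a design choice; rejecting would surface upstream inconsistencies sooner.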