turbomam / biosample-xmldb-sqldb

Tools for loading NCBI Biosample into an XML database and then transforming that into a SQL database
MIT License

database competency questions #41

Open turbomam opened 5 months ago

turbomam commented 5 months ago

What claims would we like to make about the Biosample metadata presented in the NMDC MongoDB compared to NCBI?

Acknowledge that our Biosample Postgres database may be lossy, especially with respect to non-attribute metadata. Some paths aren't ingested at all, and others are ingested as concatenations, which are harder to search over efficiently.

Break compound questions like "how many soil metagenome samples have a valid latitude and longitude" into prerequisites like

aclum commented 5 months ago

Jotting down some of the meeting notes: distribution of samples, either all or only the ones that we think are metagenomes, by template type used. Example follow-up query: for samples that use a generic template, what percentage of their attributes have a harmonized value present? When checking for specific info, make sure the mapping makes sense (i.e. lat/lon isn't 'harmonized' because GSC stores this as a single value and other institutions store it as two separate values).
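
To make the "valid latitude and longitude" prerequisite concrete, here is a minimal Python sketch of checking a single-value latitude/longitude string. It assumes the value looks like `38.98 N 77.11 W` (decimal degrees followed by a hemisphere letter). The function name `parse_lat_lon` and the regular expression are hypothetical and are not taken from this repo's code.

```python
import re
from typing import Optional, Tuple

# Assumed single-value format: "<decimal> <N|S> <decimal> <E|W>",
# e.g. "38.98 N 77.11 W". Real Biosample values may deviate from this.
_LAT_LON_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$",
    re.IGNORECASE,
)


def parse_lat_lon(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) as signed decimal degrees, or None if not valid."""
    match = _LAT_LON_RE.match(value or "")
    if match is None:
        return None
    lat_text, ns, lon_text, ew = match.groups()
    # South and West become negative values.
    lat = float(lat_text) * (-1 if ns.upper() == "S" else 1)
    lon = float(lon_text) * (-1 if ew.upper() == "W" else 1)
    # Reject values outside the valid coordinate ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```

Counting the samples for which this returns a non-None result would answer the "valid lat/lon" part of the compound question. Whether a sample is a soil metagenome would be a separate prerequisite query.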