microbiomedata / issues

public repo for issues related to NMDC work

Fix projects where depth unit is not specified #612

Open aclum opened 5 months ago

aclum commented 5 months ago

These are the counts from a mongo query.

@JamesTessmer @mslarae13 can you fix 1000 Soils (nmdc:sty-11-28tm5d36)? I believe these should all be in meters. For gold:Gs0135149, these are EMSL-only samples. @mslarae13 can you check these out?

nmdc:sty-11-hdd4bf83 is TRiP. We can update the unit here to meters because has_numeric_value is 0 for all of these.

We should think about whether we want to store and display values of 0. It doesn't make much sense to store depth for animal host-associated samples.

_id: ["nmdc:sty-11-28tm5d36"], count: 134
_id: ["gold:Gs0135149"], count: 5
_id: ["nmdc:sty-11-hdd4bf83"], count: 61
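
For anyone re-checking these per-study counts without database access, here is a minimal sketch of the same kind of grouping. It is not the Mongo query used for the numbers above. It assumes a hypothetical tab-separated export with the study ID in column 1, the biosample ID in column 2, and the depth unit in column 3; the function name `count_missing_units` is made up for illustration.

```shell
# Hypothetical helper: count samples per study whose unit column is empty.
# Assumed input columns (tab-separated): study_id, biosample_id, has_unit.
count_missing_units() {
    awk -F '\t' '
        $3 == "" { n[$1]++ }                                  # unit missing for this sample
        END { for (s in n) printf "%s\t%d\n", s, n[s] }       # one line per study
    ' "$1" | sort
}
```

Calling `count_missing_units some_export.tsv` prints one `study_id<TAB>count` line per study that has at least one sample without a unit.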
aclum commented 5 months ago

related to https://github.com/microbiomedata/nmdc-server/issues/756

aclum commented 5 months ago

Brodie samples (gold:Gs0135149)

name,depth.has_numeric_value,id
"Soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_145",5,igsn:IEWFS000I
"Soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_147",15,igsn:IEWFS000K
"Soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_135",15,igsn:IEWFS000B
"Soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_134",5,igsn:IEWFS000A
"Soil microbial communities from the East River watershed near Crested Butte, Colorado, United States - ER_146",5,igsn:IEWFS000J

mslarae13 commented 1 week ago

Was the decision to REQUIRE depth in meters??? So, we should just show Depth, meters in the UI? Or do we need to put a unit on the value in the database?

@aclum @turbomam

To add, we decided on UCUM, which we haven't implemented... but that's m, right?

Is it m or meter? cs_code or name? Did we make that decision? (The screenshot meant to go here, "Screenshot 2024-07-03 at 11.18.51 AM.png", did not finish uploading.)

mslarae13 commented 1 week ago

Odd....

https://data.microbiomedata.org/details/sample/nmdc:bsm-11-yqhjes36 has " , meters" in depth

But

https://data.microbiomedata.org/details/sample/nmdc:bsm-11-bsf8yq62 doesn't have a unit...

How did that happen!?

GOLD vs not?

mslarae13 commented 1 week ago

I assume change sheet is the best way to fix this?

mslarae13 commented 6 days ago

Decision: the metadata should be complete. How the UI displays it does not limit what we store. We need to add 'meters' to the depth.has_unit slot for these samples.

@bmeluch could you help make a change sheet? We can chat about it Tuesday

aclum commented 2 days ago

Any new changes should use 'm' to be more consistent with UCUM.

turbomam commented 2 days ago

> Any new changes should use 'm' to be more consistent with UCUM.

I agree on consistency and don't have any objection to UCUM's m.

We can get a report of the current Biosample depths with something like this:

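```shell
# Download the depth slot for every Biosample from the NMDC API.
wget -O biosample_depths.json \
    "https://api.microbiomedata.org/nmdcschema/biosample_set?max_page_size=9999&projection=depth"
# Flatten to TSV: id, raw value, numeric value, unit (nulls become empty fields).
jq \
    -r '.resources[] | [.id, .depth.has_raw_value, .depth.has_numeric_value, .depth.has_unit] | @tsv' \
    biosample_depths.json > biosample_depths.tsv
# Tally the distinct unit strings in column 4.
cut -d $'\t' -f4 biosample_depths.tsv | sort | uniq -c
```

Output:

```
   2347 
   4737 m
    357 meter
    681 meters
     60 metre
```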
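
As a possible follow-up, the same TSV could be turned into a worklist of records whose unit needs attention. This is only a sketch: it assumes the four-column layout of `biosample_depths.tsv` (id, has_raw_value, has_numeric_value, has_unit), assumes that "meter", "meters", and "metre" should all become UCUM `m`, and flags missing or unexpected units for manual review rather than guessing. The function name `depth_units_to_fix` is hypothetical, and the output is not in change sheet format.

```shell
# Sketch: list biosamples whose depth unit is missing or not already "m".
# Input: 4-column TSV (id, has_raw_value, has_numeric_value, has_unit).
# Output: id, current unit, proposed unit (or REVIEW when a human should decide).
depth_units_to_fix() {
    awk -F '\t' -v OFS='\t' '
        $3 == ""  { next }                                   # no numeric depth recorded
        $4 == "m" { next }                                   # already UCUM
        $4 == ""  { print $1, "(missing)", "REVIEW"; next }  # unit absent, confirm per study
        $4 == "meter" || $4 == "meters" || $4 == "metre" { print $1, $4, "m"; next }
        { print $1, $4, "REVIEW" }                           # unexpected unit, check by hand
    ' "$1"
}
```

Running `depth_units_to_fix biosample_depths.tsv` prints one line per record to touch. Rows marked `REVIEW` still need a per-study decision like the ones made earlier in this thread.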
turbomam commented 2 days ago

I don't have an equivalent trick for the submissions in the submission portal.