legumeinfo / datastore-issues

mostly for issues pertaining to the content of the legumeinfo datastore; may also relate to characteristics of its user interface or managing the mirroring process to the legfed instance

expression #207

Open adf-ncgr opened 1 month ago

adf-ncgr commented 1 month ago

@jd-campbell had mentioned an interest in this project: PRJNA996630

a preprint is here: https://www.biorxiv.org/content/10.1101/2023.08.15.553447v1

It looks like it's maybe just two tissues (leaves and flowers) but a complex set of stresses being applied in multi-factorial combinations. Looks like some of these stresses may be captured in one of the other ontologies that we haven't used much (if any) yet, e.g.: https://browser.planteome.org/amigo/term/PECO:0001046#display-lineage-tab

@el239 if you could bump this up as a priority dataset we could add it to the set that we're working on getting fully released. thanks!

el239 commented 1 month ago

@adf-ncgr I've completed the prep for this project's pipeline results and staged them for the datastore. However, as it's a preprint, the citation info isn't fully fleshed out. Also, the Perl script isn't pulling the plainly labeled "treatment" field for some reason, but I was able to create the obo file based on the sample names/descriptions. These are the terms I used for the conditions:

PECO:0001046 limited phosphate exposure
PECO:0007048 sodium chloride exposure
PECO:0007134 acidic pH growth media environment exposure
PECO:0007404 drought environment exposure
PECO:0007044 mineral exposure
PO:0025034 leaf
PO:0009046 flower

Controls have only the plant ontology (PO) terms, whereas the other samples additionally have the appropriate PECO terms, separated by commas. Will close for now; please let me know if it needs any edits.
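The annotation rule described above (controls get only the PO term, stressed samples get the PO term plus comma-separated PECO terms) could be sketched roughly as follows. This is an illustration, not the script actually used: the dictionary keys (`leaf`, `salinity`, and so on) and the function name `ontology_terms` are hypothetical, the pairing of stress keywords with PECO IDs is inferred from the term labels listed above, and the PO-first ordering is an assumption. PECO:0007044 is left out because the thread does not say which stress it annotates.

```python
# Hypothetical keyword-to-term tables; the IDs come from the list above,
# the keywords and the pairing are assumptions for illustration only.
TISSUE_TERMS = {
    "leaf": "PO:0025034",
    "flower": "PO:0009046",
}
STRESS_TERMS = {
    "low phosphate": "PECO:0001046",  # limited phosphate exposure
    "salinity": "PECO:0007048",       # sodium chloride exposure
    "acidity": "PECO:0007134",        # acidic pH growth media environment exposure
    "water deficit": "PECO:0007404",  # drought environment exposure
}


def ontology_terms(tissue, stresses):
    """Return the comma-separated ontology terms for one sample.

    Controls pass an empty `stresses` list and get only the PO term.
    Unknown tissues or stresses raise KeyError rather than being skipped.
    """
    terms = [TISSUE_TERMS[tissue]]
    terms.extend(STRESS_TERMS[stress] for stress in stresses)
    return ",".join(terms)
```

Raising on an unknown keyword, instead of silently dropping it, makes a missing mapping visible before the file is staged.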

adf-ncgr commented 1 month ago

Great, thanks. Just FYI, I've fixed the bug with that Perl script not capturing "treatment"; sorry about that!

el239 commented 1 month ago

Oh, super! I've supplanted the old version of the sample.tsv with one containing those data.

adf-ncgr commented 1 month ago

Can you double-check the result? I am seeing some things in /falafel/legumeinfo/data/v2/Glycine/max/expression/Wm82.gnm6.ann1.expr.Magellan.Pelaez-Vico_Sinha_2023/glyma.Wm82.gnm6.ann1.expr.Magellan.Pelaez-Vico_Sinha_2023.samples.tsv.gz that look off, though I think that file is the updated version. For example, SRX21100215 (Low phosphate leaves replicate 3) is shown as having treatment "Salinity+acidity+water deficit", but when I run the updated script, its output has the treatment as "Low phosphate (phosphate concentration was reduced by 90%)", which seems more sensible. Not sure where the disconnect is.

el239 commented 1 month ago

Oh, my mistake. I had just slotted that column from a new Perl script run into the old samples table, not realizing that the samples table is generated with a different row order each time. I should have checked it. I have now corrected that and sorted the file by identifier.
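For reference, one way to avoid this kind of misalignment is to copy the column by sample identifier rather than by row position. The following is a minimal sketch, not the fix actually applied: the function name `replace_column_by_key` and the column names passed to it are hypothetical, and it assumes both tables are tab-separated with a header row and share a unique identifier column.

```python
import csv
import io


def replace_column_by_key(old_tsv, new_tsv, key, column):
    """Copy `column` from new_tsv into old_tsv, matching rows on `key`.

    Both arguments are TSV text with a header row. Rows are matched by
    the value in the `key` column, so differing row order is harmless.
    The result is sorted by `key` to give a stable, comparable file.
    """
    new_rows = csv.DictReader(io.StringIO(new_tsv), delimiter="\t")
    lookup = {row[key]: row[column] for row in new_rows}

    reader = csv.DictReader(io.StringIO(old_tsv), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    if column not in fieldnames:
        fieldnames.append(column)

    merged = []
    for row in reader:
        if row[key] not in lookup:
            # Fail loudly instead of leaving a stale or empty value.
            raise KeyError(f"no {column!r} value for {key}={row[key]!r}")
        row[column] = lookup[row[key]]
        merged.append(row)
    merged.sort(key=lambda r: r[key])

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()
```

A call such as `replace_column_by_key(old_text, new_text, "identifier", "treatment")` would then give the same result regardless of the row order in either run, with `identifier` and `treatment` standing in for whatever the real column headers are.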