pombase / pombase-chado

PomBase code for accessing Chado
MIT License

Make expression files match submission format #1059

Open ValWood opened 1 year ago

ValWood commented 1 year ago

We have some expression files

https://www.pombase.org/data/releases/latest/misc/gene_expression_table.tsv

but the export format seems to include an internal ID (PBO), which should be exported as the correct relation to match the format described here https://www.pombase.org/documentation/quantitative-gene-expression-data-bulk-upload-format and here https://www.pombase.org/documentation/qualitative-gene-expression-data-bulk-upload-format

(Note: I think only the quantitative data are currently exported.)

@afg1

kimrutherford commented 8 months ago

The PBO terms in the file are just "RNA level" and "protein level" so we can probably just remove the "term_id" column.

ValWood commented 8 months ago

I wonder why we added it, since we have the "type" in column 3?

kimrutherford commented 8 months ago

> I wonder why we added it, since we have the "type" in column 3?

Are we looking at the same file? I was looking at https://www.pombase.org/data/releases/latest/misc/gene_expression_table.tsv which doesn't have a "type" column.

ValWood commented 8 months ago

Sorry, I was confusing it with the input file. I agree, we don't need the ID column since the type is explicit in "term name". Since this column would no longer be a "mini ontology" term (only required for Chado), it might be better to also relabel the column term_name to "type" so that it matches the column with the same data in the submission files? https://www.pombase.org/documentation/qualitative-gene-expression-data-bulk-upload-format although in the input file we call it "protein", not "protein level".

Since it's expression data, it can be assumed that the data will be "x level". So maybe we only need to export type=protein, not "term_name=protein level".
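To make the proposal concrete, here is a minimal sketch of the column changes discussed above: drop "term_id", relabel "term_name" as "type", and strip the trailing " level". It is not the actual export code. The helper names, the "gene_id" column, and the PBO ID values in the sample are placeholders assumed for illustration; only "term_id", "term_name", "type" and the two term names come from this thread.

```python
import csv
import io

LEVEL_SUFFIX = " level"


def normalise_row(row):
    """Return a copy of row without term_id and with term_name relabelled as type."""
    row = dict(row)
    row.pop("term_id", None)  # internal PBO ID, only needed for Chado
    if "term_name" in row:
        name = row.pop("term_name")
        # "protein level" -> "protein", "RNA level" -> "RNA"
        if name.endswith(LEVEL_SUFFIX):
            name = name[: -len(LEVEL_SUFFIX)]
        row["type"] = name
    return row


def convert_tsv(text):
    """Rewrite a tab-separated expression table held in a string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Keep the original column order, minus term_id, with term_name renamed.
    out_fields = [
        "type" if field == "term_name" else field
        for field in reader.fieldnames
        if field != "term_id"
    ]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=out_fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(normalise_row(row))
    return out.getvalue()


# Hypothetical sample: gene_id and the PBO ID values are placeholders.
sample = (
    "gene_id\tterm_id\tterm_name\n"
    "geneA\tPBO:0000000\tprotein level\n"
    "geneB\tPBO:0000000\tRNA level\n"
)
print(convert_tsv(sample))
```

Values that do not end in " level" pass through unchanged, so a file that already uses "protein" would not be altered.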

Also, we call it "gene expression data"; that should be the more generic "expression data".

It's probably easier to chat about this...

kimrutherford commented 8 months ago

I've removed the unhelpful "term_id" column from the output file.

ValWood commented 5 months ago

The decision is to make the output file(s) match the input file format, which is described here:

https://www.pombase.org/documentation/qualitative-gene-expression-data-bulk-upload-format

https://www.pombase.org/documentation/quantitative-gene-expression-data-bulk-upload-format

kimrutherford commented 5 months ago

Note to self: change the code that draws the violin plots, and change the configuration for columns in the advanced search and table download.