microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Add RNA library permissible values to `LibraryTypeEnum` for `LibraryPreparation`'s `library_type` slot #1000

Closed: aclum closed this issue 1 month ago

aclum commented 1 year ago

Not urgent, since it is not needed for NEON, but we need a slot to capture the type of RNA library, because this is a needed input for the class `MetatranscriptomeActivity`.

Proposed slot: `rna_seq_type`

Proposed range: an enum with the values "Stranded RNA-seq, R1 is forward, R2 is reversed", "Stranded RNA-seq, R2 is forward, R1 is reversed", and "non-stranded RNA-seq".
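For reference, a minimal LinkML sketch of what the proposed slot could look like. The enum name `RnaSeqTypeEnum` and the description wording are hypothetical placeholders, not what was merged into nmdc-schema:

```yaml
slots:
  rna_seq_type:
    # Hypothetical description; the final wording is up to the schema team
    description: >-
      The strandedness and read orientation of an RNA-seq library.
    # Hypothetical enum name for the range proposed above
    range: RnaSeqTypeEnum
```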

turbomam commented 1 year ago

OK. The permissible values for the enumeration should be very succinct, preferably a single word or something_snake_cased.

The `PermissibleValue` class accepts most of the same annotations as the other LinkML metaclasses.

If we can find OBO Foundry terms for those concepts, we should assign their CURIEs to the `meaning` slot.
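A sketch of the enum along those lines, assuming hypothetical snake_case names for the three values proposed above. The `meaning` lines are left as commented placeholders because no OBO Foundry CURIEs have been identified at this point:

```yaml
enums:
  RnaSeqTypeEnum:  # hypothetical enum name
    permissible_values:
      # Value names below are hypothetical, snake_cased per the suggestion above
      stranded_r1_forward_r2_reverse:
        description: Stranded RNA-seq, R1 is forward, R2 is reversed
        # meaning: <CURIE of a matching OBO Foundry term, if one is found>
      stranded_r2_forward_r1_reverse:
        description: Stranded RNA-seq, R2 is forward, R1 is reversed
        # meaning: <CURIE of a matching OBO Foundry term, if one is found>
      non_stranded:
        description: non-stranded RNA-seq
        # meaning: <CURIE of a matching OBO Foundry term, if one is found>
```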

mslarae13 commented 5 months ago

Low priority. Mark can add this enum; it will need review. If it is not completed next sprint, it will not be included in the Berkeley rollout.

turbomam commented 5 months ago

@aclum I just rewrote your title like a cowboy, but then thought: oops, maybe I don't really know what you want.

Can we just add permissible values to the existing `LibraryTypeEnum`, or do you really feel that a new slot is required? I see that you are suggesting a new slot called `rna_seq_type`. Does that mean that you would want to have something like this?

```yaml
id: nmdc:libprp-99-abc123
type: nmdc:LibraryPreparation
library_type: RNA
rna_seq_type: stranded_rnaseq_r1f_r2r
```

There aren't any `library_type` values other than 'DNA' in the metadata I put into GraphDB a week or two ago, so eliminating the 'RNA' value wouldn't require a migration.

Is there any circumstance in which you would want to say that the `library_type` is 'RNA' without asserting an `rna_seq_type`?

aclum commented 5 months ago

I would prefer this as a separate slot, since we may not have this information, or we may have to infer it from a combination of institution and processing date.
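With a separate, optional slot, a record like the following would remain valid when the strandedness is unknown. This is a hypothetical instance; the `id` is an illustrative placeholder in the same style as the example above:

```yaml
# Hypothetical record: an RNA library whose strandedness is unknown,
# so rna_seq_type is simply omitted rather than guessed
id: nmdc:libprp-99-def456
type: nmdc:LibraryPreparation
library_type: RNA
```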

aclum commented 3 months ago

@turbomam do you have time to work on this ticket this sprint? @kaijli and I are reviewing the workflow and will need this information as a workflow parameter for the counting step. We'll need to reprocess existing projects after re-IDing, because the new workflow uses a different assembler.

mslarae13 commented 2 months ago

Re-IDing requires reprocessing metaT data. The file type enum for metaT isn't sorted out yet. Adding support for metaT is allowed during the soft freeze. TBD, but this is specific to one data type, so the impact shouldn't be large.

aclum commented 1 month ago

@turbomam do you have time to work on this next sprint?

aclum commented 1 month ago

Options for permissible values if the library is stranded: Sense Orientation; Antisense Orientation (https://www.ebi.ac.uk/ols4/ontologies/ncit/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FNCIT_C63551)

Is NCIT an acceptable ontology?

I couldn't find an ontology term for non-stranded libraries, but the literature seems to just use 'non-stranded'.

Library preparation references:
- https://www.azenta.com/blog/stranded-versus-non-stranded-rna-seq
- https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1876-7
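If the NCIT terms were adopted, the enum might take a shape like this sketch. The enum and value names are hypothetical, and the `meaning` CURIEs are left as placeholders to be confirmed against OLS rather than filled in:

```yaml
enums:
  RnaSeqTypeEnum:  # hypothetical enum name
    permissible_values:
      sense_orientation:  # hypothetical value name
        description: Stranded library, sense orientation
        # meaning: NCIT CURIE for "Sense Orientation" (to be confirmed)
      antisense_orientation:  # hypothetical value name
        description: Stranded library, antisense orientation
        # meaning: NCIT CURIE for "Antisense Orientation" (to be confirmed)
      non_stranded:
        description: Non-stranded library (no ontology term found)
```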

aclum commented 1 month ago

@turbomam let me know which modeling of the example data you prefer, or if you have other suggestions: https://github.com/microbiomedata/nmdc-schema/pull/2113