miRTop / incubator

Where all ideas and discussions happen to lead to new repositories

GFF3::source | GFF3::type #13

Open lpantano opened 7 years ago

lpantano commented 7 years ago

Hi all again!

cc: @lpantano @gurgese @ThomasDesvignes @mhalushka @mlhack @keilbeck @BastianFromm @ivlachos @TJU-CMC

I propose that the second column (source) hold the name of the database used by the tool.

I propose to use these labels for the type column (3rd):

Contribute with more options or any thoughts you have about it! thanks!

keilbeck commented 7 years ago

Column 3 needs to be a term from the Sequence Ontology. If the right terms are not there to describe your feature, we need to add them to the ontology. There are currently 25 miRNA terms in the SO: http://www.sequenceontology.org/browser/obob.cgi

lpantano commented 7 years ago

Thanks, that is great!

I'll take a look, and if we need something new we'll work with you to add it?


ivlachos commented 7 years ago

I really like the idea of "reference" instead of canonical, since it's very close to reality. Kudos!

ThomasDesvignes commented 7 years ago

I am all for a "reference" miRNA instead of a "canonical" miRNA. In our TiG paper we were actually proposing the creation of a "RefSeq miRNA sequence" as an unchangeable standard, while the most expressed isomiR could change among sample/tissue/etc...

For column , the parent could then be "pre_miRNA" to match the Sequence Ontology. I think the SO database has most if not all covered as of now, except the isomiRs which may be considered as child of the RefSeq miRNA.

ThomasDesvignes commented 7 years ago

For column two ("source"), do you mean putting "miRBase_v.XX", "MirGeneDB_v.X" or "personal annotation"? All of that is fine with me. In my experience (on fish), I usually do my own annotation and make it public with the publication. For example, what I've done on zebrafish and spotted gar has never been incorporated into any database. How would we deal with that? I am thinking of putting my annotation files on a GitHub/Zenodo page (because I'll continue annotating more species, and I know people won't dig into the supplemental files of my publication to retrieve an annotation...), so maybe in column 2 we could have something like "Zenodo_doi..."? Basically something traceable...

BastianFromm commented 7 years ago

We did zebrafish....we are doing about 20 more and hope to publish a reference for major metazoans this autumn..


ThomasDesvignes commented 7 years ago

That's awesome, Bastian! How many more fish out of the 20? (I'm a fish person ;) ) However, my problem with MirGeneDB is that its inclusion criteria are too strict for the way I study miRNAs: many non-canonical miRNAs that I want to keep studying, because they are functional, are not in MirGeneDB (cf. previous discussions on canonical miRNAs). So, at least for my studies, I'll continue using my own annotation files, which will remain larger than what is in MirGeneDB, and I need a way to make them publicly available. That's why I ask for an alternative "source" of annotation. But we're moving away from the original question here...

lpantano commented 7 years ago

Thanks all for the discussion! And it's awesome that we'll have zebrafish there.

Thomas, I think that is OK: you can name it as you want, as long as it doesn't overlap with an official name.

I think we can ask for a line like this in the header of the file:

##source-ontology LINK TO DATABASE

or something like that, to make sure it is traceable.

PS: The idea of uploading it to GitHub seems super good.
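To make the header idea concrete, here is a minimal Python sketch of how a tool might pick up such a line. It assumes the directive is spelled exactly `##source-ontology` as proposed above and that the rest of the line is the link; the function name and the example URL are hypothetical placeholders, not part of any agreed format.

```python
def read_source_ontology(gff_lines):
    """Return the value of a '##source-ontology' header line, or None.

    Assumption: the directive name and layout follow the proposal in this
    thread ('##source-ontology LINK TO DATABASE'); nothing here is final.
    """
    prefix = "##source-ontology"
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            return value or None
    return None


# Hypothetical file content; the URL is a placeholder, not a real resource.
example = [
    "##gff-version 3",
    "##source-ontology https://example.org/my-annotation",
    "chr1\tmy_annotation_v1\tisomiR\t100\t121\t.\t+\t.\tID=x",
]
print(read_source_ontology(example))
```

A personal annotation hosted on GitHub or Zenodo would then stay traceable as long as the link in that header line resolves.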

lpantano commented 7 years ago

Hi @keilbeck and all, I looked at the SO. I think we need something like ref_miRNA, plus either edit_miRNA or isomiR directly. Do you think it is possible to add those to the database?

Let me know your thoughts.

keilbeck commented 7 years ago

Send me the definitions.

ThomasDesvignes commented 7 years ago

Hi Karen, I'm not sure we've reached a consensus yet on the "ref_miRNA" and "isomiR" definitions (I think isomiR is better than edit_miRNA, btw), but in our paper together we proposed these definitions, which people can comment on and embellish:

lpantano commented 7 years ago

I am happy with those definitions, Thanks Thomas!

On Jun 22, 2017, at 11:35 AM, Thomas Desvignes wrote:

Ref_miRNA: A Ref_miRNA sequence is assigned at the creation of a new mature miRNA entry in a database. The Ref_miRNA sequence designation remains unchanged even if a different isomiR is later shown to be expressed at a higher level. A ref_miRNA can be produced by one or multiple pre-miRNA.

IsomiRs: IsomiRs are all the bona fide variants of a mature product. IsomiRs should be connected to the Ref_miRNA it is most likely to be the variant of. Some isomiRs can be variations of one or multiple Ref_miRNA.

(Directly taken from Fig. 1 in the Trends in Genetics miRNA Nomenclature paper)

keilbeck commented 7 years ago

OK, just trying to get my head around this.

Is a ref_miRNA a genomic feature or a transcript feature? I think isomiR is a transcript feature, right?

ThomasDesvignes commented 7 years ago

From my point of view:

keilbeck commented 7 years ago

Brilliant. We will add these

nicoleruiz commented 7 years ago

SO:0002166 ref_miRNA and SO:0002167 isomiR have been added as children of miRNA.
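With these terms in place, a tool could check column 3 of each line against the agreed vocabulary. The sketch below is only an illustration: the two SO identifiers come from the comment above, `pre_miRNA` is the parent term mentioned earlier in this thread, and the function name and the example line are made up.

```python
# Type labels discussed in this thread. Only the two identifiers below
# are taken from the comment above; the set itself is an assumption.
ALLOWED_TYPES = {"pre_miRNA", "ref_miRNA", "isomiR"}
SO_IDS = {"ref_miRNA": "SO:0002166", "isomiR": "SO:0002167"}


def check_type_column(gff_line):
    """Return (type, is_allowed) for one tab-separated GFF3 feature line."""
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature_type = fields[2]  # column 3 is the type
    return feature_type, feature_type in ALLOWED_TYPES


# Made-up feature line for illustration only.
line = "chr1\tmiRBase_v.XX\tisomiR\t100\t121\t.\t+\t.\tID=x"
ftype, ok = check_type_column(line)
print(ftype, ok, SO_IDS.get(ftype))
```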

mhalushka commented 7 years ago

Just to probe this further, what if a more abundant isomiR with a change at the 5' end of a ref_miRNA is encountered? This would change the seed sequence and could change the genes to which the miRNA could bind. Would you still keep the original ref_miRNA? Would you consider updating it with a "version change" or similar method?

ThomasDesvignes commented 7 years ago

That's a good thought! From my end, I still consider it an isomiR of the ref_miRNA, and I usually call it a "seed-shifted isomiR". It will theoretically have a different function and different targets because of its different seed, but it is still an alternative product of the same gene/pre-miRNA. If the ref_miRNA has actually been annotated with the "wrong" seed, that would probably need to be fixed, I guess... so it all relies on the quality of the sequencing and analysis of the first dataset that leads to the annotation...
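As a rough illustration of what "seed-shifted" means in practice, here is a small Python sketch. It assumes the seed is nucleotides 2 to 8 of the mature product (one common convention; some tools use 2 to 7), that start positions are 0-based offsets on the precursor, and that the precursor string is a toy sequence invented for the example. The function names are hypothetical.

```python
def seed_of(precursor, start, seed_len=7):
    """Seed of a mature product whose 5' end is at 0-based `start`.

    Assumption: the seed is nucleotides 2-8 of the mature sequence.
    """
    return precursor[start + 1 : start + 1 + seed_len]


def is_seed_shifted(ref_start, iso_start):
    """An isomiR is seed-shifted when its 5' end differs from the reference."""
    return ref_start != iso_start


# Toy precursor fragment, made up for illustration.
PRECURSOR = "GGCUGAGGUAGUAGGUUGUAUAGUUAC"
REF_START = 3  # 5' end of the ref_miRNA on the precursor
ISO_START = 4  # isomiR starting one nucleotide further 3'

print(seed_of(PRECURSOR, REF_START), seed_of(PRECURSOR, ISO_START))
print(is_seed_shifted(REF_START, ISO_START))
```

Variants that differ only at the 3' end keep the same start and therefore the same seed, which is why the 5' change is the case singled out here.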

lpantano commented 7 years ago

It is indeed a good point, Marc, and I agree with Thomas. I think the same applies to protein-coding genes, where some isoforms change function depending on the exons they contain.

In this case, if the variant maps only to that miRNA but changes the seed, I would continue using ref_miRNA as the reference. Please remember that "reference" doesn't imply anything by itself; it is just something to compare to. It will change from database to database, and probably among versions of the same database in the future.

If the variant maps to more than one miRNA, then it would appear as an isomiR of each of those reference miRNAs (flagged as ambiguous in some attribute).

I think all of this is fine as long as we keep all the information. For this reason, the issue discussing the attributes is important. There, I proposed a field to classify the isomiRs; this will help anyone using the GFF file to pull out all the isomiRs that change the seed region compared to the reference, if they want to focus on those cases for functional analysis.

Thanks for all the comments, I think we are improving a lot!

Please, chime in https://github.com/miRTop/incubator/issues/14 to talk about attributes.

thanks!


ivlachos commented 7 years ago

I like the approach of the ref miRNA and isomiRs. It's a convention, and it's clear and extensible. Many isomiRs can have different functionalities despite having the same seed (e.g. different localization), but targeting could certainly be drastically affected by a 5' shift. I agree that it's an isomiR compared to the reference, and it's our job to find out what changes and what remains the same, as we are doing for genes. I also support avoiding "edited", since it brings ADAR to mind.

phillipeloher commented 6 years ago

From my perspective, a Ref_miRNA is an abstraction of a series of surrounding isomiRs. The problems with a ref_miRNA include: (a) different reference endpoints between databases (e.g. miRBase vs miRCarta) for the same locus; (b) as folks already mentioned, the isomiR seeds (e.g. isomiRs with different 5p starting points) won't necessarily match the reference; (c) the most abundant isomiR (from which many ref miRNA annotations were populated) can differ between tissue state and cell type.

In many cases, the isomiR whose sequence corresponds exactly to the ref_miRNA sequence (a 0|5p, 0|3p isomiR) is expressed.

Instead of making an isomiR a child of a ref_miRNA, if we made the Ref_miRNA an abstract_property of an isomiR sequence, it would place (I think rightfully) less emphasis on a somewhat arbitrary Ref_miRNA and more emphasis on the transcriptional products.

lpantano commented 6 years ago

Hi @phillipeloher,

thanks for the comment.

We don't consider isomiR to be a child of miRNA_Ref but a child of the precursor.

It is true that the Variant attribute is relative to the miRNA_Ref, but I think this is the same problem as in any other database where you have to pick a reference somehow. I think having the universal ID lets the data be mapped to any other database, and if we allow a cross-mapping tool in the API (miRBase to MirGeneDB, etc.), then we solve this problem to some extent. What do you think? I'll open an issue with this request.

It is true that we could remove miRNA_ref from there and set the variants to NA, meaning that no variants are defined relative to that database. I'll open a discussion for this specific issue. Thanks! Great idea!