Question about transcript classification

tderrien / FEELnc

FEELnc : FlExible Extraction of LncRNA

GNU General Public License v3.0


Question about transcript classification #47

Closed: cc-prolix closed this issue 2 years ago

cc-prolix commented 3 years ago

Hello,

I am using the FEELnc_classifier.pl script to classify lncRNAs based on their genomic localization. I am especially interested in natural antisense transcripts, in this case lncRNAs transcribed in the antisense direction that are at least partially complementary to the corresponding (protein-coding or non-coding) transcripts on the opposite strand.

Which subset of the classifier output would suit my needs best? I would choose transcripts tagged "antisense", with a "genic" type and an "exonic" location, to get at least partial complementarity. Can you recommend anything?

Thank you very much for your help!

tderrien commented 3 years ago

Hello,

Yes you're right. These tags involve an overlap at the exonic level between the antisense lncRNAs and the sense mRNAs.

Note that if gene boundaries are not well annotated, intergenic classes with the convergent or divergent subclasses could also be of interest, depending on the distance between the two elements. Hope this helps.
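A minimal Python sketch of the selection discussed above, assuming the tab-separated header described in the FEELnc README (`isBest`, `lncRNA_gene`, `lncRNA_transcript`, `partnerRNA_gene`, `partnerRNA_transcript`, `direction`, `type`, `distance`, `subtype`, `location`); check the first line of your own classifier output before relying on these column names. The function name `antisense_candidates`, the toy rows, and the 5 kb cutoff for convergent/divergent pairs are hypothetical and only illustrate the filter.

```python
import csv
import io

# Hypothetical rows; the header is the assumed FEELnc_classifier.pl layout.
SAMPLE = """isBest\tlncRNA_gene\tlncRNA_transcript\tpartnerRNA_gene\tpartnerRNA_transcript\tdirection\ttype\tdistance\tsubtype\tlocation
1\tLNC1\tLNC1.1\tGENE1\tGENE1.1\tantisense\tgenic\t0\toverlapping\texonic
1\tLNC2\tLNC2.1\tGENE2\tGENE2.1\tantisense\tgenic\t0\tcontaining\tintronic
1\tLNC3\tLNC3.1\tGENE3\tGENE3.1\tantisense\tintergenic\t800\tdivergent\tupstream
1\tLNC4\tLNC4.1\tGENE4\tGENE4.1\tsense\tgenic\t0\toverlapping\texonic
1\tLNC5\tLNC5.1\tGENE5\tGENE5.1\tantisense\tintergenic\t50000\tconvergent\tdownstream
"""


def antisense_candidates(handle, max_intergenic_distance=5000):
    """Return (exonic antisense overlaps, nearby convergent/divergent pairs) as sorted transcript IDs."""
    exonic, nearby = set(), set()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["direction"] != "antisense":
            continue
        if row["type"] == "genic" and row["location"] == "exonic":
            # Exonic overlap with a transcript on the opposite strand.
            exonic.add(row["lncRNA_transcript"])
        elif (row["type"] == "intergenic"
              and row["subtype"] in ("convergent", "divergent")
              and int(row["distance"]) <= max_intergenic_distance):
            # Head-to-head (divergent) or tail-to-tail (convergent) neighbour
            # close enough to matter if gene boundaries are poorly annotated.
            nearby.add(row["lncRNA_transcript"])
    return sorted(exonic), sorted(nearby)


exonic, nearby = antisense_candidates(io.StringIO(SAMPLE))
print(exonic)  # ['LNC1.1']
print(nearby)  # ['LNC3.1']
```

Sets are used because one lncRNA transcript can appear on several rows when it overlaps more than one partner transcript. To run it on a real file, pass an open file handle instead of `io.StringIO(SAMPLE)`.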

cc-prolix commented 3 years ago

Thank you very much for your quick response!

I have one more question: so far I have only considered transcripts tagged "isBest = 1" in the classifier output file. What would you recommend? When would I choose an alternative classification result over the one tagged "isBest = 1"?

Thank you very much!

tderrien commented 3 years ago

Sorry for the late reply...

This depends on your biological question. For intergenic lncRNAs (lincRNAs), the tag "isBest = 1" is given to the lncRNA/gene pair with the shortest distance, i.e. the gene closest to the lncRNA. You may want to extract all neighboring genes (not only the closest one) by filtering on a given window around each lncRNA (e.g. 1 Mb) using the distance column (8th column).
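A sketch of the window-based extraction, under the same assumed column names as the FEELnc README header (`distance` being the 8th column). The helper `neighbours_within` and the sample rows are hypothetical; the 1 Mb default mirrors the example window mentioned above. Rows are kept regardless of their `isBest` value, so every partner gene inside the window is reported, not just the closest one.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for one lincRNA with three neighbouring genes.
SAMPLE = """isBest\tlncRNA_gene\tlncRNA_transcript\tpartnerRNA_gene\tpartnerRNA_transcript\tdirection\ttype\tdistance\tsubtype\tlocation
1\tLNC1\tLNC1.1\tGENE_A\tGENE_A.1\tsense\tintergenic\t1200\tsame_strand\tupstream
0\tLNC1\tLNC1.1\tGENE_B\tGENE_B.1\tantisense\tintergenic\t350000\tdivergent\tupstream
0\tLNC1\tLNC1.1\tGENE_C\tGENE_C.1\tsense\tintergenic\t2500000\tsame_strand\tdownstream
"""


def neighbours_within(handle, window=1_000_000):
    """Map each lncRNA transcript to all partner genes within `window` bp, ignoring isBest."""
    hits = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        # 'distance' is the 8th column in the assumed header.
        if int(row["distance"]) <= window:
            hits[row["lncRNA_transcript"]].add(row["partnerRNA_gene"])
    return {tx: sorted(genes) for tx, genes in hits.items()}


print(neighbours_within(io.StringIO(SAMPLE)))
# {'LNC1.1': ['GENE_A', 'GENE_B']}
```

Keeping only `row["isBest"] == "1"` instead would return `GENE_A` alone. Note that this filter can only narrow down rows already present in the output; it cannot recover partners that the classifier never reported.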

Hope this helps

tan5251 commented 3 years ago

Hello, I am also using the FEELnc_classifier.pl script to classify lncRNAs based on their genomic localization. Some lncRNAs cannot be annotated even though I have increased the "--maxwindow" option. Any suggestions on how to annotate these lncRNAs? How should I define their location?

tderrien commented 3 years ago

Hi,

We also observed such cases. Are these lncRNAs localized on contigs/unassembled chromosomes (without any other annotated genes located on these "chromosomes")?
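One way to check the situation raised in this question is to list the sequences (first GTF column) that carry lncRNAs but have no feature at all in the reference annotation given to the classifier; on such sequences there is no partner gene to report, whatever the window size. The helpers `seqnames` and `unannotated_seqnames` are a hypothetical stand-alone sketch, not part of FEELnc, and the GTF lines below are made up for illustration.

```python
def seqnames(gtf_lines):
    """Collect sequence names (GTF column 1), skipping comment and blank lines."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unannotated_seqnames(lnc_gtf_lines, ref_gtf_lines):
    """Sequences that carry lncRNAs but have no feature at all in the reference GTF."""
    return sorted(seqnames(lnc_gtf_lines) - seqnames(ref_gtf_lines))


# Made-up GTF lines for illustration only.
LNC_GTF = [
    'chr1\tFEELnc\texon\t100\t500\t.\t+\t.\tgene_id "LNC1"; transcript_id "LNC1.1";',
    'Un_contig42\tFEELnc\texon\t10\t900\t.\t-\t.\tgene_id "LNC2"; transcript_id "LNC2.1";',
]
REF_GTF = [
    "#!genome-build hypothetical",
    'chr1\tensembl\texon\t2000\t3000\t.\t-\t.\tgene_id "GENE1"; transcript_id "GENE1.1";',
]

print(unannotated_seqnames(LNC_GTF, REF_GTF))  # ['Un_contig42']
```

With real files, open handles can be passed directly, e.g. `unannotated_seqnames(open("lncRNA.gtf"), open("reference.gtf"))`, where both file names are placeholders.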

tan5251 commented 3 years ago

Yes, these lncRNAs are all located on unassembled chromosomes. I solved the problem by using a GTF file that includes the unassembled chromosomes. Thank you very much for your reply.