tderrien / FEELnc

FEELnc : FlExible Extraction of LncRNA
GNU General Public License v3.0

Using Trinity FASTA file as input for FEELnc_filter.pl #31

Closed CristinaOsu closed 6 years ago

CristinaOsu commented 6 years ago

Dear all,

I am trying to run the entire FEELnc pipeline using a Trinity transcript FASTA file as input (instead of a Cufflinks/StringTie transcripts.GTF). However, I found that only FEELnc_codpot.pl can be run with this type of input, not FEELnc_filter.pl. Is that correct? Why is the filter step not available for Trinity transcript FASTA input?

Thank you so much in advance,

Cristina Osuna

tderrien commented 6 years ago

Dear Cristina,

Yes, you are correct. With FASTA files as input, FEELnc cannot filter transcripts based on, for instance, the number of exons (1st module, FEELnc_filter.pl), nor classify them with respect to their closest genes (3rd module, FEELnc_classifier.pl). As stated in the introduction, the 1st and 3rd modules require a reference genome, while people using Trinity generally don't have one. HTH
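To illustrate why exon-based filtering needs genome coordinates, here is a minimal Python sketch that counts exon features per transcript in a GTF and lists the monoexonic ones. It is not FEELnc's own code; the function names `count_exons` and `monoexonic` and the toy records are made up for this example. A transcript FASTA has no exon records at all, so there is nothing comparable to count.

```python
from collections import defaultdict


def count_exons(gtf_text):
    """Return {transcript_id: number of 'exon' features} for GTF-formatted text."""
    counts = defaultdict(int)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features carry the structure we need
        # The attribute column looks like: gene_id "G1"; transcript_id "T1";
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        transcript_id = attrs.get("transcript_id")
        if transcript_id:
            counts[transcript_id] += 1
    return dict(counts)


def monoexonic(gtf_text):
    """Return the sorted IDs of transcripts that have exactly one exon."""
    return sorted(t for t, n in count_exons(gtf_text).items() if n == 1)


# Toy GTF (hypothetical records): T1 has two exons, T2 has one.
EXAMPLE_GTF = "\n".join([
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr2\tdemo\texon\t50\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
])

if __name__ == "__main__":
    print(count_exons(EXAMPLE_GTF))
    print(monoexonic(EXAMPLE_GTF))
```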

Thomas

CristinaOsu commented 6 years ago

Dear Thomas,

Thanks a lot for your quick reply! I do have a reference genome, but I ran the Genome-Guided Trinity De novo Transcriptome Assembly (https://github.com/trinityrnaseq/trinityrnaseq/wiki/Genome-Guided-Trinity-Transcriptome-Assembly). I used these transcripts to update my protein-coding gene annotation with the PASA pipeline (https://github.com/PASApipeline/PASApipeline/wiki/PASA_genome_annotation). I was hoping to now use the FEELnc pipeline to add lncRNA annotation as well, but I found it odd that I could not filter my transcript FASTA file prior to calculating coding potential. How would you recommend proceeding? I was thinking of mapping my Trinity transcripts against my genome using gmap or similar, and then using the resulting GTF file as input... but I am not sure whether this would be a good idea or whether it is better to start over using e.g. StringTie. Thank you so much in advance.

All my best,

Cristina Osuna

tderrien commented 6 years ago

Yes, in this case, it would be better to map your assembled FASTA transcripts (with gmap or minimap...) onto your reference genome so that you can use the entire FEELnc pipeline. Actually, the choice between a mapping-first and an assembly-first strategy really depends on the quality of your genome assembly.
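Once the transcripts are mapped, each alignment carries exon coordinates that can be written out as GTF. As a rough sketch, assuming the spliced alignments have already been exported as BED12 records (an assumption of this example, not a step described in this thread), the Python below converts one BED12 line into GTF exon lines. The function name `bed12_to_gtf`, the `source` label, the Trinity-style ID, and the reuse of the transcript name as `gene_id` are all hypothetical choices for illustration.

```python
def bed12_to_gtf(bed_line, source="mapped"):
    """Convert one BED12 record into a list of GTF 'exon' lines.

    BED coordinates are 0-based with an exclusive end; GTF coordinates are
    1-based with an inclusive end, so only the start is shifted by one.
    """
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, name, strand = f[0], int(f[1]), f[3], f[5]
    block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # Assumption: the transcript name doubles as gene_id, since a plain
    # alignment carries no gene-level grouping.
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    lines = []
    for size, offset in zip(block_sizes, block_starts):
        exon_start = start + offset + 1   # 0-based -> 1-based
        exon_end = start + offset + size  # exclusive end == inclusive end
        lines.append("\t".join([
            chrom, source, "exon", str(exon_start), str(exon_end),
            ".", strand, ".", attrs,
        ]))
    return lines


# Hypothetical two-exon alignment of a Trinity-style transcript.
EXAMPLE_BED = "chr1\t99\t500\tTRINITY_DN1_c0_g1_i1\t0\t+\t99\t500\t0\t2\t101,100,\t0,301,"

if __name__ == "__main__":
    for gtf_line in bed12_to_gtf(EXAMPLE_BED):
        print(gtf_line)
```

The resulting exon lines give each mapped transcript the genomic structure that a FASTA file lacks. Whether a given tool accepts them as-is depends on the attributes it expects, so check the output against the GTF requirements of the tool you feed it to.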

Best,

Thomas