Open sjfleck opened 3 years ago
StringTie does not output any CDS features (only exon features), which are needed by the -x and -y options of gffread.
You might want to run an ORF finder program (e.g. TransDecoder) in order to guess and assign likely CDS features to the StringTie output.
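To confirm this on a given annotation file, one option is to tally the feature types in column 3. This is only a minimal sketch: `count_features` and `has_cds` are hypothetical helper names made up for this example, not part of gffread or StringTie, and it assumes a standard tab-separated, 9-column GTF/GFF3 file.

```shell
# Count GFF/GTF feature types (column 3), skipping comment lines and
# any line that does not have the standard 9 tab-separated columns.
# count_features is a hypothetical helper name, not a gffread command.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Exit status 0 only if the file contains at least one CDS feature.
# has_cds is likewise a hypothetical helper name.
has_cds() {
    awk -F'\t' '!/^#/ && $3 == "CDS" { found = 1; exit }
                END { exit !found }' "$1"
}
```

For example, `count_features "$SAMPLE.gff3"` lists each feature type with its count, and `has_cds "$SAMPLE.gff3" || echo "no CDS features found"` flags a file that has nothing for -x and -y to extract.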
Thank you for your quick feedback!
My goal is to create a genome-guided transcriptome assembly using StringTie and use gffread to convert the output GTF into a GFF3. I seem to be able to create the .gff3 file without a problem, but I want to see how complete it is using BUSCO's transcriptome or protein option. It seems like -y might be the best option for that, but I'm having a difficult time getting it to work. I also tried to use the -w and -x options, but only -w worked. Here are my commands:
hisat2-build -p 16 $REF $SAMPLE
hisat2 --max-intronlen 20000 -p 16 --dta -x $SAMPLE -1 $READS1 -2 $READS2 -S $SAMPLE.sam
samtools sort -@ 16 -o $SAMPLE.bam $SAMPLE.sam
stringtie $BAM -o $OUT -p 16
gffread $OUT > $SAMPLE.gff3
At this point, I have a .gff3 that seems to be fine, but when I run:
gffread $SAMPLE.gff3 -g $FASTA -w exons.fa -x cds.fa -y tr_cds.fa
I get a FASTA file with spliced exons for each transcript, but cds.fa and tr_cds.fa are both empty. If you have any guidance for getting this to work, I would appreciate it. Thank you, and thank you for creating all these tools.