Open jkreplak opened 2 years ago
Hello @jkreplak,
Thank you so much for your interest in finder
and thank you for reporting this issue. I have not encountered this issue before and I need to think of a solution. I will keep you posted.
Thank you.
Hello,
Could you please repost the question? For some reason, I cannot find it on Github.
Thank you. Sagnik Banerjee (Ph.D.) Iowa State University Applied Bioinformatics Scientist Bristol Myers Squibb
*"The moment I have realized God sitting in the temple of every human body, the moment I stand in reverence before every human being and see God in him
ᐧ
On Thu, Mar 24, 2022 at 12:58 PM jkreplak @.***> wrote:
Hi, I've managed to create with agat my own exon-based transcripts fasta and comment gffread to force finder to use it. It's finished, so that's a really good news ! However, I'm surprised by the results. Why does the CDS stop before the stop codon base and not at the end of it in the gtf ?
Is that a bug on my side ? If not, the official ontology for CDS in the SO include the stop codon and it should be added.
— Reply to this email directly, view it on GitHub https://github.com/sagnikbanerjee15/Finder/issues/53#issuecomment-1077892827, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFS7BOMHPJFF4FISDAJHSQ3VBSUMXANCNFSM5RQPMPFQ . You are receiving this because you were assigned.Message ID: @.***>
Hello, So I was able to modify findCDS command to use another tool (agat) to create fasta and it worked till the end. There is a few really large introns ( > 6Mb) caused by bad UTR splicing in some isoforms. I'll remove them by hand. I was just surprised by the final gtf format. As the CDS doesn't include stop codon and there are no stop codons lines for each transcript, tools (gffread, agat, jcvi...) are calling them 3' partial. It's difficult to work with the way I do it usually.
Hello @jkreplak,
Thank you for reporting this issue. I will create a program that will you can use to filter out certain genes and transcripts. Could you please post a few examples of the output where stop codons are missing? Since the result is being output in GTF mode, there will not be any STOP codon annotated. But later versions of the software will have the option to request for such annotations as well.
Thank you.
Hi, My Finder crashed during FindCDS at the codan steps. Codan throw an error about duplicate key :
After checking this error, i found in the gtf file combined_split_transcripts_with_bad_SJ_redundancy_removed.gtf around 650 transcripts with ultra-long introns :
When I relaunch gffread, i can see that the software is creating at least two fasta entry for this transcript, explaining the duplicate message of codan. After checking the web, it seems that gffread has an intron limit size.
How can I go around that to finish the pipeline ?
Thanks, Jonathan