signalbash / how_are_we_stranded_here

Check strandedness of RNA-Seq fastq files
MIT License

Can't find transcript ids from fasta in bed #8

Open XMTian opened 2 years ago

XMTian commented 2 years ago

Hi,

I tried to use how_are_we_stranded_here to determine the strandedness of my RNA-seq data.

Here is my command:

check_strandedness --gtf Midas.annotation.v2.6.gtf -r1 Aast_14_trimmed_R1.fq.gz -r2 Aast_14_trimmed_R2.fq.gz -fa Midas.2.6.transcripts.fa

Here is what the GTF file looks like:

head Midas.annotation.v2.6.gtf

1 funannotate transcript 6208 30525 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 6208 6471 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 7462 7555 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 16749 16855 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17474 17511 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17707 17785 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17863 17957 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 18669 18714 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 18793 18939 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 19209 19291 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
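To see which IDs the annotation actually provides, one can pull the `transcript_id` value out of the attributes column. The sketch below is a minimal illustration, not how_are_we_stranded_here's own GTF-to-BED code; it assumes the file is tab-separated as the GTF format requires (the listing above shows spaces only because the tabs were lost when pasting), and the function name `transcript_ids_from_gtf` is hypothetical.

```python
import re

# Matches transcript_id "VALUE" inside the 9th (attributes) column of a GTF record.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids_from_gtf(lines):
    """Yield unique transcript IDs from GTF lines, in first-seen order (hypothetical helper)."""
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            yield match.group(1)


gtf_lines = [
    '1\tfunannotate\ttranscript\t6208\t30525\t.\t+\t.\ttranscript_id "Midas_000001-T2"; gene_id "Midas_000001";\n',
    '1\tfunannotate\texon\t6208\t6471\t.\t+\t.\ttranscript_id "Midas_000001-T2"; gene_id "Midas_000001";\n',
]
print(list(transcript_ids_from_gtf(gtf_lines)))  # ['Midas_000001-T2']
```

Both example records belong to the same transcript, so only one ID comes back.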

And here is what Midas.2.6.transcripts.fa looks like:

>Midas_000001-T1
AAAAAAAAAAAAAAAAAAAAAAATTCGGCTTATACTGCAGCTTAGAAGTCGTGCTCAGAGGAACGAGCTTTATCCCACTGCATTTCGGGCAGGTTAGAAGTCGTGCCGAGTAACGAAAAAAAAGAGGGAATTCGTGTTCATGAGCACGAATCAATAGATTAAATTTCGTT
>Midas_000001-T2
CTGACGAAGTGTCCTGATGCGATCCACGTCCCATTTTTACGACGGAGACCGCTTGCTGACGGCAGCCGGTTGCCATAACGTCCCTCCAGCAACATGTCCTCGTCGGGGGCGCAGCGTGTCGGTCCCGCGGCGGCTTTCCCAGAGAACCAGGGCGGTGCGGCGGCGGCAGG
>Midas_000002-T1
CATAACTCAAAAAACCATCAGTGAAGTCACAGCTGCTAGTCTTTGGAGCAGATCTCAATAAAAACCACAGCTTGTCCTCATCTCATGGCCTCATCCTCTCTCTGCAGCTGCAGGCGTGGTGGACTTGAACCTGTGCAACATCCGGGACATGGAGGTCATCGAGCTGAGCA
>Midas_000003-T1
GTGTCATTCACAATAAAAATCAGAGCAGCTTGGACTTGCATCCAAACAGAACACCATCAAAGTAGAAGAAGCACCAGATGACGACCTGTAGCTTTTCAAACACCTGAAGTTCAGCTTTCTGACAGGAGCCATCAGGCCTGAACCTCCAGCACTGCCTAAGGAGCCTTCAA
>Midas_000004-T1
ATGACAGAGAGAGATAGACTAGAGCTGAGGAGACCGCCATGGAGAGAGAGAGGACAGAGAAGAGCGAGACAGAGAGCGATACATACAGAGAGAGAGACCGCCAGAGAGAGAGACCCCGACCGAGAGAGCGGCGACGACCGAGAGGCGCGCACAACAGCACATAGAGAGAG
>Midas_000005-T1
CTGGACTGGACCAGAGCGACATCATGAAGCTGCTGAGACACGGCATCTACACTCTGCTGGTAATTTGCAGTGTATTGTGGGCTTCTTGCTCCAAGGTTAAAGCTGAATCATCTCCTGGATGTGACACCACCTTGACGTTCTCCTCAGAATTGAGCACCTTGACTGAAGGA
>Midas_000006-T1
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGCGAGCGAGCGAGCGAGCACGCGCACGCAGCAGGTCAAACTCGCTGACAACAGGTCCAAAGACCCGGACAGAAAAACAATAGAAGAGAGTAGAAAATGAGTGGTGCAG
>Midas_000007-T1
GTAAGCCAGCAGTCTGAGAGTGAAGTGCTCTGTTGGGGTGATACAGTACTATGAGGTCTTTGAAATAAGATGGGGCCTGATTATTCAAGACCTTTTTGCATTTGTTCTGGGACCCCTGTGTCCTCATGCATGGAAATACAGAGGCGAGGGGAGAAGCAGCACCACCACCT
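On the FASTA side, the transcript ID is usually taken to be the first whitespace-delimited token after `>`. A minimal sketch of that convention follows; it is an assumption about how IDs are read, not the tool's actual parser, and `fasta_ids` is a hypothetical name. Stripping the header first means a trailing space or a Windows `\r` line ending cannot end up inside the ID.

```python
def fasta_ids(lines):
    """Return IDs from FASTA header lines (hypothetical helper).

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()  # drop '>' plus surrounding spaces, tabs, CR/LF
            ids.append(header.split()[0] if header else "")
    return ids


fasta_lines = [
    ">Midas_000001-T1\n",
    "AAAAAAAA\n",
    ">Midas_000001-T2 extra description\r\n",
    "CTGACGAA\n",
]
print(fasta_ids(fasta_lines))  # ['Midas_000001-T1', 'Midas_000001-T2']
```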

But I got the error below. I checked the transcript IDs, and they are all in the FASTA file.

Results stored in: stranded_test_Aast_14_trimmed_R1_2
converting gtf to bed
Checking if fasta headers and bed file transcript_ids match...
Can't find transcript ids from Midas.2.6.transcripts.fa in stranded_test_Aast_14_trimmed_R1_2/Midas.annotation.v2.6.bed
Trying to converting fasta header format to match transcript ids to the BED file...
Can't find any of the first 10 BED transcript_ids in fasta file... Check that these match
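The log says the tool looks for the first 10 BED transcript_ids in the FASTA file. A rough way to reproduce that kind of check by hand is sketched below. It is hypothetical and not the tool's code: it assumes a BED6-style file with the transcript ID in the 4th (name) column, and it prints missing IDs with `repr()` so that invisible characters such as trailing spaces or `\r` become visible. The trailing space in the example record is invented purely for illustration.

```python
def first_bed_ids(bed_lines, n=10):
    """Return the name column (4th field) of the first n BED records (hypothetical helper)."""
    ids = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            ids.append(fields[3])
        if len(ids) == n:
            break
    return ids


def report_missing(bed_ids, fasta_id_set):
    """Return BED IDs not found among the FASTA IDs, repr()'d so hidden whitespace shows."""
    return [repr(i) for i in bed_ids if i not in fasta_id_set]


# Illustrative record only: note the trailing space in the name column.
bed_lines = ["1\t6207\t30525\tMidas_000001-T2 \t0\t+\n"]
fasta_id_set = {"Midas_000001-T1", "Midas_000001-T2"}
print(report_missing(first_bed_ids(bed_lines), fasta_id_set))  # ["'Midas_000001-T2 '"]
```

If the IDs look identical on screen but still show up here, the `repr()` output usually reveals the difference.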

Could you help me figure it out?

Best, Xiaomeng

LauraPugh commented 2 years ago

Hi,

I get the same error. Did you find a solution?

OlivierBakker commented 2 years ago

Hi,

Just bumping this, as I am also running into the same issue and there doesn't seem to be a solution yet. I regenerated the cDNA FASTA from the Ensembl 99 GTF file and manually checked the first 10 records; they are definitely in there. Would love an update on this.

signalbash commented 2 years ago

I think this was a formatting issue. I've edited the code to strip whitespace, and it tentatively works locally on the Midas transcriptome/GTF examples you gave. https://github.com/signalbash/how_are_we_stranded_here/commit/0fedf9245c3bbbcdf670e2b374196ca5d40cab25
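The commit is described only as stripping whitespace. The snippet below is not that commit's code; it is a small illustration, using made-up IDs, of why stray whitespace breaks an exact-match comparison and why `str.strip()` on both sides restores it. `normalise_ids` is a hypothetical name.

```python
def normalise_ids(ids):
    """Strip leading/trailing whitespace (spaces, tabs, CR/LF) from each ID."""
    return {i.strip() for i in ids}


# Made-up IDs: one with a trailing space, one with a Windows carriage return.
bed_ids = ["Midas_000001-T2 ", "Midas_000002-T1\r"]
fasta_ids = ["Midas_000001-T2", "Midas_000002-T1"]

raw_overlap = set(bed_ids) & set(fasta_ids)
clean_overlap = normalise_ids(bed_ids) & normalise_ids(fasta_ids)
print(len(raw_overlap), len(clean_overlap))  # 0 2
```

Normalising both sides, rather than only one, keeps the comparison symmetric regardless of which file carries the stray characters.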

KristinaGagalova commented 1 year ago

Hi, I am using the pip version of the tool and I am having the same issue. Would it be possible to release this bug fix in all versions? The commit you pointed to works well.