Open koriege opened 4 years ago
Hi, I stumbled over the Error
#Get bins info. Error in if (strand == "+") { : argument is of length zero Calls: moaims ... getBinInfo -> .getBinInfoObj -> lapply -> FUN -> .binByGene In addition: Warning messages: 1: In dir.create(proj_name) : 'test' already exists 2: In dir.create("intermediate") : 'intermediate' already exists Execution halted
This was my command.
moaims(sample_info_file='path/to/sample.info', gtf_file='path/to/gtf', strand_specific=1, is_paired=T, proj_name='test')
Please advice
Edit: I found the same error in the closed issue section. But there is no solution given. I am working on a transcriptome, i.e. each chromosome is a single gene and thus has only one gtf entry.
Please check the following,
I extracted all transcripts from gencode annotation, converted the fasta and gtf accordingly and mapped onto this transcriptome. I don't think the bam is messed up here. Since all other tools like IGV and bedtools etc. are working flawless, I do not see my mistake.
The fasta
>ENST00000000233
[ACGT]+
>ENST00000000412
[ACGT]+
>[...]
The gtf
ENST00000000233 ensembl_havana gene 1 1103 . + . gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana 5primeUTR 1 154 . + . gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana exon 1 221 . + . gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "1"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; exon_id "ENSE00001872691"; exon_version "1"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana CDS 155 221 . + 0 gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "1"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; protein_id "ENSP00000000233"; protein_version "5"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana CDS 222 302 . + 2 gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "2"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; protein_id "ENSP00000000233"; protein_version "5"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana exon 222 302 . + . gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "2"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; exon_id "ENSE00003494180"; exon_version "1"; tag "basic"; transcript_support_level "1";
[...]
The bam
A00620:36:HKJ5WDMXX:1:2377:5385:2738 163 ENST00000000233 23 60 101= = 158 235 GTCGGGAGGGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCG FFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:101 NH:i:1 HI:i:0 NM:i:0 YZ:Z:0 RG:Z:A1 PG:Z:MarkDuplicates
A00620:36:HKJ5WDMXX:1:2308:3043:26835 163 ENST00000000233 29 60 99= = 160 231 AGGGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCGTCGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:99 NH:i:1 HI:i:0 NM:i:0 YZ:Z:0 RG:Z:A1 PG:Z:MarkDuplicates
A00620:36:HKJ5WDMXX:1:1149:14543:24267 163 ENST00000000233 31 60 100= = 78 146 GGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCGTCGGTGC FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:100 NH:i:1 HI:i:0 NM:i:0 YZ:Z:0 RG:Z:A1 PG:Z:MarkDuplicates
[...]
I wonder why no chromosome names in both gtf and bam files? Our tool works on standard chromosomes.
because, I am working on a transcriptome. You mean, if I would change my ENST
identifiers to chr
it should work?
If the gtf file contains only one gene, it cannot work.
[...]
means that I was presenting just the head of the file. For sure its muuuch larger (2333826 lines in the gtf to be precise)!
Btw. [AGCT]+
is a regex and not the actual content of the fasta...
Hi, I stumbled over the Error
This was my command.
Please advice
Edit: I found the same error in the closed issue section. But there is no solution given. I am working on a transcriptome, i.e. each chromosome is a single gene and thus has only one gtf entry.