rreybeyb / MoAIMS

Efficient software for detection of enriched regions of MeRIP-Seq
0 stars 1 forks source link

Error in getBinInfo #5

Open koriege opened 4 years ago

koriege commented 4 years ago

Hi, I stumbled over the Error

#Get bins info.
Error in if (strand == "+") { : argument is of length zero
Calls: moaims ... getBinInfo -> .getBinInfoObj -> lapply -> FUN -> .binByGene
In addition: Warning messages:
1: In dir.create(proj_name) : 'test' already exists
2: In dir.create("intermediate") : 'intermediate' already exists
Execution halted

This was my command.

moaims(sample_info_file='path/to/sample.info', gtf_file='path/to/gtf', strand_specific=1, is_paired=T, proj_name='test')

Please advice

Edit: I found the same error in the closed issue section. But there is no solution given. I am working on a transcriptome, i.e. each chromosome is a single gene and thus has only one gtf entry.

rreybeyb commented 4 years ago

Hi, I stumbled over the Error

#Get bins info.
Error in if (strand == "+") { : argument is of length zero
Calls: moaims ... getBinInfo -> .getBinInfoObj -> lapply -> FUN -> .binByGene
In addition: Warning messages:
1: In dir.create(proj_name) : 'test' already exists
2: In dir.create("intermediate") : 'intermediate' already exists
Execution halted

This was my command.

moaims(sample_info_file='path/to/sample.info', gtf_file='path/to/gtf', strand_specific=1, is_paired=T, proj_name='test')

Please advice

Edit: I found the same error in the closed issue section. But there is no solution given. I am working on a transcriptome, i.e. each chromosome is a single gene and thus has only one gtf entry.

Please check the following,

  1. the gtf file must include 'exon' in the third column.
  2. the chromosome names in the gtf are the same as that in the bam files How does each chromosome has only one gene?
koriege commented 4 years ago

I extracted all transcripts from gencode annotation, converted the fasta and gtf accordingly and mapped onto this transcriptome. I don't think the bam is messed up here. Since all other tools like IGV and bedtools etc. are working flawless, I do not see my mistake.

The fasta

>ENST00000000233
[ACGT]+
>ENST00000000412
[ACGT]+
>[...]

The gtf

ENST00000000233 ensembl_havana  gene    1   1103    .   +   .   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana  5primeUTR   1   154 .   +   .   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana  exon    1   221 .   +   .   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "1"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; exon_id "ENSE00001872691"; exon_version "1"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana  CDS 155 221 .   +   0   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "1"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; protein_id "ENSP00000000233"; protein_version "5"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana  CDS 222 302 .   +   2   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "2"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; protein_id "ENSP00000000233"; protein_version "5"; tag "basic"; transcript_support_level "1";
ENST00000000233 ensembl_havana  exon    222 302 .   +   .   gene_id "ENST00000000233"; gene_version "10"; transcript_id "ENST00000000233"; transcript_version "9"; exon_number "2"; gene_name "ARF5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "ARF5-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS34745"; exon_id "ENSE00003494180"; exon_version "1"; tag "basic"; transcript_support_level "1";
[...]

The bam

A00620:36:HKJ5WDMXX:1:2377:5385:2738    163 ENST00000000233 23  60  101=    =   158 235 GTCGGGAGGGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCG   FFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   MD:Z:101    NH:i:1  HI:i:0  NM:i:0  YZ:Z:0  RG:Z:A1 PG:Z:MarkDuplicates
A00620:36:HKJ5WDMXX:1:2308:3043:26835   163 ENST00000000233 29  60  99= =   160 231 AGGGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCGTCGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:99 NH:i:1  HI:i:0  NM:i:0  YZ:Z:0  RG:Z:A1 PG:Z:MarkDuplicates
A00620:36:HKJ5WDMXX:1:1149:14543:24267  163 ENST00000000233 31  60  100=    =   78  146 GGCAGCGACGCGCGGAGGCGGCGGCGGAGCCTCCTCCTGCTGCTGCTGCGCCCCATCCCCCCGCGGCCGGCCAGTTCCAGCCCGCACCCCGCGTCGGTGC    FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    MD:Z:100    NH:i:1  HI:i:0  NM:i:0  YZ:Z:0  RG:Z:A1 PG:Z:MarkDuplicates
[...]
rreybeyb commented 4 years ago

I wonder why no chromosome names in both gtf and bam files? Our tool works on standard chromosomes.

koriege commented 4 years ago

because, I am working on a transcriptome. You mean, if I would change my ENST identifiers to chr it should work?

rreybeyb commented 4 years ago

If the gtf file contains only one gene, it cannot work.

koriege commented 4 years ago

[...] means that I was presenting just the head of the file. For sure its muuuch larger (2333826 lines in the gtf to be precise)! Btw. [AGCT]+ is a regex and not the actual content of the fasta...