scottzijiezhang / MeRIPtools


problem in countReads #3

Open interdidi opened 4 years ago

interdidi commented 4 years ago

Dear author,

I have run into a problem with countReads. My code is as follows:

    samplename = c("wt1","wt2")
    monster <- countReads(samplenames = samplename,
                          gtf = "/gss1/home/xyc20191221/tomatofruit/wt1/ITAG.gtf",
                          bamFolder = "/gss1/home/xyc20191221/tomatofruit/wt1",
                          outputDir = "/gss1/home/xyc20191221/tomatofruit/wt1/fisherpeak",
                          fragmentLength = 150,
                          modification = "m6A",
                          binSize = 50,
                          threads = 28)

Everything goes well until the step "count reads in continuous bins....", at which point the error "Error in { : task 1 failed - "subscript is out of bounds"" appears. I have not been able to get past it.

By the way, the file names follow the standard convention, for example "wt1.input.bam" and "wt1.m6A.bam".
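The naming convention mentioned above can be checked mechanically before running countReads. The following is a minimal sketch, assuming the pattern is "<sample>.input.bam" and "<sample>.<modification>.bam" as described in this post. The helper name missing_bam_files is hypothetical and is not part of MeRIPtools.

```python
from pathlib import Path


def missing_bam_files(bam_folder, sample_names, modification="m6A"):
    """Return the expected BAM paths that do not exist in bam_folder.

    Assumption (taken from this post, not from the MeRIPtools source):
    each sample needs "<sample>.input.bam" and "<sample>.<modification>.bam".
    """
    folder = Path(bam_folder)
    expected = []
    for name in sample_names:
        expected.append(folder / f"{name}.input.bam")
        expected.append(folder / f"{name}.{modification}.bam")
    # Keep only the paths that are not regular files on disk.
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Folder and sample names as given in the countReads call above.
    missing = missing_bam_files(
        "/gss1/home/xyc20191221/tomatofruit/wt1", ["wt1", "wt2"]
    )
    for path in missing:
        print("missing:", path)
```

An empty result only shows that the files are present under the expected names. It does not rule out other causes of the error.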

I look forward to your reply.

scottzijiezhang commented 4 years ago

Hello,

It looks like something specific to the reference genome/GTF you used, or to your data, may have caused the error. We tested this package on both human and mouse data and it ran without problems. It should work for all species, but there could be something different in your dataset or annotation file.

We will need to test on your data to find out what specifically caused this error. Thanks.

interdidi commented 4 years ago

Thank you very much for your reply. My GTF file was converted from a GFF file with the command "gffread ITAG4.1_gene_models.gff -T -o ITAG4.1.gtf". The GFF file was downloaded from "ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG4.0_release/". The raw data come from a published article; the download links are listed at the end of this comment.

I also checked the GTF format. Originally, the records looked like this:

    SL4.0ch00 maker exon 83863 84043 . + . transcript_id "mRNA:Solyc00g160260.1.1"; gene_id "gene:Solyc00g160260.1"; gene_name "Solyc00g160260.1";
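For inspecting records like the one above, a minimal parsing sketch may help. It assumes the standard nine tab-separated GTF columns and attributes of the form key "value"; (the record in this post is shown with spaces because of copy and paste). The function name parse_gtf_line is hypothetical. Nothing here claims that the "gene:" or "mRNA:" prefixes cause the countReads error; the sketch only makes the fields easy to examine.

```python
def parse_gtf_line(line):
    """Split one GTF record into its nine columns and an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields

    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        # Each attribute looks like: key "value"
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attrs,
    }


if __name__ == "__main__":
    # The record quoted in this post, rewritten with tab separators.
    record = "\t".join([
        "SL4.0ch00", "maker", "exon", "83863", "84043", ".", "+", ".",
        'transcript_id "mRNA:Solyc00g160260.1.1"; '
        'gene_id "gene:Solyc00g160260.1"; gene_name "Solyc00g160260.1";',
    ])
    parsed = parse_gtf_line(record)
    print(parsed["seqname"], parsed["feature"], parsed["attributes"])
```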

I tried changing the first column to "0,1,2,3...." or "chr00,chr01,chr02.." to get the program to work, but both attempts failed.

Then I checked the BAM file with "samtools view -h wt1.m6A.bam | head -n 20". A screenshot of the output is attached.
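One thing the header output makes it possible to check is whether the sequence names in the GTF match the ones in the BAM header. The sketch below compares the first GTF column with the SN: fields of the @SQ header lines. It works on plain text lines (for example, the saved output of "samtools view -H"), and the function names are hypothetical. Whether a name mismatch is what triggers "subscript is out of bounds" here is an assumption, not something confirmed in this thread.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct values of the first (seqname) GTF column."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from @SQ lines of a SAM/BAM text header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_lines, header_lines):
    """Report which sequence names are shared and which appear on one side only."""
    gtf = gtf_seqnames(gtf_lines)
    bam = sam_header_seqnames(header_lines)
    return {
        "shared": sorted(gtf & bam),
        "only_in_gtf": sorted(gtf - bam),
        "only_in_bam": sorted(bam - gtf),
    }


if __name__ == "__main__":
    # Illustrative input only: the LN values are placeholders, not real lengths.
    gtf_example = ["chr00\tmaker\texon\t83863\t84043\t.\t+\t.\tgene_id \"g1\";\n"]
    header_example = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:SL4.0ch00\tLN:1\n"]
    print(compare_seqnames(gtf_example, header_example))
```

If "shared" comes back empty after renaming the first GTF column, the annotation and the alignments no longer refer to the same sequence names.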

In case the problem lies in my earlier data processing, I have also included those commands here:

    $ fastq-dump --split-3 SRR845${i}.sra
    $ extract_splice_sites.py ITAG4.1.gtf > tomato.ss
    $ extract_exons.py ITAG4.1.gtf > tomato.exon
    $ hisat2-build -p 28 --ss tomato.ss --exon tomato.exon ./S_lycopersicum_chromosomes.4.00.fa tomato_tran
    $ hisat2 -p 28 --dta -x tomato_tran -k 1 --no-unal --summary-file srr4088.allign_summary -1 SRR8454088_1.fastq -2 SRR8454088_2.fastq | samtools view -bS | samtools sort -o wt2.m6A.bam

Looking forward to your reply.

Kind regards,
Yuchen

Data download links:

 https://sra-download.ncbi.nlm.nih.gov/traces/sra53/SRR/008255/SRR8454087  
 https://sra-download.ncbi.nlm.nih.gov/traces/sra53/SRR/008255/SRR8454088  
 https://sra-download.ncbi.nlm.nih.gov/traces/sra53/SRR/008255/SRR8454089  
 https://sra-download.ncbi.nlm.nih.gov/traces/sra75/SRR/008255/SRR8454090  
 https://sra-download.ncbi.nlm.nih.gov/traces/sra53/SRR/008255/SRR8454091  
 https://sra-download.ncbi.nlm.nih.gov/traces/sra75/SRR/008255/SRR8454092  

 


interdidi commented 4 years ago

Alternatively, could you send me your example GTF file so that I can check mine against it? Thanks.


scottzijiezhang commented 4 years ago

Hello,

Sure. I downloaded the reference genomes for both human and mouse from https://support.illumina.com/sequencing/sequencing_software/igenome.html

I will try to test your data when I return to the lab, once the pandemic is better under control.

Thanks,
Zijie