zhqingit / giremi

GIREMI is a method that can identify RNA editing sites using one RNA-seq data set without requiring genome sequence data.
42 stars 15 forks source link

N's in Output File #4

Open doughertyr opened 8 years ago

doughertyr commented 8 years ago

Dr. Zhang,

I'm Dr. Taylor's student at CHOP. We had been communicating via email, but your ucla address seems to be down.

I've run giremi on several samples, and I'm running into a problem. The program stops short of generating the 'ifRNAE' column. This is likely related to the fact that in the output file, there are only N's in the columns refB, upB, downB and majorB, even though the corresponding positions are not N's. Do you know what might be causing this issue?

The list of SNV's I made looks like this:

chr1 87278 87279 Inte 0 #

chr1 87286 87287 Inte 0 #

chr1 87313 87314 Inte 0 #

chr1 91013 91014 Inte 0 #

chr1 91038 91039 Inte 0 #

chr1 91081 91082 Inte 0 #

chr1 91094 91095 Inte 0 #

chr1 128867 128868 Inte 0 #

chr1 128868 128869 Inte 0 #

chr1 233473 233474 Inte 0 #

Thanks, Bobby

zhqingit commented 8 years ago

Hi Bobby,

Yes, my ucla email doesn't work. Could you send me your full list of the SNV and the output file for my checking? Thanks.

Best, Qing

2016-02-11 14:07 GMT-07:00 doughertyr notifications@github.com:

Dr. Zhang,

I'm Dr. Taylor's student at CHOP. We had been communicating via email, but your ucla address seems to be down.

I've run giremi on several samples, and I'm running into a problem. The program stops short of generating the 'ifRNAE' column. This is likely related to the fact that in the output file, there are only N's in the columns refB, upB, downB and majorB, even though the corresponding positions are not N's. Do you know what might be causing this issue?

The list of SNV's I made looks like this:

chr1 87278 87279 Inte 0 #

chr1 87286 87287 Inte 0 #

chr1 87313 87314 Inte 0 #

chr1 91013 91014 Inte 0 #

chr1 91038 91039 Inte 0 #

chr1 91081 91082 Inte 0 #

chr1 91094 91095 Inte 0 #

chr1 128867 128868 Inte 0 #

chr1 128868 128869 Inte 0 #

chr1 233473 233474 Inte 0 #

Thanks, Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4.

doughertyr commented 8 years ago

Qing,

The output file and corresponding list of SNVs are attached.

Thanks, Bobby

SNV_list.txt giremi_output.txt

zhqingit commented 8 years ago

Hi Bobby,

I guess your fasta index file might not compatible to your fasta file. Could you try samtools to generate a new index file based on your fasta file?

Best, Qing

2016-02-12 11:52 GMT-07:00 doughertyr notifications@github.com:

Qing,

The output file and corresponding list of SNVs are attached.

Thanks, Bobby

SNV_list.txt https://github.com/zhqingit/giremi/files/128477/SNV_list.txt giremi_output.txt https://github.com/zhqingit/giremi/files/128478/giremi_output.txt

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-183448613.

doughertyr commented 8 years ago

Qing,

I tried again after re-indexing my fasta file, and I get identical errors. There must be a different problem.

Thanks, Bobby

zhqingit commented 8 years ago

Hi Bobby,

Can you check if the chromosome name in your fasta file is identical to that in your snv list file?

Best, Qing On Feb 16, 2016 09:40, "doughertyr" notifications@github.com wrote:

Qing,

I tried again after re-indexing my fasta file, and I get identical errors. There must be a different problem.

Thanks, Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-184760137.

doughertyr commented 8 years ago

Qing,

They are both the same; just the number, no 'chr'.

Thanks, Bobby

zhqingit commented 8 years ago

Hi Bobby,

I think that's the reason. Giremi find the exact identical chromosome name.

Best, Qing

Qing,

They are both the same; just the number, no 'chr'.

Thanks, Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-184765131.

Balexander230 commented 8 years ago

hi Qing,

I'm wondering why would the chromosome name need to be different between the fasta file and the SNV list file, could you maybe give an example. I'm trying to get GIREMI to work but it seems to take a long time and I seem to have the same names in both files, i thought this was a good thing. Sorry if I'm barging in on your discussion

Greets, Alex

doughertyr commented 8 years ago

Qing,

When I checked again I found that my snv list had chromosomes labelled with the 'chr' prefix while my reference did not. I removed the 'chr' prefix from my snv list file, which fixed my N's problem! However, giremi still does not generate the crucial last column, which halts execution.

Alex,

In my experience, when giremi works it doesn't take that long. If you don't give it what it wants, though, it may seem like its running when it actually isn't. Would you mind showing your command line?

zhqingit commented 8 years ago

Hi Bobby,

Could you send me your output file? Then I can take a look and find the reason.

Best, Qing

2016-02-17 9:40 GMT-07:00 doughertyr notifications@github.com:

Qing,

When I checked again I found that my snv list had chromosomes labelled with the 'chr' prefix while my reference did not. I removed the 'chr' prefix from my snv list file, which fixed my N's problem! However, giremi still does not generate the crucial last column, which halts execution.

Alex,

In my experience, when giremi works it doesn't take that long. If you don't give it what it wants, though, it may seem like its running when it actually isn't. Would you mind showing your command line?

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-185289465.

zhqingit commented 8 years ago

Hi Alex,

Like Bobby said, giremi doesn't need long time. Could you show me your command?

Best, Qing

2016-02-17 7:43 GMT-07:00 Balexander230 notifications@github.com:

hi Qing,

I'm wondering why would the chromosome name need to be different between the fasta file and the SNV list file, could you maybe give an example. I'm trying to get GIREMI to work but it seems to take a long time and I seem to have the same names in both files, i thought this was a good thing. Sorry if I'm barging in on your discussion

Greets, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-185234500.

Balexander230 commented 8 years ago

hi Qing and Bobby,

first of all thank you for responding so quickly. the command I'm using is: ./giremi -f ~/human_g1k_v37.fasta\ -l ~/good_format.txt \ -o ~/giremi.table.out \ ~/My_file.bam

-f fasta file: head_-3_human_g1k_v37.txt -l SNV file: head_good_format.txt

the My_file.bam I created from a fasta file using hisat and SAMtools, these are the first few lines of that fasta file: firstlinesfastafile.txt

I hope this info is enough for you to help me. Thank you for your time.

Greetings, Alex

doughertyr commented 8 years ago

Qing,

Here are the two output files

giremi_output.res.Rout.txt giremi_output.res.txt

Thanks, Bobby

doughertyr commented 8 years ago

Alex,

I noticed two things that may be problems.

First, in you SNV file, it should be "Inte", not "inte". That may be an issue.

Second, in your ref file you may want to try running it with just the number of the chromosome, not the whole ">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1". In my ref, that's just '>1'. That may also be causing a problem.

Bobby

zhqingit commented 8 years ago

Hi Bobby,

In your SNV list file, actually the site (chr1:11085005) is homozygous, which cause the stopping. I think you can remove these homozygous SNVs in your list.

Best, Qing

2016-02-19 12:51 GMT-07:00 doughertyr notifications@github.com:

Alex,

I noticed two things that may be problems.

First, in you SNV file, it should be "Inte", not "inte". That may be an issue.

Second, in your ref file you may want to try running it with just the number of the chromosome, not the whole ">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1". In my ref, that's just '>1'. That may also be causing a problem.

Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-186375940.

zhqingit commented 8 years ago

Hi Alex,

Yes, please check the issues Bobby has pointed out. Another issue is the 4th column in the SNV list file should be the gene harboring the SNV not the SNP label.

Best, Qing

2016-02-18 7:45 GMT-07:00 Balexander230 notifications@github.com:

hi Qing and Bobby,

first of all thank you for responding so quickly. the command I'm using is: ./giremi -f ~/human_g1k_v37.fasta\ -l ~/good_format.txt \ -o ~/giremi.table.out \ ~/My_file.bam

-f fasta file: head_-3_human_g1kv37.txt https://github.com/zhqingit/giremi/files/136316/head-3_human_g1k_v37.txt -l SNV file: head_good_format.txt https://github.com/zhqingit/giremi/files/136320/head_good_format.txt

the My_file.bam I created from a fasta file using hisat and SAMtools, these are the first few lines of that fasta file: firstlinesfastafile.txt https://github.com/zhqingit/giremi/files/136337/firstlinesfastafile.txt

I hope this info is enough for you to help me. Thank you for your time.

Greetings, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-185750955.

Balexander230 commented 8 years ago

Hi Qing and Bobby,

thank you for pointing these errors out to me. I have corrected them but Giremi still seems to take a long time (>30hours and counting), which leads me to believe there is still an error somewhere.

Could you maybe send some sample input data so i can compare it to my data? This way i can at least see that Giremi can run given the proper input.

I'm thinking that there must be some small error in my process that's hard to detect without side to side comparison. I realize that this might be a tall order to ask but I'm running out of ideas.

Greetings, Alex

zhqingit commented 8 years ago

Hi Alex,

Attached please find an example of the input file. But obviously it can't match your bam file. So I think you can just take a look at the format of the input file.

Best, Qing

2016-02-22 2:01 GMT-07:00 Balexander230 notifications@github.com:

Hi Qing and Bobby,

thank you for pointing these errors out to me. I have corrected them but Giremi still seems to take a long time (>30hours), which leads me to believe there is still an error somewhere.

Could you maybe send some sample input data so i can compare it to my data? This way i can at least see that Giremi can run given the proper input.

I'm thinking that there must be some small error in my process that's hard to detect without side to side comparison. I realize that this might be a tall order to ask but I'm running out of ideas.

Greetings, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-187080362.

Balexander230 commented 8 years ago

Hi Qing,

thank you for your continued help, an example input file might be just what i need even if it would only rule out an error in that part of my process. However the file you mentioned in your comment seems to be missing.

Greetings, Alex

doughertyr commented 8 years ago

Qing,

I've been removing the sites that cause giremi to halt then running giremi again. It seems that every line I remove is a homozygous site. However, there are also many sites that would be homozygous by the same criteria that do not cause giremi to halt. For example, the first position that caused giremi to halt was 11085004, which looks like this in the vcf file I used to make my snp list:

1 11085004 . A ATGTTT

But this site comes after a site that look like this:

1 2302665 . C CACAACCACA

A different site that caused giremi to halt was 17718671, which in my vcf file looks like this:

1 17718671 . AG A

But this comes after many similar sites that didn't cause problems such as:

1 10365296 . AC A 1 12073251 . GA G 1 15788766 . TA T

Am I looking at this the wrong way?

Thanks, Bobby

zhqingit commented 8 years ago

Hi Alex,

I guess the file is too big to be shown in github. Attached please find a reduced file. Let me know freely if you have more questions.

Best, Qing

2016-02-23 5:07 GMT-07:00 Balexander230 notifications@github.com:

Hi Qing,

thank you for your continued help, an example input file might be just what i need even if it would only rule out an error in that part of my process. However the file you mentioned in your comment seems to be missing.

Greetings, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-187674403.

zhqingit commented 8 years ago

Hi Bobby,

I can't find these sites in the file you sent to me. And so far, giremi can't handle the SNV site with multiple nucleotides. Could you send me the output file, then I can check it.

Best, Qing

2016-02-24 13:37 GMT-07:00 doughertyr notifications@github.com:

Qing,

I've been removing the sites that cause giremi to halt then running giremi again. It seems that every line I remove is a homozygous site. However, there are also many sites that would be homozygous by the same criteria that do not cause giremi to halt. For example, the first position that caused giremi to halt was 11085004, which looks like this in the vcf file I used to make my snp list:

1 11085004 . A ATGTTT

But this site comes after a site that look like this:

1 2302665 . C CACAACCACA

A different site that caused giremi to halt was 17718671, which in my vcf file looks like this:

1 17718671 . AG A

But this comes after many similar sites that didn't cause problems such as:

1 10365296 . AC A 1 12073251 . GA G 1 15788766 . TA T

Am I looking at this the wrong way?

Thanks, Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-188445471.

Balexander230 commented 8 years ago

Hi Qing,

the file you are attaching/sending in your second attempt is still not there it seems.

Greeting, Alex

doughertyr commented 8 years ago

Qing,

My output files are already attached to the this thread.

Thanks, Bobby

zhqingit commented 8 years ago

Hi Alex,

How about this time? GM8.mis.s.txt

Best, Qing

zhqingit commented 8 years ago

Hi Bobby,

I mean I can't find the examples you showed in the previous threat in your attached file. Examples are listed below: 1 2302665 . C CACAACCACA 1 17718671 . AG A 1 10365296 . AC A 1 12073251 . GA G 1 15788766 . TA T

Best, Qing

doughertyr commented 8 years ago

I removed all SNV sites with multiple nucleotides and homozygous sites, which may have fixed one problem, but now I'm getting an error identical to the issue #6 post.

Bobby

zhqingit commented 8 years ago

Hi Boby,

You only used the chromosome 22 as the test, but there are few SNVs in this chromosome, which caused no MIs could be calculated. So please try all the chromosomes.

Best, Qing

doughertyr commented 8 years ago

Qing,

Trying the same dataset with all chromosomes worked! I was finally able to run giremi completely without any errors. However, I've since tried doing to the same thing to other files in my dataset, and I'm getting the same 'does not have 24 elements' error in my .res.Rout file. Like before, the snv list was made excluding both multiple snv sites and homozygotes, which solved my problem before.

The files that giremi completed an analysis with begin with '419'. The files that giremi halted its analysis with begin with '968'.

I greatly appreciate all the help you've given me thus far, hopefully soon I won't have to bother you anymore!

Thanks, Bobby

[968AnnotateSNVs_output.txt] 968giremi_output.txt 968giremi_output.res.Rout.txt

419AnnotateSNVs_output.txt 419giremi_output.res.res.txt 419giremi_output.res.Rout.txt 419giremi_output.res.txt

Balexander230 commented 8 years ago

Hi Qing/Bobby,

I have been working on getting my stuff in better order for Giremi and I'm pretty close I think. but now I'm running into a similar issue that Bobby had which was the same issue as mentioned in the #6 issue. From my Rout file:

dat.pos <- cbind(dat.pos[,c("upstream_1base","downstream_1base")],abs(dat.pos[,"major_ratio"]-dat.pos[,"estimated_allelic_ratio"]),1) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1 Calls: cbind -> cbind -> data.frame Execution halted

My output files: table.out.Rout.txt table.out.txt

It seem sthat Bobby solved the issue by using the entire genome file, but I am already using the entire file so no luck there it seems. my SNV file test.txt My reference is the ucsc hg19 with the chromosome headers being only the ">(chromNumber)". Could you have a look, maybe spot something I'm doing wrong?

Greets, Alex

zhqingit commented 8 years ago

Hi Bobby,

Based on the Rout file, I guess the position chr16:21994960 is 'N' in your fasta file. You can check the fasta file or remove this position in your list.

Best, Qing

2016-03-03 12:24 GMT-07:00 doughertyr notifications@github.com:

Qing,

Trying the same dataset with all chromosomes worked! I was finally able to run giremi completely without any errors. However, I've since tried doing to the same thing to other files in my dataset, and I'm getting the same 'does not have 24 elements' error in my .res.Rout file. Like before, the snv list was made excluding both multiple snv sites and homozygotes, which solved my problem before.

The files that giremi completed an analysis with begin with '419'. The files that giremi halted its analysis with begin with '968'.

I greatly appreciate all the help you've given me thus far, hopefully soon I won't have to bother you anymore!

Thanks, Bobby

[968AnnotateSNVs_output.txt] 968giremi_output.txt https://github.com/zhqingit/giremi/files/157508/968giremi_output.txt 968giremi_output.res.Rout.txt https://github.com/zhqingit/giremi/files/157509/968giremi_output.res.Rout.txt

419AnnotateSNVs_output.txt https://github.com/zhqingit/giremi/files/157524/419AnnotateSNVs_output.txt 419giremi_output.res.res.txt https://github.com/zhqingit/giremi/files/157526/419giremi_output.res.res.txt 419giremi_output.res.Rout.txt https://github.com/zhqingit/giremi/files/157527/419giremi_output.res.Rout.txt 419giremi_output.res.txt https://github.com/zhqingit/giremi/files/157528/419giremi_output.res.txt

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-191924842.

zhqingit commented 8 years ago

Hi Alex,

I noticed your SNVs list includes many homozygous sites, which might cause the problem. Please remove them and try it again.

Best, Qing

2016-03-06 13:19 GMT-07:00 Balexander230 notifications@github.com:

Hi Qing/Bobby,

I have been working on getting my stuff in better order for Giremi and I'm pretty close I think. but now I'm running into a similar issue that Bobby had which was the same issue as mentioned in the #6 https://github.com/zhqingit/giremi/issues/6 issue. From my Rout file:

dat.pos <- cbind(dat.pos[,c("upstream_1base","downstream_1base")],abs(dat.pos[,"major_ratio"]-dat.pos[,"estimated_allelic_ratio"]),1) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1 Calls: cbind -> cbind -> data.frame Execution halted

My output files: table.out.Rout.txt https://github.com/zhqingit/giremi/files/160541/table.out.Rout.txt table.out.txt https://github.com/zhqingit/giremi/files/160542/table.out.txt

It seem sthat Bobby solved the issue by using the entire genome file, but I am already using the entire file so no luck there it seems. my SNV file test.txt https://github.com/zhqingit/giremi/files/160543/test.txt My reference is the ucsc hg19 with the chromosome headers being only the ">(chromNumber)". Could you have a look, maybe spot something I'm doing wrong?

Greets, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-192981413.

Balexander230 commented 8 years ago

Hi Qing,

I think I have removed them but am still getting an error. from my Rout file:

dat.pos <- cbind(dat.pos[,c("upstream_1base","downstream_1base")],abs(dat.pos[,"major_ratio"]-dat.pos[,"estimated_allelic_ratio"]),1) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1 Calls: cbind -> cbind -> data.frame Execution halted

my new SNV_list file extra.txt the rest of my files remained the same. I have also found an error which may have been there before but went unnoticed by me the error goes like this: [mpileup] 1 samples in 1 input files [fai_fetch_seq] The sequence "MT" not found [fai_fetch_seq] The sequence "GL000241.1" not found [fai_fetch_seq] The sequence "GL000214.1" not found [fai_fetch_seq] The sequence "GL000218.1" not found [fai_fetch_seq] The sequence "GL000220.1" not found [fai_fetch_seq] The sequence "GL000205.1" not found [fai_fetch_seq] The sequence "GL000219.1" not found [fai_fetch_seq] The sequence "GL000224.1" not found [fai_fetch_seq] The sequence "GL000195.1" not found [fai_fetch_seq] The sequence "GL000194.1" not found meanMI:0.003851 sdMI:0.036889 Unknown option: f Unknown option: l

Again thank you for having a look at my problem.

regards, Alex

doughertyr commented 8 years ago

Qing,

Each time I have a problem with giremi, it is because at least one line remains in the first output file that has N's in columns 6 through 9. The positions on these lines, though, don't correspond to N's in my fasta reference file. For example, in the output file I attached above, 968giremi_output.txt, there are N's in columns 6 though 9 on line 28773, which corresponds to the line execution is halted on. The site from that line, 16:21994960, however, does not correspond to an 'N' in hg19.fa. Therefore an 'N' in the fasta file is not causing this issue.

Thanks, Bobby

zhqingit commented 8 years ago

Hi Bobby,

Could you try to use samtools to call the nucleotide at position 16:21994960? It's strange that only this position can't be called by giremi.

Best, Qing

2016-03-07 14:15 GMT-07:00 doughertyr notifications@github.com:

Qing,

Each time I have a problem with giremi, it is because at least one line remains in the first output file that has N's in columns 6 through 9. The positions on these lines, though, don't correspond to N's in my fasta reference file. For example, in the output file I attached above, 968giremi_output.txt, there are N's in columns 6 though 9 on line 28773, which corresponds to the line execution is halted on. The site from that line, 16:21994960, however, does not correspond to an 'N' in hg19.fa. Therefore an 'N' in the fasta file is not causing this issue.

Thanks, Bobby

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-193453420.

doughertyr commented 8 years ago

Qing,

samtools faidx hg19.fa 16:21994960-21994960 > C

I know that neither of the surrounding nucleotides were N, but I don't have access to my reference genome right now to show you for certain what nucleotides they were.

Although this is the only case of this happening in this file, I had other files where this happened several times.

zhqingit commented 8 years ago

Hi Boby,

Actually, giremi calls the nucleotide in the genome by using the functions in samtools. Could you generate a new index file for your fasta file and try it again? If it still generates the N, please share the bam file to me, I will do some test.

Best, Qing

2016-03-08 7:05 GMT-07:00 doughertyr notifications@github.com:

Qing,

samtools faidx hg19.fa 16:21994960-21994960 > C

I know that neither of the surrounding nucleotides were N, but I don't have access to my reference genome right now to show you for certain what nucleotides they were.

Although this is the only case of this happening in this file, I had other files where this happened several times.

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-193796264.

zhqingit commented 8 years ago

Hi Alex,

I think the genomes GL0002... are not in your fasta file.

Best, Qing

2016-03-07 2:20 GMT-07:00 Balexander230 notifications@github.com:

Hi Qing,

I think I have removed them but am still getting an error. from my Rout file:

dat.pos <- cbind(dat.pos[,c("upstream_1base","downstream_1base")],abs(dat.pos[,"major_ratio"]-dat.pos[,"estimated_allelic_ratio"]),1) Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1 Calls: cbind -> cbind -> data.frame Execution halted

my new SNV_list file extra.txt https://github.com/zhqingit/giremi/files/161025/extra.txt the rest of my files remained the same. I have also found an error which may have been there before but went unnoticed by me the error goes like this: [mpileup] 1 samples in 1 input files [fai_fetch_seq] The sequence "MT" not found [fai_fetch_seq] The sequence "GL000241.1" not found [fai_fetch_seq] The sequence "GL000214.1" not found [fai_fetch_seq] The sequence "GL000218.1" not found [fai_fetch_seq] The sequence "GL000220.1" not found [fai_fetch_seq] The sequence "GL000205.1" not found [fai_fetch_seq] The sequence "GL000219.1" not found [fai_fetch_seq] The sequence "GL000224.1" not found [fai_fetch_seq] The sequence "GL000195.1" not found [fai_fetch_seq] The sequence "GL000194.1" not found meanMI:0.003851 sdMI:0.036889 Unknown option: f Unknown option: l

Again thank you for having a look at my problem.

regards, Alex

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-193175620.

doughertyr commented 8 years ago

Qing,

Below I shared my bam file with you. Since github can only support sharing files smaller than 10MB, and of a certain format, I couldn't share the entire file with you. The name of the original bam file in 968.bam.

The file I shared was made this way:

1) samtools view -b 968.bam 16:20000000-25000000 > 968.16.20000000-25000000.bam

2) zip -r bam.zip 968.16.20000000-2500000.bam

This is the line of the giremi output file with the N's, causing giremi.r to halt:

16 21994960 + 0 UQCRC2 N N N N 0 0 0.000000 -1 -1.000000 -1.000000e+00 0.500000 0 0 0.000000

Thanks, Bobby

bam.zip

zhqingit commented 8 years ago

Hi Boby,

If you use samtools mpileup to the bam file, you will find the chr16:21994960 is missed because of the zero coverage. Likewise, giremi can't identify this positioin.

Best, Qing

2016-03-10 11:42 GMT-07:00 doughertyr notifications@github.com:

Qing,

Below I shared my bam file with you. Since github can only support sharing files smaller than 10MB, and of a certain format, I couldn't share the entire file with you. The name of the original bam file in 968.bam.

The file I shared was made this way:

1) samtools view -b 968.bam 16:20000000-25000000 > 968.16.20000000-25000000.bam

2) zip -r bam.zip 968.16.20000000-2500000.bam

This is the line of the giremi output file with the N's, causing giremi.r to halt:

16 21994960 + 0 UQCRC2 N N N N 0 0 0.000000 -1 -1.000000 -1.000000e+00 0.500000 0 0 0.000000

Thanks, Bobby

bam.zip https://github.com/zhqingit/giremi/files/167761/bam.zip

— Reply to this email directly or view it on GitHub https://github.com/zhqingit/giremi/issues/4#issuecomment-194996242.