zhou-lab / biscuit

BISulfite-seq CUI Toolkit
Other
62 stars 24 forks source link

[fetch_refseq:40] Error, cannot retrieve reference chr1:0-0. #3

Open ttriche opened 8 years ago

ttriche commented 8 years ago

After marking dupes, the following

biscuit pileup -q 16 -r hg19.fa -i WGBS$SAMPLE.mkdups.hg19.bam -o WGBS$SAMPLE.mkdups.hg19.vcf

fails on a number of samples (4/6, not all though). Is there a straightforward way to debug this? Or just to grep out the offending read[s]? The runs that fail are all from the same flow cell IIRC. I don't see this behavior elsewhere and can't seem to find anything on Google. Eventually I'll look into the source... :-/

rbpisupati commented 8 years ago

Hi ttriche, Did you find any way to fix this error?

zwdzwd commented 8 years ago

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I tried to reproduce this on my side but in vain. I must be missing something.

rbpisupati commented 8 years ago

Hi, I have run it with this command and it ended up giving me an error.

~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r ~/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam

fileformat=VCFv4.1

reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa

source=biscuitV0.1.3.20160324

contig=

contig=

contig=

contig=

contig=

contig=

contig=

program=<cmd=biscuit pileup -r /home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam>

INFO=

INFO=

INFO=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified

[fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome modifying the header of fasta to just the Chromosome name instead of all the information I seemed to get no error. I dont know why that is the case. So @ttriche, I think you should modify your genome fasta headers to just the chromosome name.

Thanks, Rahul

ttriche commented 8 years ago

it disappeared once I updated biscuit and sorted the BAMs (verified sort order by indexing the result prior to bedgraph/vcf production)

--t

On Wed, May 25, 2016 at 6:12 AM, Wanding Zhou notifications@github.com wrote:

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I tried to reproduce this on my side but in vain. I must be missing something.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/zwdzwd/biscuit/issues/3#issuecomment-221571395

ttriche commented 8 years ago

re: headers: I just used hg19 from UCSC, so that reason seems unlikely

--t

On Wed, May 25, 2016 at 6:33 AM, Rahul Bharadwaj notifications@github.com wrote:

Hi, I have run it with this command and it ended up giving me an error. ~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r ~/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam

fileformat=VCFv4.1

reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa

source=biscuitV0.1.3.20160324

contig=

contig=

contig=

contig=

contig=

contig=

contig=

program=

INFO=

INFO=

INFO=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified [fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome modifying the header of fasta to just the Chromosome name instead of all the information I seemed to get no error. I dont know why that is the case. So @ttriche https://github.com/ttriche, I think you should modify your genome fasta headers to just the chromosome name.

Thanks, Rahul

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/zwdzwd/biscuit/issues/3#issuecomment-221577192

rbpisupati commented 8 years ago

Yeah even I thought the same, I have the same headers from Arabidopsis, tair10. Dont know really why that was the case.

jleluyer commented 7 years ago

Hello,

I have the exact same problem even after cloning the latest version of biscuit and sorting/indexing my bam files. Did someone figure out what would be the issue ?

Thanks

rbpisupati commented 7 years ago

Hello jleluyer, try to rename headers in your reference fasta file with just chromosome names. it should work then!

jleluyer commented 7 years ago

Hello,

They are just with scaffold names: i.e >sp_scaff00001. Is the way there are called a problem ? thanks

ZhenyuZ commented 7 years ago

I got similar error here "[fetch_refcache:94] Error, cannot retrieve reference" not chr1 though. It errors out in either HPV-mCG2 or HPV-mCG3 or both in the GDC reference genome https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

Modification of fasta header to only keep the contig name did solve the problem

minghao4 commented 6 years ago

Hello, I'm getting a similar error: Command: biscuit pileup -g "chr6:1-10000" -o out.vcf GRCh38_p12_chr6.fa input_1_sorted.bam input_2_sorted.bam input_3_sorted.bam input_4_sorted.bam

Error: [refcache_fetch:85] Error, cannot retrieve reference chr6:0-0

The reference fasta I'm using only contains the GRCh38 assembled chromosome 6. I changed the header from ">CM000668.2 Homo sapiens chromosome 6, GRCh38 reference primary assembly" to just ">chr6" and it still does not resolve the problem.

EDIT: I've figured out the issue. I didn't index the reference fasta again after changing the header.

smcnew commented 5 years ago

Hi all: I'm having the same error. Command: biscuit pileup gamo_ref2.fa DC050biscuitoutput.bam

Error: [refcache_fetch:85] Error, cannot retrieve reference Chr1:0-0.

The resulting output vcf has correct headers but no data. I previously ran Biscuit pileup correctly using a consensus pileup reference created from my reads, I have also run it without problems using a zebra finch genome. However, I recently tried to re-run the analysis using a recently created genome for my species and have been unable to correctly run the pileup command.

Following the advice above in the thread I renamed all the chromosomes to have simpler names: the reference fasta now has the following chromosomes: `>Chr1

Chr1A Chr2 Chr3 Chr4 Chr4A Chr5 Chr6 Chr7 Chr8 Chr9 Chr10 Chr11 Chr12 Chr13 Chr14 Chr15 Chr17 Chr18 Chr19 Chr20 Chr21 Chr22 Chr23 Chr24 Chr25 Chr26 Chr27 Chr28 ChrZ ChrSS1 ChrSS2 ChrSS3 ChrSS4 ChrUS1 ChrUS2 ChrUS3 ChrUS4 ChrUS5`

I indexed the fasta and aligned my reads again and pileup still will not work.

Any suggestions would be greatly appreciated.

njspix commented 1 year ago

I have run into this issue quite a bit, oddly, it doesn't seem to be deterministic (sometimes it works, sometimes not). Say I have the following files in my working directory:

sample.bam
ref.fa

if I run biscuit pileup ref.fa sample.bam, I might get the following output:

[fai_load] build FASTA index.
...
[fai_load] build FASTA index.
[refcache_fetch:96] Error, cannot retrieve reference chr1:0-0.

However, if i run samtools faidx in the directory (creating ref.fa.fai), then run biscuit pileup as above, it seems to work without error.

Based on this I'm guessing that the built-in faidx generation in biscuit is buggy (perhaps using an out-of-date htslib function?). Suggest generating a fresh faidx using samtools before running biscuit pileup.

zwdzwd commented 1 year ago

I don't recall biscuit generates faidx in any of its function. Yeah use samtools.

njspix commented 1 year ago

Perhaps we should document that a faidx is necessary to run the command, even though it's not specified in the command...

somnya commented 7 months ago

Thank you @njspix ! I followed your suggestion and it worked for me: Using "samtools faidx" instead of "biscuit index"

I hope this modification can be documented.