walaj / svaba

Structural variation and indel detection by local assembly
GNU General Public License v3.0

ERROR: Unable to open index file: ucsc.hg19.fa, yet it's in the current directory. #107

Open jamesdalg opened 2 years ago

jamesdalg commented 2 years ago

bash-4.2$ /data/CCRBioinfo/dalgleishjl/sv_mapping/svaba/bin/svaba run -t PAMHYN_Tumor.realigned.md.bam -n PAMHYN_Normal.realigned.md.bam -a PAMHYN_discovery_chr22 -k chr22 -G ucsc.hg19.fa -p 2

--- Running svaba SV and indel detection on 2 threads ----
--- (inspect *.log for real-time progress updates) ---

!!!! WARNING. Multiple readlengths mixed: 676--81 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 27125--101 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 26967--101 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 1016--101 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 555--81 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 556--81 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 1017--101 max readlen 101
!!!! WARNING. Multiple readlengths mixed: 526--81 max readlen 101
[E::bwa_idx_load_from_disk] fail to locate the index files
ERROR: Unable to open index file: ucsc.hg19.fa
bash-4.2$ ls -lrt
total 11850119
-rw-r--r-- 1 dalgleishjl CCRBioinfo 3443961889 Mar 10 2020 hg19.p13.plusMT.no_alt_analysis_set.bwa_index.tar.gz
lrwxrwxrwx 1 dalgleishjl CCRBioinfo 85 Oct 15 12:55 PAMHYN_Normal.realigned.md.bam.bai -> /data/CCRBioinfo/projects/TargetOsteoDiscovery/bam/PAMHYN_Normal.realigned.md.bam.bai
lrwxrwxrwx 1 dalgleishjl CCRBioinfo 46 Oct 15 12:55 hg19.fa -> /data/CCRBioinfo/dalgleishjl/reference/hg19.fa
lrwxrwxrwx 1 dalgleishjl CCRBioinfo 81 Oct 15 12:56 PAMHYN_Normal.realigned.md.bam -> /data/CCRBioinfo/projects/TargetOsteoDiscovery/bam/PAMHYN_Normal.realigned.md.bam
lrwxrwxrwx 1 dalgleishjl CCRBioinfo 84 Oct 15 12:56 PAMHYN_Tumor.realigned.md.bam.bai -> /data/CCRBioinfo/projects/TargetOsteoDiscovery/bam/PAMHYN_Tumor.realigned.md.bam.bai
lrwxrwxrwx 1 dalgleishjl CCRBioinfo 80 Oct 15 12:56 PAMHYN_Tumor.realigned.md.bam -> /data/CCRBioinfo/projects/TargetOsteoDiscovery/bam/PAMHYN_Tumor.realigned.md.bam
drwxr-xr-x 2 dalgleishjl CCRBioinfo 4096 Oct 15 13:16 hg19.p13.plusMT.no_alt_analysis_set
-rw-r--r-- 1 dalgleishjl CCRBioinfo 3199905909 Oct 15 15:44 ucsc.hg19.fa
-rw-r----- 1 dalgleishjl CCRBioinfo 8595 Oct 15 15:45 ucsc.hg19.amb
-rw-r----- 1 dalgleishjl CCRBioinfo 4035 Oct 15 15:45 ucsc.hg19.ann
-rw-r----- 1 dalgleishjl CCRBioinfo 3137161344 Oct 15 15:45 ucsc.hg19.bwt
-rw-r----- 1 dalgleishjl CCRBioinfo 784290318 Oct 15 15:45 ucsc.hg19.pac
-rw-r----- 1 dalgleishjl CCRBioinfo 1568580688 Oct 15 15:45 ucsc.hg19.sa
-rw-r--r-- 1 dalgleishjl CCRBioinfo 2287 Oct 15 15:45 PAMHYN_discovery_chr22.log
bash-4.2$ cat PAMHYN_discovery_chr22.log
***** PARAMS ****
DBSNP Database file:
Max cov to assemble: 100
Error correction mode: f
Subsample-rate for correction learning: 0.500000
ErrorRate: EXACT (0)
Num assembly rounds: 3
Num reads to sample: 2000000
Discordant read extract SD cutoff: 3.92
Discordant cluster std-dev cutoff: 3.92
Minimum number of reads for mate lookup 3
LOD cutoff (non-REF): 8
LOD cutoff (non-REF, at DBSNP): 6
LOD somatic cutoff: 6
LOD somatic cutoff (at DBSNP): 10
BWA-MEM params:
Gap open penalty: 32
Gap extension penalty: 1
Mismatch penalty: 18
Aligment bandwidth: 1000
Z-dropoff: 100
Clip 3 penalty: 5
Clip 5 penalty: 5
Reseed trigger: 1.5
Sequence match score: 2


BAM PARAMS FOR: n001--PAMHYN_Normal.realigned.md.bam
@@@ READ GROUP 1017 Insert Size: 172.011(61.9629) [0.025%,97.5%] [102,386], Mean Coverage: 13 Read Length: 101 Max MapQ:70
@@@ READ GROUP 526 Insert Size: 188.433(67.841) [0.025%,97.5%] [88,396], Mean Coverage: 3 Read Length: 81 Max MapQ:70
min_dscrd_size_for_variant 454
BAM PARAMS FOR: t000--PAMHYN_Tumor.realigned.md.bam
@@@ READ GROUP 676 Insert Size: 146.366(45.43) [0.025%,97.5%] [84,291], Mean Coverage: 5 Read Length: 81 Max MapQ:70
@@@ READ GROUP 27125 Insert Size: 178.404(66.3368) [0.025%,97.5%] [102,403], Mean Coverage: 18 Read Length: 101 Max MapQ:70
@@@ READ GROUP 26967 Insert Size: 177.116(65.3228) [0.025%,97.5%] [102,396], Mean Coverage: 17 Read Length: 101 Max MapQ:70
@@@ READ GROUP 1016 Insert Size: 178.804(66.198) [0.025%,97.5%] [103,400], Mean Coverage: 15 Read Length: 101 Max MapQ:70
@@@ READ GROUP 555 Insert Size: 148.107(45.9535) [0.025%,97.5%] [84,291], Mean Coverage: 7 Read Length: 81 Max MapQ:70
@@@ READ GROUP 556 Insert Size: 147.096(45.9974) [0.025%,97.5%] [83,290], Mean Coverage: 7 Read Length: 81 Max MapQ:70
min_dscrd_size_for_variant 454
...min discordant-only variant size 454
...found read length of 101. Min Overlap is 60
...max read MAPQ detected: 70

...calculated seed size for error rate of 0.000000 and read length 101 is 60
...loading the human reference sequence for BWA
bash-4.2$

jamesdalg commented 2 years ago

Any idea on this? I've read the previous issues that were similar, but I do have the indexes and the fa file at this point. Not sure why this error would come up. If anyone thinks of a way to get around this problem, let me know.

walaj commented 2 years ago

You have to have the genome fasta indexed with both bwa and samtools faidx. I’d run or rerun those on the genomes and try again.
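One thing that may be worth checking in the directory listing from the original post: the BWA index files are named `ucsc.hg19.amb`, `ucsc.hg19.bwt`, and so on, and there is no `.fai` file. By default `bwa index ucsc.hg19.fa` writes `ucsc.hg19.fa.amb`, `ucsc.hg19.fa.bwt`, etc., so a prefix mismatch is one plausible cause of the error. The sketch below is a small hypothetical helper (`missing_index_files` is not part of svaba, bwa, or samtools) that assumes the index files are expected as `<fasta>.<ext>` next to the path passed to `-G`; it only reports what is missing and does not run any indexing itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every index file that appears to be missing
# for the FASTA given as $1.
# Assumption: the tools look for <fasta>.<ext>, e.g. ucsc.hg19.fa.bwt,
# rather than ucsc.hg19.bwt.
missing_index_files() {
    local fasta="$1" ext
    # amb/ann/bwt/pac/sa come from `bwa index`; fai comes from `samtools faidx`.
    for ext in amb ann bwt pac sa fai; do
        [ -e "${fasta}.${ext}" ] || echo "${fasta}.${ext}"
    done
}

# Usage (prints nothing when all six files are present):
#   missing_index_files ucsc.hg19.fa
#
# To (re)build the indexes next to the FASTA:
#   bwa index ucsc.hg19.fa        # writes ucsc.hg19.fa.{amb,ann,bwt,pac,sa}
#   samtools faidx ucsc.hg19.fa   # writes ucsc.hg19.fa.fai
```

If the helper lists files, rerunning the two indexing commands shown in the comments (without `-p`, so the output prefix matches the FASTA name) should produce them; indexing a full human genome with `bwa index` can take a while.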

