Closed scthree closed 9 months ago
Hello,
There was a serious issue in meryl v1.4 release, making meryl-lookup
crash. I wonder this was also affected by this issue.
If possible, could you obtain meryl v1.4.1 release, and re-run merqury?
If you'd like to re-run, you could you remove .sort.bed and .hap.wig, then re-run $MERQURY/trio/phase_block.sh
. This has to be done for each assembly (so twice). Next, run $MERQURY/trio/block_n_stats.sh
afterwards (also, twice). Since you are re-running it manually, you could also provide a genome size that could go with block_n_stats.sh to generate NG plots instead of N plots.
Best, Arang
Hello, I think we have similar errors, but the difference is that I am running on version 1.4.1. I encountered similar errors during installation using both git clone and meryl-1.4.1.Linux-amd64.tar.xz. In the detailed documentation below, I executed the same commands by providing the reference sequence and parent-offspring databases, along with the two assembled haploid fa files.
Both the quality values (qv) and the generation of plots seem to be working fine, but I am unable to obtain the 100_20000.phased_block.stats and switch error. I cannot observe the completeness of the assembly. Based on the error messages, it seems to suggest that bedtools, samtools, and R are not present. However, I have confirmed that these tools exist in the current conda environment through the executed commands:
samtools Program: samtools (Tools for alignments in the SAM format) Version: 1.18 (using htslib 1.17) R --version R version 4.1.1 (2021-08-10) -- "Kick Things" Copyright (C) 2021 The R Foundation for Statistical Computing Platform: x86_64-conda-linux-gnu (64-bit) bedtools bedtools is a powerful toolset for genome arithmetic. Version: v2.31.1
Haplotype dbs provided. Running Merqury in trio mode...
hap1: chr21-db_M.meryl hap2: chr21-db_F.meryl asm1: hic_pred_hap1.fa asm2: hic_pred_hap2.fa out : hic
Get spectra-cn plots and QV stats
Get blob plots
Get haplotype specfic spectra-cn plots
Get phase blocks
Get block N plots ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'bedtools' ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'samtools' ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'
[E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta [faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai
# Found hic_pred_hap1.fa.gaps.bed
awk: fatal: cannot open file `hic_pred_hap1.fa.fasta.fai' for reading (No such file or directory)
[E::fai_build3_core] Failed to open the file hic_pred_hap2.fa.fasta [faidx] Could not build fai index hic_pred_hap2.fa.fasta.fai
# Found hic_pred_hap2.fa.gaps.bed
awk: fatal: cannot open file `hic_pred_hap2.fa.fasta.fai' for reading (No such file or directory)
Result saved as hic.hic_pred_hap1.fa.100_20000.phased_block.sizes
Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap1.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap1.fa.contig.sizes -o hic.hic_pred_hap1.fa Loading required package: argparse Warning message: package ‘argparse’ was built under R version 4.1.3 Loading required package: ggplot2 Warning message: package ‘ggplot2’ was built under R version 4.1.3 Loading required package: scales Warning message: package ‘scales’ was built under R version 4.1.3 Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
Result saved as hic.hic_pred_hap2.fa.100_20000.phased_block.sizes
Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap2.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap2.fa.contig.sizes -o hic.hic_pred_hap2.fa Loading required package: argparse Warning message: package ‘argparse’ was built under R version 4.1.3 Loading required package: ggplot2 Warning message: package ‘ggplot2’ was built under R version 4.1.3 Loading required package: scales Warning message: package ‘scales’ was built under R version 4.1.3 Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
100_20000.phased_block: nothing 100_20000.switch.bed: nothing
I sincerely hope that you are familiar with this issue and can assist me. Thank you very much for your help!
Hello @qq923781272 ,
dafc7e31955ed7a1b6e4d89cfee93e1881759c6a will fix your crash. Also I see you don't need to load the modules. You could uncomment this line to avoid the ModuleCmd_Load ERRORs.
I'd suggest to clone the latest Merqury git from the master branch (nothing to install), and re run Merqury. Hope this works for you!
Best, Arang
I apologize for bothering you again. I have checked whether my Merqury version matches the branch correction you sent me. I am certain it is the latest version, but I am still encountering similar issues. Just to be safe, I cloned a fresh copy of the code, but the same error persists.
What's even more puzzling is that I successfully generated the phased_block and 100_20000.switches.txt in a previous run yesterday. However, today, I have been unable to reproduce the success under any circumstances.
I followed your instructions in the previous response to remove and execute $MERQURY/trio/phase_block.sh and $MERQURY/trio/block_n_stats.sh twice each. However, during this process, errors occurred, preventing me from obtaining results for switch and phased_block.
Do you have any insights into why this might be happening? The command I used is $MERQURY/merqury.sh read-db.meryl mat.meryl pat.meryl asm1.fasta out_prefix.
Here is the error message I encountered: Haplotype dbs provided. Running Merqury in trio mode...
hap1: chr21-db_M.meryl hap2: chr21-db_F.meryl asm1: hic_pred_hap1.fa asm2: hic_pred_hap2.fa out : hic
Get spectra-cn plots and QV stats
Get blob plots
Get haplotype specfic spectra-cn plots
Get phase blocks
Get block N plots ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'bedtools' ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'samtools' ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'
[E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta [faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai
# Found hic_pred_hap1.fa.gaps.bed
awk: fatal: cannot open file `hic_pred_hap1.fa.fasta.fai' for reading (No such file or directory)
[E::fai_build3_core] Failed to open the file hic_pred_hap2.fa.fasta [faidx] Could not build fai index hic_pred_hap2.fa.fasta.fai
# Found hic_pred_hap2.fa.gaps.bed
awk: fatal: cannot open file `hic_pred_hap2.fa.fasta.fai' for reading (No such file or directory)
Result saved as hic.hic_pred_hap1.fa.100_20000.phased_block.sizes
Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap1.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap1.fa.contig.sizes -o hic.hic_pred_hap1.fa Loading required package: argparse Loading required package: ggplot2 Loading required package: scales Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
Result saved as hic.hic_pred_hap2.fa.100_20000.phased_block.sizes
Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_block_N.R -b hic.hic_pred_hap2.fa.100_20000.phased_block.sizes -c hic.hic_pred_hap2.fa.contig.sizes -o hic.hic_pred_hap2.fa Loading required package: argparse Loading required package: ggplot2 Loading required package: scales Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
Found 1 command tree.
PROCESSING TREE #1 using 128 threads. opLessThan chr21-db_M.meryl print to (stdout) Detected k-mer size: 21 [E::fai_build3_core] Failed to open the file hic_pred_hap1.fa.fasta [faidx] Could not build fai index hic_pred_hap1.fa.fasta.fai
Generate haplotype marker sites bed
-- chr21-db_M
usage: meryl-lookup
Compare kmers in input sequences against kmers in input meryl databases.
Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or compressed with gzip, xz, or bzip2.
Report types:
-bed: Generate a BED format file showing the location of kmers in any input database on each sequence in 'input1.fasta'.
-wig-count: Generate a WIGGLE format file showing the multiplicity of the kmer starting at each position in the sequence, if it exists in an input kmer database.
-wig-depth: Generate a WIGGLE format file showing the number of kmers in any input database that cover each position in the sequence.
-existence: Generate a tab-delimited line for each input sequence with the number of kmers in the sequence, in the database and common to both.
-include: -exclude: Copy sequences from 'input1.fasta' (and 'input2.fasta') to the corresponding output file if the sequence has at least one kmer present (include) or no kmers present (exclude) in 'input1.meryl'.
Run meryl-lookup <report-type> -help
for details on each method.
Unknown option '-dump'. No report-type (-bed, -wig-count, -wig-depth, -existence, -include, -exclude) supplied. Using system JDK. INFO [Nov 29,2023 18:47] [Globals] Development mode is enabled SEVERE [Nov 29,2023 18:47] [IgvTools] Genome definition file not found for: hic_pred_hap1.fa.fasta.fai SEVERE [Nov 29,2023 18:47] [IgvTools] org.broad.igv.tools.PreprocessingException: Genome definition file not found for: hic_pred_hap1.fa.fasta.fai at org.igv/org.broad.igv.tools.IgvTools.loadGenome(IgvTools.java:1156) at org.igv/org.broad.igv.tools.IgvTools.doCount(IgvTools.java:826) at org.igv/org.broad.igv.tools.IgvTools.run(IgvTools.java:371) at org.igv/org.broad.igv.tools.IgvTools.main(IgvTools.java:254)
-- chr21-db_F
usage: meryl-lookup
Compare kmers in input sequences against kmers in input meryl databases.
Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or compressed with gzip, xz, or bzip2.
Report types:
-bed: Generate a BED format file showing the location of kmers in any input database on each sequence in 'input1.fasta'.
-wig-count: Generate a WIGGLE format file showing the multiplicity of the kmer starting at each position in the sequence, if it exists in an input kmer database.
-wig-depth: Generate a WIGGLE format file showing the number of kmers in any input database that cover each position in the sequence.
-existence: Generate a tab-delimited line for each input sequence with the number of kmers in the sequence, in the database and common to both.
-include: -exclude: Copy sequences from 'input1.fasta' (and 'input2.fasta') to the corresponding output file if the sequence has at least one kmer present (include) or no kmers present (exclude) in 'input1.meryl'.
Run meryl-lookup <report-type> -help
for details on each method.
Unknown option '-dump'. No report-type (-bed, -wig-count, -wig-depth, -existence, -include, -exclude) supplied. Using system JDK. INFO [Nov 29,2023 18:47] [Globals] Development mode is enabled SEVERE [Nov 29,2023 18:47] [IgvTools] Genome definition file not found for: hic_pred_hap1.fa.fasta.fai SEVERE [Nov 29,2023 18:47] [IgvTools] org.broad.igv.tools.PreprocessingException: Genome definition file not found for: hic_pred_hap1.fa.fasta.fai at org.igv/org.broad.igv.tools.IgvTools.loadGenome(IgvTools.java:1156) at org.igv/org.broad.igv.tools.IgvTools.doCount(IgvTools.java:826) at org.igv/org.broad.igv.tools.IgvTools.run(IgvTools.java:371) at org.igv/org.broad.igv.tools.IgvTools.main(IgvTools.java:254)
Sort hic.bed
/home/lyc/.conda/envs/porec/share/merqury/trio/switch_error.sh hic.sort.bed hic 100 20000
java -jar -Xmx1g /home/lyc/.conda/envs/porec/share/merqury/trio/bedMerToPhaseBlock.jar hic.sort.bed hic.100_20000 100 20000
Processing file hic.sort.bed Found and as haplotypes. Running time : 0 h 0 m 0 sec
awk: cmd. line:1: fatal: division by zero attempted
java -jar -Xmx1g /home/lyc/.conda/envs/porec/share/merqury/eval/bedCalcN50.jar hic.100_20000.phased_block.bed | tail -n1 | awk -v out=hic.100_20000 -v swi="" '{print out"\t"$0"\tswi}' - >> hic.100_20000.phased_block.stats Processing file hic.100_20000.phased_block.bed Num. blocks: 0 Genome covered bases: 0 Exception in thread "main" java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0 at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266) at java.base/java.util.Objects.checkIndex(Objects.java:359) at java.base/java.util.ArrayList.get(ArrayList.java:427) at javax.arang.bed.CalcN50.hooker(CalcN50.java:41) at javax.arang.IO.Rwrapper.go(Rwrapper.java:14) at javax.arang.bed.CalcN50.main(CalcN50.java:66)
Count and hap-mers per block to hic.100_20000.phased_block.counts ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'R'
Rscript /home/lyc/.conda/envs/porec/share/merqury/plot/plot_blob.R -f hic.100_20000.phased_block.counts -o hic.100_20000.phased_block.blob
Loading required package: argparse
Loading required package: ggplot2
Loading required package: scales
Error in [.data.frame
(dat, , 4) : undefined columns selected
Calls: blob_plot -> [ -> [.data.frame
In addition: Warning message:
In max(dat[, 3]) : no non-missing arguments to max; returning -Inf
Execution halted
Thank you once again for your response, and I apologize for any inconvenience caused!
Dear Developers,
Thank you for this wonderful tool and your contributions in leading the community!
Forgive me if I'm asking a stupid question. I recently used merqury to eval several assemblies generated with hifiasm (HiFi + ONT, then either phased with Trio-binning OR HiC). Was able to run merqury and generate all the asm stats/plots.
Then just made a Verkko assembly (phased with Trio-binning) and ran merqury on the diploid assembly (assembly.fasta) and hap1+hap2 (assembly.haplotype1.fasta, assembly.haplotype2.fasta). No issues with qv, completeness, and generating those plots/stats...but get an Rscript error when it's trying to generate the phased_block.stats - please see error log below.
Could you please let me know what I'm doing wrong? I know you're busy and appreciate your time!
Sincerely, Steve
Haplotype dbs provided. Running Merqury in trio mode...
hap1: mat_uncompress.k21.hapmer.meryl hap2: pat_uncompress.k21.hapmer.meryl asm1: assembly.fasta out : dip
Get spectra-cn plots and QV stats Get blob plots Get haplotype specfic spectra-cn plots Get phase blocks Get block N plots
Generate assembly.fasta.fai
# Found assembly.gaps.bed
Found 246. Generating stats for both scaffolds and contigs.
Convert dip.assembly.100_20000.phased_block.bed to sizes
Result saved as dip.assembly.100_20000.phased_block.sizes
Plot dip.assembly.100_20000.phased_block.bed
Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b dip.assembly.100_20000.phased_block.sizes -c dip.assembly.contig.sizes -s dip.assembly.scaff.sizes -o dip.assembly Loading required package: argparse Loading required package: ggplot2 Loading required package: scales Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
No block2 found. Done!
read: dtr_uncompress.k21.meryl
Haplotype dbs provided. Running Merqury in trio mode...
hap1: mat_uncompress.k21.hapmer.meryl hap2: pat_uncompress.k21.hapmer.meryl asm1: assembly.haplotype1.fasta asm2: assembly.haplotype2.fasta out : hap
Get spectra-cn plots and QV stats Get blob plots Get haplotype specfic spectra-cn plots Get phase blocks Get block N plots
Generate assembly.haplotype1.fasta.fai
# Found assembly.haplotype1.gaps.bed
Found 85. Generating stats for both scaffolds and contigs.
Generate assembly.haplotype2.fasta.fai
# Found assembly.haplotype2.gaps.bed
Found 161. Generating stats for both scaffolds and contigs.
Convert hap.assembly.haplotype1.100_20000.phased_block.bed to sizes
Result saved as hap.assembly.haplotype1.100_20000.phased_block.sizes
Plot hap.assembly.haplotype1.100_20000.phased_block.bed
Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b hap.assembly.haplotype1.100_20000.phased_block.sizes -c hap.assembly.haplotype1.contig.sizes -s hap.assembly.haplotype1.scaff.sizes -o hap.assembly.haplotype1 Loading required package: argparse Loading required package: ggplot2 Loading required package: scales Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted
Convert hap.assembly.haplotype2.100_20000.phased_block.bed to sizes
Result saved as hap.assembly.haplotype2.100_20000.phased_block.sizes
Plot hap.assembly.haplotype2.100_20000.phased_block.bed
Rscript /Programs/merqury-1.3/plot/plot_block_N.R -b hap.assembly.haplotype2.100_20000.phased_block.sizes -c hap.assembly.haplotype2.contig.sizes -s hap.assembly.haplotype2.scaff.sizes -o hap.assembly.haplotype2 Loading required package: argparse Loading required package: ggplot2 Loading required package: scales Error in read.table(dat, header = F) : no lines available in input Calls: block_n -> attach_n -> read.table Execution halted