marbl / merqury

k-mer based assembly evaluation

merqury.completeness.stats empty #87

Closed: mictadlo closed this issue 9 months ago

mictadlo commented 1 year ago

Hi, I ran merqury, and only one output file is empty: merqury.completeness.stats. What could have gone wrong?

> ls -hlatr
total 6.6G
drwxrwx--- 6 lorencm default 4.0K Nov  8 14:10 ..
-rw------- 1 lorencm default  71K Nov  8 16:09 meryl.o3237620
-rw------- 1 lorencm default 128K Nov  8 16:19 meryl.o3237621
-rw------- 1 lorencm default  156 Nov  8 18:25 meryl.o3238252
-rw------- 1 lorencm default  156 Nov  8 18:25 meryl.o3238251
drwxrwx--- 4 lorencm default 4.0K Nov  8 18:56 fastp
-rw------- 1 lorencm default 128K Nov  8 20:28 meryl.o3238456
-rw------- 1 lorencm default 105K Nov  8 20:54 meryl.o3238455
-rw------- 1 lorencm default 4.4K Nov  9 08:25 merylUnionSum.o3238731
lrwxrwxrwx 1 lorencm default   53 Nov  9 09:19 1740D-43-06_S0_L001_R1_001.fastp.fastq.gz.meryl -> fastp/1740D-43-06_S0_L001_R1_001.fastp.fastq.gz.meryl
lrwxrwxrwx 1 lorencm default   53 Nov  9 09:19 1740D-43-06_S0_L001_R2_001.fastp.fastq.gz.meryl -> fastp/1740D-43-06_S0_L001_R2_001.fastp.fastq.gz.meryl
-rw------- 1 lorencm default 4.4K Nov  9 09:22 merylUnionSum.o3238760
-rw------- 1 lorencm default   77 Nov  9 10:37 STDIN.o3238882
-rw------- 1 lorencm default   84 Nov  9 10:37 STDIN.e3238882
-rw------- 1 lorencm default   77 Nov  9 10:54 STDIN.o3238894
-rw------- 1 lorencm default   84 Nov  9 10:54 STDIN.e3238894
drwxrwx--- 2 lorencm default  12K Nov  9 11:14 QLD.meryl
-rw------- 1 lorencm default  317 Nov  9 11:14 merylUnionSum.o3238893
lrwxrwxrwx 1 lorencm default   53 Nov  9 11:22 NbQld183.genome.fasta -> /work/waterhouse_team/NB/QLD183/NbQld183.genome.fasta
-rw------- 1 lorencm default   77 Nov  9 11:30 STDIN.o3238946
-rw------- 1 lorencm default   84 Nov  9 11:30 STDIN.e3238946
drwxrwx--- 2 lorencm default 4.0K Nov  9 11:30 logs
-rw-rw---- 1 lorencm default 952K Nov  9 11:30 QLD.hist
-rw-rw---- 1 lorencm default  169 Nov  9 11:30 QLD.hist.ploidy
-rw-rw---- 1 lorencm default    9 Nov  9 11:30 QLD.filt
drwxrwx--- 2 lorencm default  12K Nov  9 11:47 NbQld183.genome.meryl
-rw-rw---- 1 lorencm default 1.7M Nov  9 12:54 merqury.NbQld183.genome.spectra-cn.hist
-rw-rw---- 1 lorencm default   27 Nov  9 13:04 merqury.NbQld183.genome.only.hist
-rw-rw---- 1 lorencm default  89K Nov  9 13:04 merqury.NbQld183.genome.spectra-cn.ln.png
-rw-rw---- 1 lorencm default  89K Nov  9 13:04 merqury.NbQld183.genome.spectra-cn.fl.png
-rw-rw---- 1 lorencm default 104K Nov  9 13:07 merqury.NbQld183.genome.spectra-cn.st.png
-rw-rw---- 1 lorencm default   56 Nov  9 13:07 merqury.qv
-rw-rw---- 1 lorencm default  995 Nov  9 13:37 merqury.NbQld183.genome.qv
-rw-rw---- 1 lorencm default    0 Nov  9 13:37 merqury.completeness.stats
-rw-rw---- 1 lorencm default 4.5G Nov  9 14:08 NbQld183.genome_only.bed
-rw-rw---- 1 lorencm default 2.2G Nov  9 14:37 NbQld183.genome_only.wig
-rw-rw---- 1 lorencm default 2.8M Nov  9 14:49 merqury.spectra-asm.hist
-rw-rw---- 1 lorencm default   28 Nov  9 14:49 merqury.dist_only.hist
-rw-rw---- 1 lorencm default  95K Nov  9 14:49 merqury.spectra-asm.ln.png
-rw-rw---- 1 lorencm default  94K Nov  9 14:49 merqury.spectra-asm.fl.png
-rw-rw---- 1 lorencm default  98K Nov  9 14:52 merqury.spectra-asm.st.png
-rw------- 1 lorencm default  244 Nov  9 14:52 merqury.o3238945

Best wishes,

Michal

mictadlo commented 1 year ago

Hi, I used the command merqury.sh QLD.meryl NbQld183.genome.fasta merqury. Unfortunately, the log file shows a few errors:

% grep "Can't interpret" merqury.spectra-cn.log 
Can't interpret 'boundary': not a meryl command, option, or recognized input file.
Can't interpret 'QLD.gtboundary.meryl': not a meryl command, option, or recognized input file.
Can't interpret 'QLD.gtboundary.meryl': not a meryl command, option, or recognized input file.
Can't interpret 'NbQld183.genome.solid.meryl': not a meryl command, option, or recognized input file.

What did I do wrong? (Log attached: merqury.spectra-cn.log)

arangrhie commented 1 year ago

Please refer to https://github.com/marbl/merqury/issues/49#issuecomment-852394078 for fixing the 'module load R' part since you are running from a conda environment.

Could you paste what's in QLD.filt? It seems like there may be some garbage values in there. Also, try running Merqury in a clean directory, with QLD.meryl and NbQld183.genome.fasta symlinked.

Thanks, Arang
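Checking that QLD.filt holds a plain number, as requested above, can be scripted. The sketch below is not part of Merqury; the helper name check_filt is hypothetical, and it assumes a valid .filt file contains a single non-negative integer. Judging by the QLD.gtboundary.meryl name in the log, that value is substituted directly into meryl commands, so anything else breaks them.

```shell
#!/usr/bin/env bash
# Sketch, not part of Merqury: verify that a meryl .filt file holds a single
# non-negative integer before it is used as a cutoff. check_filt is a hypothetical name.

check_filt() {
    local value
    value=$(< "$1")                      # command substitution drops the trailing newline
    if [[ "$value" =~ ^[0-9]+$ ]]; then
        echo "OK: $1 = $value"
        return 0
    fi
    echo "BAD: $1 contains '$value' (expected an integer)" >&2
    return 1
}

# Demonstration on temporary files: a valid cutoff and the content reported for QLD.filt.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
echo 3 > "$tmp/good.filt"
echo boundary > "$tmp/bad.filt"
check_filt "$tmp/good.filt"              # prints an OK line
check_filt "$tmp/bad.filt" || true       # prints a BAD line on stderr
```

In the directory from this thread it would be run as check_filt QLD.filt. For the second suggestion, make an empty directory, symlink QLD.meryl and NbQld183.genome.fasta into it with ln -s, and rerun merqury.sh QLD.meryl NbQld183.genome.fasta merqury from there, presumably so that no leftovers from earlier attempts are picked up.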

mictadlo commented 1 year ago

Hi Arang, thank you for your response. Below is the file you asked me to check:

> cat QLD.filt
boundary

and the one you asked me to edit:

> cat ./envs/merqurytest/share/merqury/util/util.sh
#!/usr/bin/env bash

function link()
{
        db_name=`basename $1`
        if [[ ! -e $db_name ]]; then
                ln -s $1
        fi
        echo $db_name
}

function check_module()
{
    module -v 1> /dev/null 2> /dev/null
    echo 1
}

In the meantime, I also tried the latest master build by installing it as a new conda package:

 wget -c https://artprodsu6weu.artifacts.visualstudio.com/A27bcc0f9-b22c-4418-83d8-514809b37fb2/25074ded-3133-43ca-af80-895ab87b53cb/_apis/artifact/cGlwZWxpbmVhcnRpZmFjdDovL2Jpb2NvbmRhL3Byb2plY3RJZC8yNTA3NGRlZC0zMTMzLTQzY2EtYWY4MC04OTVhYjg3YjUzY2IvYnVpbGRJZC8yNDQ5OC9hcnRpZmFjdE5hbWUvTGludXhBcnRpZmFjdHM1/content?format=zip
mv content\?format\=zip merqury-test.zip
unzip merqury-test.zip 
conda create -n merqurytest
conda activate merqurytest
conda install -c ./LinuxArtifacts/packages/noarch/merqury-1.3-hdfd78af_2.tar.bz2 merqury

Unfortunately, the completeness.stats file is still empty.

What did I miss?

pintomollo commented 1 year ago

Hi,

I had the same issue, and it took me quite some time to debug. For me, the problem is in $MERQURY/build/filt.sh, lines 21 to 25.

More specifically, kmerHistToPloidyDepth.jar emits a warning message that breaks the filtering step. As a result, my $db.hist.ploidy file contains the following:

[0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /dev/cpuset.
ploidy  depth  boundary
0       0      3
1       8      17

This results in the same error as above ("Can't interpret 'boundary': not a meryl command, option, or recognized input file."): the warning pushes the header down, so the second line is now the header row, and its last word, "boundary", is not a number. (Yes, I know a value of 3 is far from ideal.)

My solution to fix the issue is to modify line 25 from:

filt=`sed -n 2p $db.hist.ploidy | awk '{print $NF}'`

to

filt=`grep -v warning $db.hist.ploidy | sed -n 2p | awk '{print $NF}'`

Hope that helps.

Cheers, Simon
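The failure and Simon's workaround can be reproduced without running Merqury. This sketch writes the .hist.ploidy content from the comment above to a temporary file (the whitespace-separated column layout is an assumption, since the original text was flattened) and runs both the original line-25 pipeline and the patched one.

```shell
#!/usr/bin/env bash
# Reproduce the filt.sh line-25 problem on a sample .hist.ploidy file.
# The file content is the one quoted in the comment above; the column layout is assumed.
set -euo pipefail

ploidy=$(mktemp)
trap 'rm -f "$ploidy"' EXIT
cat > "$ploidy" <<'EOF'
[0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /dev/cpuset.
ploidy depth boundary
0 0 3
1 8 17
EOF

# Original extraction: last field of line 2, which is now the header row.
filt_orig=$(sed -n 2p "$ploidy" | awk '{print $NF}')
# Simon's workaround: drop the warning line first, so line 2 is the first data row again.
filt_fixed=$(grep -v warning "$ploidy" | sed -n 2p | awk '{print $NF}')

echo "original: $filt_orig"   # original: boundary
echo "patched:  $filt_fixed"  # patched:  3
```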

mictadlo commented 1 year ago

Thank you, it works.

arangrhie commented 1 year ago
[0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /dev/cpuset.

This is interesting. It seems to be an OS/platform-specific log message. I'll update the filt script to handle this.
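One possible shape for a more robust extraction, sketched here as an assumption and not the change that was actually made to filt.sh: instead of relying on a fixed line number, find the header row whose first field is ploidy, read the boundary from the row after it, and refuse anything non-numeric so a garbage value never reaches meryl. The name get_filt is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to a fixed "sed -n 2p": anchor on the header row so any
# preamble (JVM warnings or other log lines) is skipped. get_filt is an illustrative
# name, not a Merqury function; it assumes the header's first field is "ploidy".

get_filt() {
    local value
    # Set a flag on the header row, then print the last field of the next row and stop.
    value=$(awk 'found { print $NF; exit } $1 == "ploidy" { found = 1 }' "$1")
    # Refuse anything non-numeric so a word like "boundary" never reaches meryl.
    if [[ ! "$value" =~ ^[0-9]+$ ]]; then
        echo "no numeric boundary found in $1 (got '$value')" >&2
        return 1
    fi
    echo "$value"
}

# Demonstration: the same table with and without a warning line in front of it.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'ploidy depth boundary\n0 0 3\n1 8 17\n' > "$tmp/clean.hist.ploidy"
{
    echo '[0.001s][warning][os,container] Duplicate cpuset controllers detected.'
    cat "$tmp/clean.hist.ploidy"
} > "$tmp/noisy.hist.ploidy"

get_filt "$tmp/clean.hist.ploidy"   # 3
get_filt "$tmp/noisy.hist.ploidy"   # 3
```

Compared with grep -v warning, anchoring on the header also tolerates preamble lines that do not contain the word "warning", and the numeric check turns a silent bad cutoff into an explicit error.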