smithlabcode / dnmtools

Tools for analyzing DNA methylation data
https://dnmtools.readthedocs.io
GNU General Public License v3.0

Summary hmr_count is non-zero, but the hmr output file is empty #213

cb4github closed this issue 4 months ago.

cb4github commented 4 months ago

Describe the bug: The summary file reports a non-zero hmr_count, but the .hmr output file is empty.

To reproduce: steps 1-8 prepare the input data; steps 9-11 run dnmtools hmr and show the problem.

  1. wget --directory-prefix=data https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5652nnn/GSM5652231/suppl/GSM5652231%5FCerebellum%2DNeuron%2DZ000000TB.hg38.beta
  2. wgbstools init_genome --fasta_path $genome_reference_raw_fasta_file_path --force --no_default --debug --threads 19 hg38
  3. ln -s $wgbs_tools_hg38_references_path $wgbs_tools_default_references_path
  4. wgbstools beta2bed --outpath $methylation_counts_file_path --genome hg38 $beta_file_path
  5. wgbstools beta2bed --mean --outpath $methylation_mean_file_path --genome hg38 $beta_file_path
  6. paste \
       <(gawk 'OFS="\t" {print $1,$2,"+", "CpG",$4}' $methylation_mean_file_path) \
       <(cut -f 5 $methylation_counts_file_path) \
       > $methylation_mean_plus_coverage_file_path

  7. head output/GSM5652231_Cerebellum-Neuron-Z000000TB.hg38.methylation_mean_plus_coverage.meth
     chr1 10468 + CpG 0.5 16
     chr1 10470 + CpG 0.857 14
     chr1 10483 + CpG 0.714 14
     chr1 10488 + CpG 0.786 14
     chr1 10492 + CpG 0.714 14
     chr1 10496 + CpG 1 15
     chr1 10524 + CpG 1 15
     chr1 10541 + CpG 0.923 13
     chr1 10562 + CpG 0.562 16
     chr1 10570 + CpG 0.765 17
  8. wc output/GSM5652231_Cerebellum-Neuron-Z000000TB.hg38.methylation_mean_plus_coverage.meth
     856 5136 20094 output/GSM5652231_Cerebellum-Neuron-Z000000TB.hg38.methylation_mean_plus_coverage.meth
  9. singularity exec $DNMTOOLS_IMAGE dnmtools hmr \
       -p ${output_directory}/params.txt \
       -o $hmr_file_path \
       -verbose \
       -summary ${output_directory}/summary.txt \
       $methylation_mean_plus_coverage_file_path
  10. cat output/summary.txt
      hmr_count: 4
      hmr_total_size: 1713
      hmr_mean_size: 428.25
  11. ls -s output/GSM5652231_Cerebellum-Neuron-Z000000TB.hg38.hmr
      0 output/GSM5652231_Cerebellum-Neuron-Z000000TB.hg38.hmr
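For readers unfamiliar with the input format, the merge in step 6 can be sketched in C++ as below. This is an illustration only, not part of dnmtools or wgbstools: the function name `merge_to_meth` is hypothetical, and the column positions (chromosome, start and mean level in columns 1, 2 and 4 of the `--mean` output; coverage in column 5 of the counts output) are inferred from the gawk and cut calls in step 6, not from wgbstools documentation. Each output row has the six columns visible in step 7: chromosome, position, strand, context, methylation level and coverage.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the gawk + cut + paste pipeline in step 6.
// Assumed input layout (inferred from step 6, not from wgbstools docs):
//   mean file:   column 1 = chrom, column 2 = start, column 4 = mean level
//   counts file: column 5 = coverage
// Fields are read as whitespace-separated tokens. Each output row uses the
// six-column layout seen in step 7: chrom, pos, strand, context, level, coverage.
std::string merge_to_meth(std::istream &mean_in, std::istream &counts_in) {
  std::ostringstream out;
  std::string mean_line, counts_line;
  while (std::getline(mean_in, mean_line)) {
    if (!std::getline(counts_in, counts_line))
      throw std::runtime_error("counts input has fewer lines than mean input");
    std::istringstream mean_fields(mean_line), count_fields(counts_line);
    std::string chrom, start, end_pos, level;
    if (!(mean_fields >> chrom >> start >> end_pos >> level))
      throw std::runtime_error("malformed mean line: " + mean_line);
    std::string coverage;
    for (int col = 1; col <= 5; ++col)  // the 5th token read is the coverage
      if (!(count_fields >> coverage))
        throw std::runtime_error("malformed counts line: " + counts_line);
    // Strand and context are fixed, exactly as in the gawk print statement.
    out << chrom << '\t' << start << "\t+\tCpG\t" << level << '\t' << coverage
        << '\n';
  }
  if (std::getline(counts_in, counts_line))
    throw std::runtime_error("counts input has more lines than mean input");
  return out.str();
}
```

Unlike paste, this sketch throws when the two inputs have different line counts, which makes a misaligned merge easier to notice.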

Expected behavior: The resulting .hmr file is non-empty and contains the 4 HMRs reported in the summary.

Screenshots: N/A

Desktop:

Modulefile used to load DNMTools 1.4.2:

    module-whatis DNMTools is a set of tools for analyzing DNA methylation data from high-throughput sequencing experiments, especially whole genome bisulfite sequencing (WGBS), but also reduced representation bisulfite sequencing (RRBS).
    module load anaconda3/2023.07 singularity/3.9.0
    prepend-path PATH /lustre/project/PI/apps/dnmtools/1.4.2/bin
    prepend-path PATH /lustre/project/PI/apps/dnmtools/1.4.2/bin/utils
    setenv DNMTOOLS_IMAGE /lustre/project/PI/singularity_images/dnmtools_latest_plus_bash.sif

Singularity definition-file sections used to add bash to the image:

    %post
        apk update
        apk add bash
        apk add bash-doc
        apk add bash-completion

    %test
        bash


Additional context: Please let me know if you need any more information. Thanks.

Best, CB

andrewdavidsmith commented 4 months ago

@cb4github Thanks for this! I know the source of the problem and will prioritize fixing it.

andrewdavidsmith commented 4 months ago

Fixed with commit #214; a release will follow soon to propagate this fix. Thanks again, @cb4github, this really helps!

cb4github commented 4 months ago

@andrewdavidsmith Many thanks for fixing this. I'm currently unable to install from the updated master branch without an updated tarball, a Dockerfile, or a working configure script. Thanks. Best, CB

cb4github commented 3 months ago

Update: I was able to reproduce issue #209 using gcc/8.5.0, which supports C++17 as mentioned in the README.

As I understand it, though, this is inconsistent with the README's C++17 requirement, because the code uses the very-cheap vectorizer cost model, which is not available until GCC 12.1.

Here is what I believe is a relevant quote from the Red Hat Developer website:

Only relatively recently in GCC 12.1 was the auto-vectorizer enabled when -O2 is specified.

Here is sample output from that attempted make.

make[4]: Entering directory `/lustre/project/<PI>/build/centos7/dnmtools/src/abismal'
depbase=`echo src/abismal.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
        g++ -std=c++17 -DHAVE_CONFIG_H -I.  -I ./src/smithlab_cpp -I ./src/bamxx  -fopenmp -O3  -MT src/abismal.o -MD -MP -MF $depbase.Tpo -c -o src/abismal.o src/abismal.cpp &&\
        mv -f $depbase.Tpo $depbase.Po
In file included from src/abismal.cpp:34:
src/AbismalAlign.hpp:215:50: error: unknown vectorizer cost model 'very-cheap'
 #pragma GCC optimize("vect-cost-model=very-cheap")
                                                  ^
src/AbismalAlign.hpp:215:50: note: valid arguments to '-fvect-cost-model=' are: cheap dynamic unlimited; did you mean 'cheap'?
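One possible local workaround, assuming you are free to edit src/AbismalAlign.hpp, is to guard the pragma at line 215 with a compiler-version check so that older GCC releases skip it. This is a sketch, not necessarily how the maintainers resolved the problem, and the GCC 12 threshold is chosen conservatively from the quote above. A cost-model hint should only change how aggressively loops are vectorized, not program results. The function `sum_products` is a hypothetical stand-in for the code that follows the pragma.

```cpp
#include <cstddef>

// Hypothetical local workaround, not the upstream fix: request the
// "very-cheap" vectorizer cost model only from GCC 12 or newer. GCC 8.5
// rejects this cost model (see the error above), while GCC 12 uses it by
// default at -O2, so 12 is a conservative threshold. Clang also defines
// __GNUC__, so it is excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#pragma GCC optimize("vect-cost-model=very-cheap")
#endif

// Hypothetical stand-in: when the pragma is active, it applies to functions
// defined after it, such as this simple integer dot product.
long long sum_products(const int *a, const int *b, std::size_t n) {
  long long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += static_cast<long long>(a[i]) * b[i];
  return total;
}
```

With the guard in place, GCC 8.5 compiles the function without the hint instead of stopping with "unknown vectorizer cost model".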

Please advise. Thanks.

andrewdavidsmith commented 3 months ago

@cb4github Would you be able to move that to #209 or open a new issue? I have a reply, but I'd like to keep the issues separate, if that's OK.