yupenghe / methylpy

WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
Apache License 2.0

DMRfind generates thousands of chunk.tsv files #94

Open MeganAdler opened 2 months ago

MeganAdler commented 2 months ago

Hi Yupeng,

I'm running into some issues with the results generated by DMRfind. I have EM-seq reads from two different organisms to use in my DMR analysis. I've successfully run DMRfind (methylpy 1.4.7) on Arabidopsis with the simplified command below:

methylpy DMRfind \
    --allc-files /allc_Arabidopsis_1_merged.tsv.gz /allc_Arabidopsis_2_merged.tsv.gz \
    --samples A1 A2 \
    --mc-type "CGN" \
    --chroms 1 2 3 4 5 \
    --num-procs 10 \
    --output-prefix /results/CGN_DMR_A1_A2_merged

Results: [screenshot of the DMRfind results]

However, running a similar command on another organism (with a scaffold-level genome assembly) generated thousands of scaffold#_chunk#.tsv files in the results directory:

methylpy DMRfind \
    --allc-files /allc_Sample_1_merged.tsv.gz /allc_Sample_2_merged.tsv.gz \
    --samples S1 S2 \
    --mc-type "CGN" \
    --num-procs 10 \
    --output-prefix /results/CGN_DMR_S1_S2_merged

Results: [screenshot of the results directory]

Interestingly, the output file showed no obvious error, and some of the DMRs appear to have been compiled into the top four files.
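Unlike the Arabidopsis command, the second command does not pass `--chroms`, so (assuming DMRfind then uses every chromosome/scaffold present in the allc files) a scaffold-level assembly may produce far more chunk files. A minimal sketch for comparing the number of distinct scaffolds in an allc file with the number of chunk files left in the results directory. It assumes the allc file is gzip/bgzip-compressed with no header and the chromosome/scaffold in column 1; the helper names and the `*_chunk*.tsv` pattern (taken from the file names reported above) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a DMRfind run on a scaffold-level genome.

# Count distinct chromosomes/scaffolds in a compressed allc file.
# Assumes no header line and the chromosome/scaffold name in column 1.
count_scaffolds() {
    gzip -dc "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}

# Count chunk files left in a results directory (top level only).
# The '*_chunk*.tsv' pattern is an assumption based on the reported file names.
count_chunks() {
    find "$1" -maxdepth 1 -type f -name '*_chunk*.tsv' | wc -l | tr -d ' '
}
```

For example, `count_scaffolds /allc_Sample_1_merged.tsv.gz` and `count_chunks /results` print the two counts side by side for comparison.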

Thank you for your help, and please let me know if you need any more information for troubleshooting this issue.

yupenghe commented 2 months ago

Yes, generating thousands of chunk files is expected behavior. The chunk files should be removed at the end of the run, but that does not seem to have happened here. I'm not sure what went wrong; one possible explanation is that the program died without an error message. If you run it again, do you still see the chunk files?
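To check whether the program dies silently on a rerun, it can help to capture DMRfind's exit status and log, then count any chunk files that remain afterward. A minimal bash sketch; the `run_and_check` helper, the log file name, and the `*_chunk*.tsv` pattern are hypothetical. In bash, an exit status above 128 means the process was terminated by a signal (status minus 128), which would be consistent with the program dying without an error message.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, send its stdout/stderr to a log file,
# then report the exit status and any leftover chunk files.
# Usage: run_and_check RESULTS_DIR LOG_FILE command [args...]
run_and_check() {
    local results_dir=$1 log_file=$2
    shift 2
    "$@" >"$log_file" 2>&1
    local status=$?
    echo "exit status: $status"
    if [ "$status" -gt 128 ]; then
        # Bash reports 128 + N when a process is killed by signal N.
        echo "terminated by signal $((status - 128))"
    fi
    # The '*_chunk*.tsv' pattern is an assumption based on the reported file names.
    local leftover
    leftover=$(find "$results_dir" -maxdepth 1 -type f -name '*_chunk*.tsv' | wc -l | tr -d ' ')
    echo "leftover chunk files: $leftover"
    return "$status"
}
```

For example, prefix the second DMRfind command above with `run_and_check /results /results/dmrfind.log`; a zero exit status with leftover chunk files would point away from a silent crash.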