heathsc / gemBS-rs

A re-write of the gemBS pipeline framework in Rust
BSD 3-Clause "New" or "Revised" License

`extract` stuck re-writing output files #6

Open cbreenmachine opened 3 years ago

cbreenmachine commented 3 years ago

Hi Simon,

Thanks again for your help earlier with the non-stranded issue. I am having one (hopefully easy-to-fix) issue with methylation extraction.

Running extract on a few human WGBS datasets does not finish. For some context, mapping and calling on five WGBS datasets can take 2-3 days on my current system without a problem. Running extract will go for a week or more. The command itself appears to work, but it gets stuck in a loop: it writes an output file, does an asset check, decides the file needs to be created, and then starts the whole process over. Because of this, checking the extract output folder size every five minutes gives something like 22 G --> 28 G --> 34 G --> 41 G --> 20 G --> ...
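The folder-size check described above could be scripted roughly as follows. This is a minimal sketch and not part of gemBS: the function names `dir_size_bytes` and `find_resets` are made up for illustration. It sums the file sizes under a directory and flags every sample where the total is smaller than the previous one, which is what a restart of the output files would look like.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may vanish between listing and stat while it is being rewritten.
                pass
    return total


def find_resets(sizes):
    """Return the indices at which a size is smaller than the one before it."""
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1]]
```

Calling `dir_size_bytes("./extract")` on a timer and passing the collected values to `find_resets` would mark the point where the folder shrinks. For the sizes quoted above, `find_resets([22, 28, 34, 41, 20])` returns `[4]`.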

Here's what I have tried:

  1. Used a different server. Issue seems to be the same on Ubuntu as well as a Red Hat Linux Distro
  2. Tried one human WGBS instead of five "at once". Issue still persists.
  3. Restricted the output to just ENCODE-style outputs, and separately to just gemBS-style outputs. It does not seem to make a difference.
  4. Tried increasing the default RAM available to 100G. Does not seem to change anything.

I've interrupted the process before and looked at the files, and they seem reasonable and match my collaborator's outputs. Of course, because I interrupted the process, the EOF and most of the data have not been written.

Is there any more information I can provide? Or is this a case of user error?

Thanks very much!

Best, Coleman

Here is console output in default mode:

/usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
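For a longer saved log, the repeated launches can be counted rather than read by eye. The following is a rough sketch, assuming the console output has been captured as text; `count_launches` and `relaunched` are hypothetical helper names, and the pattern relies only on the `--tabix <path>.bcf` text visible in the output above.

```python
import re
from collections import Counter


def count_launches(log_text):
    """Count how often each .bcf path appears directly after `--tabix` in the log."""
    bcf_paths = re.findall(r"--tabix\s+(\S+\.bcf)", log_text)
    return Counter(bcf_paths)


def relaunched(log_text):
    """Return the .bcf paths that appear more than once, sorted."""
    return sorted(p for p, n in count_launches(log_text).items() if n > 1)
```

Any path returned by `relaunched` corresponds to an extraction job that was started more than once in the captured output.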

This pattern will continue over and over until interrupted. And then in debug mode (run on a different batch of five samples):

DEBUG - Asset check: "sample009_115_cpg.bed.gz" "./extract/sample009_115_cpg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_cpg.bed.gz.md5" "./extract/sample009_115_cpg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_cpg.bb" "./extract/sample009_115_cpg.bb" Absent
DEBUG - Asset check: "sample009_115_cpg.bb.md5" "./extract/sample009_115_cpg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz" "./extract/sample009_115_chg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz.md5" "./extract/sample009_115_chg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bb" "./extract/sample009_115_chg.bb" Absent
DEBUG - Asset check: "sample009_115_chg.bb.md5" "./extract/sample009_115_chg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz" "./extract/sample009_115_chh.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz.md5" "./extract/sample009_115_chh.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bb" "./extract/sample009_115_chh.bb" Absent
DEBUG - Asset check: "sample009_115_chh.bb.md5" "./extract/sample009_115_chh.bb.md5" Absent
DEBUG - Asset check: "sample009115.bw" "./extract/sample009115.bw" Absent
DEBUG - Asset check: "sample009115.bw.md5" "./extract/sample009115.bw.md5" Absent
DEBUG - Asset check: "report.tex" "./report/GemBS_QC_Report.tex" Present
DEBUG - Asset check: "report.html" "./report/GemBS_QC_Report.html" Present
DEBUG - Avail slots: 45.0001, avail memory: 11.6 GB
DEBUG - No execution slots
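One check that might help narrow this down is to compare what the asset check reports with what is actually on disk at that moment. The sketch below is an assumption-laden illustration, not gemBS code: the helper names are hypothetical, and the regular expression is based only on the `Asset check: "<id>" "<path>" Present|Absent` lines shown above.

```python
import os
import re

ASSET_RE = re.compile(r'Asset check: "([^"]+)" "([^"]+)" (Present|Absent)')


def parse_asset_checks(log_text):
    """Return (asset_id, path, status) tuples for every asset check in the log."""
    return ASSET_RE.findall(log_text)


def reported_absent_but_on_disk(log_text, base_dir="."):
    """Return paths the log calls Absent although a file exists there now."""
    hits = []
    for _asset_id, path, status in parse_asset_checks(log_text):
        if status == "Absent" and os.path.exists(os.path.join(base_dir, path)):
            hits.append(path)
    return hits
```

Run from the project directory, a non-empty result from `reported_absent_but_on_disk` would suggest the files are being written but not recognised as complete; an empty result would point elsewhere. Whether either is the actual cause here is not known.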

atggcagatgagtatgcattaaagtag commented 6 months ago

@cbreenmachine I'm encountering a similar problem. Did you manage to find a solution or workaround?

cbreenmachine commented 6 months ago

Unfortunately no. Went back and forth with sys admins for weeks and they couldn't figure it out. Ended up abandoning this and using the gem mapper (https://github.com/smarco/gem3-mapper) and bs_call separately. While gemBS is fast when it's working, it does not seem to be actively maintained and it is buggy. I ended up losing a lot of time working around these types of issues. I'd recommend using bismark (https://www.bioinformatics.babraham.ac.uk/projects/bismark/) instead.

heathsc commented 6 months ago

Bismark and gemBS do not perform the same analyses at all. Bismark takes no account of variants, sequencing errors or sequencing inefficiencies, does not do variant calling, etc.

The problem is that I cannot debug a problem that I cannot reproduce. If I can be provided with a dataset and an environment that reliably causes the problem then I can try to track it down. An error that only occurs in a few installations is impossible for me to diagnose. Not being able to track down the problems running in a particular environment that I do not have access to does not mean that gemBS is no longer being maintained!

Yours, Simon

