Open cbreenmachine opened 3 years ago
@cbreenmachine I'm encountering a similar problem. Did you manage to find a solution or workaround?
Unfortunately, no. Went back and forth with sys admins for weeks and they couldn't figure it out. Ended up abandoning this and using the gem mapper (https://github.com/smarco/gem3-mapper) and bs_call separately. While gemBS is fast when it's working, it does not seem to be actively maintained and is buggy. I ended up losing a lot of time working around these types of issues. I'd recommend using bismark (https://www.bioinformatics.babraham.ac.uk/projects/bismark/) instead.
Bismark and gemBS do not perform the same analyses at all. Bismark takes no account of variants, sequencing errors or sequencing inefficiencies, does not do variant calling, etc.
The problem is that I cannot debug a problem that I cannot reproduce. If I can be provided with a dataset and an environment that reliably causes the problem then I can try to track it down. An error that only occurs in a few installations is impossible for me to diagnose. Not being able to track down the problems running in a particular environment that I do not have access to does not mean that gemBS is no longer being maintained!
Yours, Simon
Hi Simon,
Thanks again for your help earlier with the non-stranded issue. I am having one (hopefully easy-to-fix) issue with methylation extraction.
Running extract on a few human WGBS datasets does not finish. For some context, mapping and calling on five WGBS datasets can take 2-3 days on my current system without a problem. Running extract will go for a week or more. The command itself seems to be working, but it gets stuck in a loop where it writes an output file, does an asset check, decides the file still needs to be created, and then starts the whole process over. Because of this, checking the size of the extract output folder every five minutes gives something like 22 G --> 28 G --> 34 G --> 41 G --> 20 G --> ...
Here's what I have tried:
I've interrupted the process and looked at the files, and they seem reasonable and match my collaborator's outputs. Of course, because I interrupted the process, the EOF and most of the data had not been written.
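For anyone trying to reproduce this, the folder-size tracking and the "was the EOF written" check described above could be scripted roughly as follows. This is only a sketch: it assumes the `*.gz` files in the extract folder are BGZF (bgzip) compressed, which tabix indexing requires, and that a finished file therefore ends with the standard 28-byte BGZF end-of-file block. The function names `dir_size_bytes`, `has_bgzf_eof` and `snapshot` are made up for this example and are not part of gemBS.

```python
import os
from pathlib import Path

# The empty BGZF block that bgzip/htslib append as an end-of-file marker
# (28 bytes). A file cut off mid-write will normally not end with it.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def dir_size_bytes(folder):
    """Return the total size in bytes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear between listing and stat
                # if the output is being deleted and rewritten.
                pass
    return total


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def snapshot(folder):
    """Return (total_bytes, {file name: ends with EOF block?}) for folder.

    Only the *.gz files directly inside folder are checked for the marker.
    """
    status = {p.name: has_bgzf_eof(p) for p in sorted(Path(folder).glob("*.gz"))}
    return dir_size_bytes(folder), status
```

Calling `snapshot("./extract")` every five minutes and logging the result with a timestamp would show both the drop in total size when the loop restarts and whether any output file was ever completed before being discarded.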
Is there any more information I can provide? Or is this a case of user-error?
Thanks very much!
Best, Coleman
Here is console output in default mode:
/usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample004_110_mextr_ctgs.bed --cpgfile ./extract/sample004_110_cpg.txt.gz --tabix ./calls/sample004_110.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample005_111_mextr_ctgs.bed --cpgfile ./extract/sample005_111_cpg.txt.gz --tabix ./calls/sample005_111.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample002_108_mextr_ctgs.bed --cpgfile ./extract/sample002_108_cpg.txt.gz --tabix ./calls/sample002_108.bcf
INFO - Launch: /usr/local/lib/gemBS/bin/mextr --loglevel info --compress --md5 --regions-file ./extract/sample003_109_mextr_ctgs.bed --cpgfile ./extract/sample003_109_cpg.txt.gz --tabix ./calls/sample003_109.bcf
This pattern will continue over and over until interrupted. And then in debug mode (run on a different batch of five samples):
DEBUG - Asset check: "sample009_115_cpg.bed.gz" "./extract/sample009_115_cpg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_cpg.bed.gz.md5" "./extract/sample009_115_cpg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_cpg.bb" "./extract/sample009_115_cpg.bb" Absent
DEBUG - Asset check: "sample009_115_cpg.bb.md5" "./extract/sample009_115_cpg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz" "./extract/sample009_115_chg.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chg.bed.gz.md5" "./extract/sample009_115_chg.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chg.bb" "./extract/sample009_115_chg.bb" Absent
DEBUG - Asset check: "sample009_115_chg.bb.md5" "./extract/sample009_115_chg.bb.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz" "./extract/sample009_115_chh.bed.gz" Absent
DEBUG - Asset check: "sample009_115_chh.bed.gz.md5" "./extract/sample009_115_chh.bed.gz.md5" Absent
DEBUG - Asset check: "sample009_115_chh.bb" "./extract/sample009_115_chh.bb" Absent
DEBUG - Asset check: "sample009_115_chh.bb.md5" "./extract/sample009_115_chh.bb.md5" Absent
DEBUG - Asset check: "sample009115.bw" "./extract/sample009115.bw" Absent
DEBUG - Asset check: "sample009115.bw.md5" "./extract/sample009115.bw.md5" Absent
DEBUG - Asset check: "report.tex" "./report/GemBS_QC_Report.tex" Present
DEBUG - Asset check: "report.html" "./report/GemBS_QC_Report.html" Present
DEBUG - Avail slots: 45.0001, avail memory: 11.6 GB
DEBUG - No execution slots
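To put numbers on the loop, a captured log could be summarised with something like the sketch below. It relies only on the line shapes visible in the output pasted in this issue (`--cpgfile <path>` in the mextr command lines and `Asset check: "<name>" "<path>" Present|Absent` in debug mode); that is an assumption about the log format, not a documented gemBS interface, and the function names `count_launches` and `absent_assets` are hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpts pasted in this issue.
CPGFILE_RE = re.compile(r"--cpgfile\s+(\S+)")
ASSET_RE = re.compile(r'Asset check: "([^"]+)" "([^"]+)" (Present|Absent)')


def count_launches(log_text):
    """Count how many mextr command lines target each --cpgfile path.

    In a healthy run each path should appear once; repeated counts
    suggest the same extraction job is being relaunched.
    """
    return Counter(CPGFILE_RE.findall(log_text))


def absent_assets(log_text):
    """Return the paths reported Absent, in order of first appearance."""
    seen = []
    for _name, path, state in ASSET_RE.findall(log_text):
        if state == "Absent" and path not in seen:
            seen.append(path)
    return seen
```

Run against the default-mode output above, `count_launches` would report each `*_cpg.txt.gz` target several times, which matches the restart behaviour described. `absent_assets` gives a compact list of the files the asset check still considers missing.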