Hi guys,

I'm trying to run snparcher on UCSC's Hummingbird and I'm getting the following error at the concat_gvcfs step:

```
Activating conda environment: .snakemake/conda/751f1c8c63aac0dc4eda59d9299aee0a
Writing to /tmp
Checking the headers and starting positions of 50 files
Merging 62 temporary files
[E::hts_open_format] Failed to open file "/tmp/00001.bcf" : No such file or directory
Could not read /tmp/00001.bcf: No such file or directory
Cleaning
[Sat Apr 20 20:37:32 2024]
Error in rule concat_gvcfs:
    jobid: 0
```

2 of the 32 samples concatenated successfully, but I'm getting this error for all of the others. I did have to restart the workflow at a couple of points as I figured out how many jobs I could submit simultaneously; not sure if that would be an issue. Below is an example of a full log file in case it's helpful: 441285.log

I'm running the following configuration in the slurm profile: config.yaml.txt

This is the slurm script I submit: nli3_june23.slurm.txt

Let me know if I should provide any other info.

Thank you! Jason
Looks like this could be a problem with where temp files are being written by bcftools. Try setting `get_big_temp` in your config/config.yaml file. It's been a while since I've been on Hummingbird, but IIRC there was a scratch directory... maybe /hb/scratch? Probably a good place to start.
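For reference, a minimal sketch of what that could look like (the `bigtmp` key is the one referenced later in this thread; the scratch path is a placeholder, not a verified Hummingbird path):

```yaml
# config/config.yaml (hypothetical excerpt)
# Point large temp files at cluster scratch instead of the node-local /tmp.
bigtmp: "/hb/scratch/<your_username>/tmp"  # placeholder path; confirm with your cluster docs
```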
Edit: Could be related to #87?
Oh hm, looks like we may need to bump the bcftools version: https://github.com/samtools/bcftools/issues/1642
Ok, thanks for tracking this down, Cade. I modified the bcftools environment to:

```yaml
dependencies:
  - bcftools==1.16
```

The error I got this time was: `bcftools: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory`

Looks like it was mentioned here, but I don't see a conda-based solution: https://github.com/samtools/bcftools/issues/1698#issue-1197030593

Is it strange that the workflow ran fine before with a smaller dataset, or is that expected?
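One conda-side workaround sometimes suggested for this class of error (an assumption on my part, not something verified in this thread) is to make the environment pull in gsl explicitly, so it ships a libgsl matching the bcftools build:

```yaml
# bcftools.yml (hypothetical variant; the gsl entry is an unverified workaround)
dependencies:
  - bcftools==1.16
  - gsl  # assumption: letting the solver pick a matching libgsl may resolve the libgsl.so.25 error
```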
Ah, that's annoying. It would make sense that it worked fine on a smaller dataset. It looks like bcftools <1.16 uses /tmp no matter what, so a bigger dataset might fill that up depending on how the cluster's storage and disk quotas are set up. I'll take a look at this some more tomorrow.
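A quick way to sanity-check that theory from a compute node (generic shell; the scratch path is a placeholder):

```sh
# Compare free space on the node-local /tmp vs. cluster scratch (placeholder path)
df -h /tmp /hb/scratch
```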
Thanks a bunch!
Pretty strange. I remember that libgsl error from when we've tried to upgrade bcftools in the past. However, I see that the postprocess environment is already on 1.16, so I'm not sure why that environment builds but the bcftools.yml one doesn't.
Thanks for chiming in, Erik. I set the bcftools.yml back to 1.10 and pointed 'bigtmp' to my home folder on the cluster. I think I get a TB of storage there, so hopefully that's enough to see this job through. It doesn't seem like I have a scratch folder of my own to use, so I'll have to get that going.

For now the workflow is back up and running with the genomics_db_import step. Fingers crossed, but it seems like Cade's original suggestion to use the bigtmp option is working. Thanks, guys!
Looking at this again, I actually don't think the version is the issue. We specify `-T/--temp-dir` in the `bcftools sort` call, so I think the issue just had to do with setting the right place to store temp files.
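For context, the call is roughly of this shape (a sketch, not the exact rule shell from the workflow; paths and filenames are placeholders):

```sh
# With -T/--temp-dir, bcftools sort writes its temporary chunks under the given
# directory instead of defaulting to the node-local /tmp.
bcftools sort -T /hb/scratch/<username>/tmp -O z -o sample.sorted.vcf.gz sample.vcf.gz
```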
Makes sense to me. The workflow finished up last night, so it seems like you're right. Thanks Cade and Erik!