Open nate-d-olson opened 3 years ago
Hi @nate-d-olson,
Tried using the hap.py bioconda package but ran into a regex error when run in the snakemake/snakemake container. There is an issue on the hap.py github repo related to the error, Illumina/hap.py#66, which seems to be related to specific gcc versions.
It looks like version 0.3.12 of hap.py fixed the regex issue but we only got that version into Bioconda a couple of months after its release and Snakemake wrapper creation. I'll relay the need for the updated wrapper right away. And please feel free to let us know of any further issue you might encounter after we get the updated wrapper up!
Cheers, Marcel
There is another fix needed upstream, see https://github.com/Illumina/hap.py/pull/113
Ok, first PR is merged, hap.py should now work fine for you.
@mbargull has a look at dipcall now.
Thanks for your responsiveness and for working to update the hap.py bioconda package and hap.py/pre.py wrapper. We use the main hap.py function, which does the comparison and calculates performance metrics, in our pipeline. I tried to run the main hap.py command, hap.py {TRUTH}.vcf.gz {QUERY}.vcf.gz -f {TRUTH REGIONS}.bed -r {REF}.fa --engine=vcfeval --stratification {STRATIFICATIONS}.tsv -o {OUTDIR}
, with the new conda package and the snakemake/snakemake docker container and get the following error, likely introduced with the recent update to hap.py (https://github.com/Illumina/hap.py/pull/113).
2021-03-03 16:37:26,577 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
Hap.py
2021-03-03 16:37:26,700 ERROR 'Namespace' object has no attribute 'convert_gvcf_truth'
2021-03-03 16:37:26,700 ERROR Traceback (most recent call last):
2021-03-03 16:37:26,701 ERROR File "/opt/conda/envs/happy/bin/hap.py", line 529, in <module>
2021-03-03 16:37:26,701 ERROR main()
2021-03-03 16:37:26,701 ERROR File "/opt/conda/envs/happy/bin/hap.py", line 286, in main
2021-03-03 16:37:26,701 ERROR if args.convert_gvcf_truth:
2021-03-03 16:37:26,701 ERROR AttributeError: 'Namespace' object has no attribute 'convert_gvcf_truth'
They should really have a test framework. Yes, the main has to be adapted as well.
Here is my first attempt at making a wrapper for the main hap.py command. https://github.com/nate-d-olson/snakemake-wrappers/commit/34e7de271a9cc14546f51436d9f8986d04935813
I had a few questions regarding recommendations/ best practices for creating wrappers.
Thanks for asking, I've added comments on your commit.
@jafors and @johanneskoester thank you for your help with the hap.py bioconda packages and snakemake-wrapper. Now that we have a dipcall bioconda package I wanted to start working on the dipcall snakemake-wrapper. Running dipcall is a two-step process; first run-dipcall
generates a makefile with the dipcall pipeline, then make
is used to execute the pipeline. See example command from the dipcall README below, https://github.com/lh3/dipcall. Should the dipcall wrapper perform both steps, have separate wrappers for both step with a meta-wrapper combining the two, or something else?
dipcall.kit/run-dipcall prefix hs38.fa pat.fa.gz mat.fa.gz > prefix.mak
# or for male, requiring PAR regions in BED
# dipcall.kit/run-dipcall -x dipcall.kit/hs38.PAR.bed prefix hs38.fa pat.fa.gz mat.fa.gz > prefix.mak
make -j2 -f prefix.mak
So, I'd say, if the makefile generation is super quick, just put both in one wrapper. If it takes time, while being less parallel than the second step, use two wrappers, and ideally add a meta-wrapper for their combination.
This issue was marked as stale because it has been open for 6 months with no activity.
Is your feature request related to a problem? Please describe. Difficulty developing an assembly benchmarking pipeline due to the dependencies associated with two key bioinformatic tools use in the pipeline hap.py and dipcall. I know there is a snakemake wrapper for the hap.py pre.py command but we are running into a regex issue when running the hap.py bioconda package in the snakemake/snakemake docker container.
Describe the solution you'd like We would like to have snakemake wrappers for hap.py, https://github.com/Illumina/hap.py, and dipcall, https://github.com/lh3/dipcall.
Describe alternatives you've considered Tried using docker containers to handle dependencies but we were unable to get the pipeline to work on both macOS and Linux. Tried using the hap.py bioconda package but ran into a regex error when run in the snakemake/snakemake container. There is an issue on the hap.py github repo related to the error, https://github.com/Illumina/hap.py/issues/66, which seems to be related to specific gcc versions.
Additional context As part of the Genome in a Bottle project, https://www.nist.gov/programs-projects/genome-bottle we are working to improve the usability and interpretability of our methods for benchmarking variant calls. Hap.py is used for small variant benchmarking and is the reference implementation of the GA4GH best practices for small variant benchmarking. dipcall is used to generate variant calls from a diploid assembly. We developed a proof of concept pipeline using snakemake, https://github.com/usnistgov/giab-asm-benchmarking, but the pipeline only works on macOS due to a hack we used to handle the hap.py and dipcall dependencies.