snakemake / snakemake-wrappers

This is the development home of the Snakemake wrapper repository, see
https://snakemake-wrappers.readthedocs.io
213 stars 183 forks source link

Association Testing Plink 1.90 #212

Closed adRn-s closed 3 years ago

adRn-s commented 3 years ago

Hello,

I'd like to contribute adding this tool.

I've already made my own pipeline with 3 rules: it starts with a target sequencing VCF file, makes all the necessary input files for plink using that and any column from CSV file with a continous variable (preprocessing). With these I make the association testing (with multiple testing correction by permutation) and finally output a nice HTML report (using Rmarkdown) with the "tsv" tables that Plink outputs.

Thanks in advance!

adRn-s commented 3 years ago
  • Is there any relevant docs I should read?

Nevermind this part of the issue, found it. Thanks

adRn-s commented 3 years ago

Please see this reproducible error.

> conda create -n wrappers snakemake pytest conda
> conda activate wrappers
> cp -r bio/vep bio/example
> vim test.py  # Here I copy test_vep_annotate() and rename it accordingly.
> pytest test.py -v
================================================================= test session starts =================================================================
platform linux -- Python 3.8.5, pytest-6.1.1, py-1.9.0, pluggy-0.13.1 -- /home/usr/.conda/envs/wrappers/bin/python
cachedir: .pytest_cache
rootdir: /home/usr/phd/20-10-01_wrapplines/snakemake-wrappers
collected 1 item                                                                                                                                      

test.py::test_example_annotate FAILED                                                                                                           [100%]

====================================================================== FAILURES =======================================================================
________________________________________________________________ test_example_annotate ________________________________________________________________

    def test_example_annotate():
>       run(
            "bio/plink/preprocess",
            ["snakemake", "--cores", "1", "variants.annotated.bcf", "--use-conda", "-F"],
        )

test.py:2437: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test.py:87: in run
    raise e
test.py:68: in run
    subprocess.check_call(cmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

popenargs = (['snakemake', '--cores', '1', 'variants.annotated.bcf', '--use-conda', '-F', ...],), kwargs = {}, retcode = 1
cmd = ['snakemake', '--cores', '1', 'variants.annotated.bcf', '--use-conda', '-F', ...]

    def check_call(*popenargs, **kwargs):
        """Run command with arguments.  Wait for command to complete.  If
        the exit code was zero then return, otherwise raise
        CalledProcessError.  The CalledProcessError object will have the
        return code in the returncode attribute.

        The arguments are the same as for the call function.  Example:

        check_call(["ls", "-l"])
        """
        retcode = call(*popenargs, **kwargs)
        if retcode:
            cmd = kwargs.get("args")
            if cmd is None:
                cmd = popenargs[0]
>           raise CalledProcessError(retcode, cmd)
E           subprocess.CalledProcessError: Command '['snakemake', '--cores', '1', 'variants.annotated.bcf', '--use-conda', '-F', '--wrapper-prefix', 'file:///tmp/tmpxbwazl81/', '--conda-cleanup-pkgs']' returned non-zero exit status 1.

../../../.conda/envs/wrappers/lib/python3.8/subprocess.py:364: CalledProcessError
---------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------
5.19.3
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
Building DAG of jobs...
File path file:///tmp/tmpxbwazl81/master/bio/vep/annotate/environment.yaml contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
WorkflowError:
Conda env file does not exist: /tmp/tmpxbwazl81/master/bio/vep/annotate/environment.yaml
  File "/home/usr/.local/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 216, in create
  File "/home/usr/.local/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 97, in hash
  File "/home/usr/.local/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 82, in content
  File "/home/usr/.local/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 51, in content
=============================================================== short test summary info ===============================================================
FAILED test.py::test_example_annotate - subprocess.CalledProcessError: Command '['snakemake', '--cores', '1', 'variants.annotated.bcf', '--use-conda...
================================================================== 1 failed in 3.27s ==================================================================

And here is my conda env export, just in case:

name: wrappers
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - aioeasywebdav=2.4.0=py38_1000
  - aiohttp=3.6.2=py38h516909a_0
  - amply=0.1.4=py_0
  - appdirs=1.4.3=py_1
  - async-timeout=3.0.1=py_1000
  - atk=2.36.0=2
  - atk-1.0=2.36.0=h516909a_2
  - attrs=20.2.0=pyh9f0ad1d_0
  - bcrypt=3.2.0=py38h1e0a361_0
  - boto3=1.15.12=pyh9f0ad1d_0
  - botocore=1.18.12=pyh9f0ad1d_0
  - brotlipy=0.7.0=py38h1e0a361_1000
  - bzip2=1.0.8=h516909a_3
  - c-ares=1.16.1=h516909a_3
  - ca-certificates=2020.6.20=hecda079_0
  - cachetools=4.1.1=py_0
  - cairo=1.16.0=h3fc0475_1005
  - certifi=2020.6.20=py38h32f6830_0
  - cffi=1.14.3=py38h5bae8af_0
  - chardet=3.0.4=py38h32f6830_1007
  - coincbc=2.10.5=hab63836_0
  - conda=4.8.5=py38h32f6830_1
  - conda-package-handling=1.7.0=py38h1e0a361_5
  - configargparse=1.2.3=pyh9f0ad1d_0
  - cryptography=3.1.1=py38h766eaa4_0
  - datrie=0.8.2=py38h1e0a361_0
  - decorator=4.4.2=py_0
  - docutils=0.16=py38h32f6830_1
  - dropbox=10.4.1=pyh9f0ad1d_0
  - expat=2.2.9=he1b5a44_2
  - fftw=3.3.8=nompi_h7f3a6c3_1111
  - filechunkio=1.8=py_2
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=2.001=hab24e00_0
  - font-ttf-source-code-pro=2.030=hab24e00_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.13.1=h1056068_1002
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.10.2=he06d7ca_0
  - fribidi=1.0.10=h516909a_0
  - ftputil=4.0.0=py_0
  - gdk-pixbuf=2.38.2=h3f25603_4
  - gettext=0.19.8.1=hc5be6a0_1002
  - ghostscript=9.53.3=he1b5a44_0
  - giflib=5.2.1=h516909a_2
  - gitdb=4.0.5=py_0
  - gitpython=3.1.9=py_0
  - glib=2.66.1=h680cd38_0
  - gobject-introspection=1.66.1=py38h03d966d_0
  - google-api-core=1.22.2=py38h32f6830_0
  - google-api-python-client=1.12.3=pyh9f0ad1d_0
  - google-auth=1.22.0=py_0
  - google-auth-httplib2=0.0.4=pyh9f0ad1d_0
  - google-cloud-core=1.4.2=pyh9f0ad1d_0
  - google-cloud-storage=1.31.2=pyh9f0ad1d_0
  - google-crc32c=1.0.0=py38h6d3b9ce_0
  - google-resumable-media=1.0.0=pyh9f0ad1d_0
  - googleapis-common-protos=1.52.0=py38h32f6830_0
  - graphite2=1.3.13=he1b5a44_1001
  - graphviz=2.42.3=h6939c30_1
  - grpcio=1.31.0=py38h2c89da0_0
  - gtk2=2.24.32=h194ddfc_3
  - gts=0.7.6=h08bb679_0
  - harfbuzz=2.7.2=hee91db6_0
  - httplib2=0.18.1=pyh9f0ad1d_0
  - icu=67.1=he1b5a44_0
  - idna=2.10=pyh9f0ad1d_0
  - imagemagick=7.0.10_28=pl526h201ca68_0
  - importlib-metadata=2.0.0=py38h32f6830_0
  - importlib_metadata=2.0.0=0
  - iniconfig=1.0.1=pyh9f0ad1d_0
  - ipython_genutils=0.2.0=py_1
  - jbig=2.1=h516909a_2002
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jmespath=0.10.0=pyh9f0ad1d_0
  - jpeg=9d=h516909a_0
  - jsonschema=3.2.0=py38h32f6830_1
  - jupyter_core=4.6.3=py38h32f6830_1
  - ld_impl_linux-64=2.35=h769bd43_9
  - libblas=3.8.0=17_openblas
  - libcblas=3.8.0=17_openblas
  - libcrc32c=1.1.1=he1b5a44_2
  - libffi=3.2.1=he1b5a44_1007
  - libgcc-ng=9.3.0=h5dbcf3e_17
  - libgfortran-ng=7.5.0=hae1eefd_17
  - libgfortran4=7.5.0=hae1eefd_17
  - libgomp=9.3.0=h5dbcf3e_17
  - libiconv=1.16=h516909a_0
  - liblapack=3.8.0=17_openblas
  - libopenblas=0.3.10=pthreads_hb3c22a3_4
  - libpng=1.6.37=hed695b0_2
  - libprotobuf=3.13.0=h8b12597_0
  - librsvg=2.50.1=h33a7fed_0
  - libsodium=1.0.18=h516909a_1
  - libstdcxx-ng=9.3.0=h2ae2ef3_17
  - libtiff=4.1.0=hc7e4089_6
  - libtool=2.4.6=h516909a_1005
  - libuuid=2.32.1=h14c3975_1000
  - libwebp=1.1.0=h56121f0_4
  - libwebp-base=1.1.0=h516909a_3
  - libxcb=1.13=h14c3975_1002
  - libxml2=2.9.10=h68273f3_2
  - lz4-c=1.9.2=he1b5a44_3
  - markupsafe=1.1.1=py38h1e0a361_1
  - more-itertools=8.5.0=py_0
  - multidict=4.7.5=py38h1e0a361_1
  - nbformat=5.0.7=py_0
  - ncurses=6.2=he1b5a44_1
  - networkx=2.5=py_0
  - numpy=1.19.1=py38hbc27379_2
  - oauth2client=4.1.3=py_0
  - openjpeg=2.3.1=h981e76c_3
  - openssl=1.1.1h=h516909a_0
  - packaging=20.4=pyh9f0ad1d_0
  - pandas=1.1.3=py38h950e882_0
  - pango=1.42.4=h7062337_4
  - paramiko=2.7.2=pyh9f0ad1d_0
  - pcre=8.44=he1b5a44_0
  - perl=5.26.2=h516909a_1006
  - pip=20.2.3=py_0
  - pixman=0.38.0=h516909a_1003
  - pkg-config=0.29.2=h516909a_1006
  - pluggy=0.13.1=py38h32f6830_2
  - prettytable=0.7.2=py_3
  - protobuf=3.13.0=py38h950e882_0
  - psutil=5.7.2=py38h1e0a361_0
  - pthread-stubs=0.4=h14c3975_1001
  - pulp=2.3=py38h32f6830_1
  - py=1.9.0=pyh9f0ad1d_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycosat=0.6.3=py38h1e0a361_1004
  - pycparser=2.20=pyh9f0ad1d_2
  - pygments=2.7.1=py_0
  - pygraphviz=1.6=py38h1e0a361_0
  - pynacl=1.4.0=py38h1e0a361_1
  - pyopenssl=19.1.0=py_1
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pyrsistent=0.17.3=py38h1e0a361_0
  - pysftp=0.2.9=py_1
  - pysocks=1.7.1=py38h32f6830_1
  - pytest=6.1.1=py38h32f6830_0
  - python=3.8.5=h1103e12_9_cpython
  - python-dateutil=2.8.1=py_0
  - python-irodsclient=0.8.2=py_0
  - python_abi=3.8=1_cp38
  - pytz=2020.1=pyh9f0ad1d_0
  - pyyaml=5.3.1=py38h1e0a361_0
  - ratelimiter=1.2.0=py38h32f6830_1001
  - readline=8.0=he28a2e2_2
  - requests=2.24.0=pyh9f0ad1d_0
  - rsa=4.6=pyh9f0ad1d_0
  - ruamel_yaml=0.15.80=py38h1e0a361_1002
  - s3transfer=0.3.3=py38h32f6830_1
  - setuptools=49.6.0=py38h32f6830_1
  - simplejson=3.17.2=py38h1e0a361_0
  - six=1.15.0=pyh9f0ad1d_0
  - slacker=0.14.0=py_0
  - smmap=3.0.4=pyh9f0ad1d_0
  - snakemake=5.26.1=0
  - snakemake-minimal=5.26.1=py_0
  - sqlite=3.33.0=h4cf870e_0
  - tk=8.6.10=hed695b0_0
  - toml=0.10.1=pyh9f0ad1d_0
  - toposort=1.5=py_3
  - tqdm=4.50.0=pyh9f0ad1d_0
  - traitlets=5.0.4=py_1
  - uritemplate=3.0.1=py_0
  - urllib3=1.25.10=py_0
  - wheel=0.35.1=pyh9f0ad1d_0
  - wrapt=1.12.1=py38h1e0a361_1
  - xmlrunner=1.7.7=py_0
  - xorg-kbproto=1.0.7=h14c3975_1002
  - xorg-libice=1.0.10=h516909a_0
  - xorg-libsm=1.2.3=h84519dc_1000
  - xorg-libx11=1.6.12=h516909a_0
  - xorg-libxau=1.0.9=h14c3975_0
  - xorg-libxdmcp=1.1.3=h516909a_0
  - xorg-libxext=1.3.4=h516909a_0
  - xorg-libxpm=3.5.13=h516909a_0
  - xorg-libxrender=0.9.10=h516909a_1002
  - xorg-libxt=1.1.5=h516909a_1003
  - xorg-renderproto=0.11.1=h14c3975_1002
  - xorg-xextproto=7.3.0=h14c3975_1002
  - xorg-xproto=7.0.31=h14c3975_1007
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h516909a_0
  - yarl=1.4.2=py38h516909a_0
  - zipp=3.3.0=py_0
  - zlib=1.2.11=h516909a_1009
  - zstd=1.4.5=h6597ccf_2
prefix: /home/usr/.conda/envs/wrappers
GiulioCentorame commented 2 years ago

Hi, are you still working on the wrapper for plink 1.9?

adRn-s commented 2 years ago

Nope, sorry. To be honest, I abandoned it because my Ph.D. got a turn to new developments/ research questions.

Writing the wrapper also prove to be difficult because plink CLI is... to say the least, awkward. If you worked with it, you understand what I mean. This turned to be somewhat incompatible to the high quality standard that the snakemake wrappers repository require. I might have what was a working copy of this wrapper somewhere locally, if you require this tool wrapper, feel free to contact me in private: lims at omics dot dev ...I might reconsider re-posting it/ sending it over so that you can take over and fulfill that work.

GiulioCentorame commented 2 years ago

Hope the PhD is going/has gone well! Thanks, I will definitely get in touch with you - I am working on wrappers for a series of statistical genetics programs and plink support would be great. My idea for plink was to keep it minimal and try to support just the basic features for now (e.g. --score or --export).

adRn-s commented 2 years ago

I implemented the linear modeling for GWAS. For future references, the truncated work-in-progress I've done is uploaded here as a diff-patch file. The link goes straight to the comment with 'instructions' as to how to 'load' it. Good luck!