harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Compatibility issue with Python and cov_filter #216

Open elizakirsch0 opened 3 months ago

elizakirsch0 commented 3 months ago

Hello,

I am trying to do a test run with this command:

snakemake -d .test/ecoli --cores 1 --use-conda

However, creating the conda environment for cov_filter fails with the error "Could not create conda environment." The other environments seemed to install without problems. When I tried to create the cov_filter environment on its own, I got this error:

Encountered problems while solving:
  - nothing provides _python_rc needed by python-3.12.0rc3-rc3_hab00c5b_1_cpython
Could not solve for environment specs
The following packages are incompatible
├─ python 3.10**  is requested and can be installed;
└─ snakemake >=8  is not installable because there are no viable options
   ├─ snakemake [8.0.0|8.0.1|...|8.9.0] would require
   │  └─ snakemake-minimal [8.0.0.* |8.0.1.* |...|8.9.0.* ], which requires
   │     └─ python >=3.11,<3.13  but there are no viable options
   │        ├─ python [3.11.0|3.11.1|...|3.12.5] conflicts with any installable versions previously reported;
   │        └─ python 3.12.0rc3 would require
   │           └─ _python_rc, which does not exist (perhaps a missing channel);
   └─ snakemake 8.11.2 would require
      └─ snakemake-minimal 8.11.2.* , which does not exist (perhaps a missing channel).

I think this may be because it requests Python 3.10, while I am currently using Python 3.11.4 in my snparcher environment. I have two ideas: create a new environment specifically for cov_filter, or merge the dependencies by updating the snparcher environment with this file. However, I am worried about breaking other packages and dependencies downstream. Any advice on how to proceed would be greatly appreciated.

Thank you for your help in advance!

tsackton commented 3 months ago

Something seems a bit strange here, since the cov_filter environment shouldn't be trying to install Snakemake. In your log, it looks like the conflict is between Python 3.10 and Snakemake >=8.

Is there anything other than "Could not create conda environment" in the error logs from the Snakemake run itself?

elizakirsch0 commented 3 months ago

Hi,

Pasting the full error message here:

Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /project/sedmands_1143/ekirsch/V2_savannahsparrow/snp_archer/snpArcher/workflow/rules/../envs/cov_filter.yml:
Command:
mamba env create --quiet --no-default-packages --file "/project/sedmands_1143/ekirsch/V2_savannahsparrow/snp_archer/snpArcher/.test/ecoli/.snakemake/conda/2fa3e34701e6b436725e6f08e9b3790f_.yaml" --prefix "/project/sedmands_1143/ekirsch/V2_savannahsparrow/snp_archer/snpArcher/.test/ecoli/.snakemake/conda/2fa3e34701e6b436725e6f08e9b3790f_"
Output:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Installing pip dependencies: ...working... Pip subprocess error:
  error: subprocess-exited-with-error

  × Building wheel for pyd4 (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [256 lines of output]
      <string>:7: SetuptoolsDeprecationWarning: The test command is disabled and references to it are deprecated.
      !!

tsackton commented 2 months ago

What system / OS are you using? This looks like a build error with the pyd4 package. I tested on our HPC with Rocky Linux 8, and the cov_filter environment builds fine. But we have had a lot of problems with pyd4 compatibility, so it is possible there is a conflict with your system.
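The system details pasted in reply to this question appear to be the contents of /etc/os-release. When only the distribution name is needed, a minimal sketch like the following pulls out the PRETTY_NAME line. The function name os_pretty_name is hypothetical, and it takes an optional path so it can also read a saved copy of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PRETTY_NAME value from an os-release file.
# Defaults to /etc/os-release; pass another path to read a saved copy.
os_pretty_name() {
  local file="${1:-/etc/os-release}"
  # First expression handles quoted values (PRETTY_NAME="..."),
  # second handles unquoted ones (PRETTY_NAME=...).
  sed -n \
    -e 's/^PRETTY_NAME="\(.*\)"$/\1/p' \
    -e 's/^PRETTY_NAME=\([^"].*\)$/\1/p' \
    "$file"
}

# Usage on the machine in question:
#   os_pretty_name
```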

elizakirsch0 commented 2 months ago

This is the system I am using:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

It does seem to be a problem with the pyd4 compatibility.

Thanks!

cademirch commented 2 months ago

Think I figured it out. pyd4 needs Python 3.9.

Please test it out. Create a fresh mamba env:

$ mamba create -n pyd4-testing "python=3.9"
$ mamba activate pyd4-testing

Check the Python version and clear the pip cache:

$ python3 --version
Python 3.9.19
$ pip cache purge
Files removed: 41
$ pip list
Package    Version
---------- -------
pip        24.2
setuptools 72.2.0
wheel      0.44.0

Install pyd4:

$ pip install pyd4
Collecting pyd4
  Downloading pyd4-0.3.9.tar.gz (14 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy (from pyd4)
  Downloading numpy-2.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Downloading numpy-2.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 64.1 MB/s eta 0:00:00
Building wheels for collected packages: pyd4
  Building wheel for pyd4 (pyproject.toml) ... done
  Created wheel for pyd4: filename=pyd4-0.3.9-cp39-cp39-linux_x86_64.whl size=3210298 sha256=d1bda477d95a302c58b3b3b42fc210c18bfd7cdc437cf95196c45dbf62d79f1a
  Stored in directory: /home/cade/.cache/pip/wheels/03/2a/3f/da1d1475762ff50a88863cecec974e3f4031bd7fd1a0b5c3c8
Successfully built pyd4
Installing collected packages: numpy, pyd4
Successfully installed numpy-2.0.1 pyd4-0.3.9

elizakirsch0 commented 2 months ago

Thank you for the response! Unfortunately I am still getting the same error message. Pasting all the output below:

bash-4.2$ mamba create -n pyd4-testing "python=3.9"
Looking for: ['python=3.9']
warning  libmamba Cache file "/home1/ekirsch/.conda/pkgs/cache/497deca9.json" was modified by another program
warning  libmamba Cache file "/home1/ekirsch/.conda/pkgs/cache/09cdf8bf.json" was modified by another program
conda-forge/noarch                                  16.1MB @  43.4MB/s  2.0s
conda-forge/linux-64                                37.1MB @  16.7MB/s  3.7s
Transaction
  Prefix: /home1/ekirsch/.conda/envs/pyd4-testing
  Updating specs:
   - python=3.9
  Package              Version  Build               Channel           Size
────────────────────────────────────────────────────────────────────────────
  Install:
────────────────────────────────────────────────────────────────────────────
  + _libgcc_mutex          0.1  conda_forge         conda-forge     Cached
  + ld_impl_linux-64      2.40  hf3520f5_7          conda-forge     Cached
  + ca-certificates   2024.7.4  hbcca054_0          conda-forge     Cached
  + libgomp             14.1.0  h77fa898_0          conda-forge     Cached
  + _openmp_mutex          4.5  2_gnu               conda-forge     Cached
  + libgcc-ng           14.1.0  h77fa898_0          conda-forge     Cached
  + openssl              3.3.1  h4bc722e_2          conda-forge     Cached
  + libxcrypt           4.4.36  hd590300_1          conda-forge     Cached
  + libzlib              1.3.1  h4ab18f5_1          conda-forge     Cached
  + libffi               3.4.2  h7f98852_5          conda-forge     Cached
  + bzip2                1.0.8  h4bc722e_7          conda-forge     Cached
  + ncurses                6.5  h59595ed_0          conda-forge     Cached
  + libuuid             2.38.1  h0b41bf4_0          conda-forge     Cached
  + libnsl               2.0.1  hd590300_0          conda-forge     Cached
  + xz                   5.2.6  h166bdaf_0          conda-forge     Cached
  + tk                  8.6.13  noxft_h4845f30_101  conda-forge     Cached
  + libsqlite           3.46.0  hde9e2c9_0          conda-forge     Cached
  + readline               8.2  h8228510_1          conda-forge     Cached
  + tzdata               2024a  h0c530f3_0          conda-forge     Cached
  + python              3.9.19  h0755675_0_cpython  conda-forge       24MB
  + wheel               0.44.0  pyhd8ed1ab_0        conda-forge     Cached
  + setuptools          72.2.0  pyhd8ed1ab_0        conda-forge     Cached
  + pip                   24.2  pyhd8ed1ab_0        conda-forge     Cached
  Summary:
  Install: 23 packages
  Total download: 24MB
────────────────────────────────────────────────────────────────────────────
python                                              23.8MB @ 122.9MB/s  0.4s
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
     $ mamba activate pyd4-testing
To deactivate an active environment, use
     $ mamba deactivate

bash-4.2$ mamba activate pyd4-testing

(pyd4-testing) bash-4.2$ python3 --version
Python 3.9.19

(pyd4-testing) bash-4.2$ pip cache purge
Files removed: 163

(pyd4-testing) bash-4.2$ pip install pyd4
Collecting pyd4
  Downloading pyd4-0.3.9.tar.gz (14 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy (from pyd4)
  Downloading numpy-2.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Downloading numpy-2.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 81.2 MB/s eta 0:00:00
Building wheels for collected packages: pyd4
  Building wheel for pyd4 (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pyd4 (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [38 lines of output]
      <string>:7: SetuptoolsDeprecationWarning: The test command is disabled and references to it are deprecated.
      !!

              ********************************************************************************
              Please remove any references to `setuptools.command.test` in all supported versions of the affected package.

              By 2024-Nov-15, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.
              ********************************************************************************

      !!
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/pyd4
      copying pyd4/__init__.py -> build/lib.linux-x86_64-cpython-39/pyd4
      running egg_info
      writing pyd4.egg-info/PKG-INFO
      writing dependency_links to pyd4.egg-info/dependency_links.txt
      writing requirements to pyd4.egg-info/requires.txt
      writing top-level names to pyd4.egg-info/top_level.txt
      reading manifest file 'pyd4.egg-info/SOURCES.txt'
      writing manifest file 'pyd4.egg-info/SOURCES.txt'
      running build_ext
      running build_rust
      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyd4
Failed to build pyd4
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based pro

cademirch commented 2 months ago

Oh hmm, I guess I already have Rust on my system. Can you try mamba install -c conda-forge rust then the pip install?
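The pyd4 build log above stops at "running build_rust" followed by "error: can't find Rust compiler", so before retrying the pip install it may help to confirm that the compiler is actually visible on PATH inside the activated environment. This is a minimal sketch. The function name check_rust_toolchain is hypothetical, and checking for both rustc and cargo is an assumption about what the Rust build step needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether rustc and cargo are on PATH.
# Returns 0 if both are found, 1 otherwise (missing tools are listed on stderr).
check_rust_toolchain() {
  local missing=0 tool
  for tool in rustc cargo; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found $tool at $(command -v "$tool")"
    else
      echo "missing $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside the activated env, before retrying the build:
#   mamba activate pyd4-testing
#   check_rust_toolchain && pip install pyd4
```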

elizakirsch0 commented 2 months ago

When I try to install Rust, it says that I already have it on my system too. But it still gives me the same error:

(pyd4-testing) bash-4.2$ mamba install -c conda-forge rust
Looking for: ['rust']
conda-forge/linux-64                                Using cache
conda-forge/noarch                                  Using cache
Pinned packages:

tsackton commented 2 months ago

Can you post the full environment definition with all versions for pyd4-testing? There must be a conflict somewhere, but it is difficult to reproduce because I am not having trouble installing pyd4 via pip in either a py3.9 or py3.10 environment...

elizakirsch0 commented 2 months ago

I think I figured it out! I had to edit the cov_filter.yml file to this:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bedtools==2.30.0
  - mosdepth==0.3.3
  - d4tools>=0.3.4
  - python=3.10
  - pip=24.0
  - rust==1.77.1
  - gcc==13.2.0
  - binutils=2.40
  - make=4.3
  - git
  - gxx
  - pip:
      - "--editable=git+https://github.com/38/d4-format.git#egg=pyd4&subdirectory=pyd4"
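After editing cov_filter.yml, one way to confirm that the rebuilt environment can actually import pyd4 is to find the environment prefix Snakemake created and run its Python directly. Judging from the mamba command in the error log earlier in this thread, Snakemake keeps each environment under .snakemake/conda/ as a hashed "<hash>_" directory next to a "<hash>_.yaml" copy of the env file. The sketch below relies on that layout (an assumption based on that log, not documented behaviour), and the function name find_env_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the env prefix(es) under a Snakemake conda directory
# whose stored .yaml copy mentions a given pattern (e.g. "pyd4").
# Assumes the <hash>_.yaml / <hash>_ layout seen in the error log in this thread.
find_env_prefix() {
  local conda_dir="$1" pattern="$2" yaml
  for yaml in "$conda_dir"/*.yaml; do
    [ -e "$yaml" ] || continue          # glob matched nothing
    if grep -q -- "$pattern" "$yaml"; then
      echo "${yaml%.yaml}"              # strip ".yaml" to get the prefix directory
    fi
  done
}

# Usage from the snpArcher checkout (build the envs first, e.g. with
# --conda-create-envs-only if your Snakemake version supports it):
#   snakemake -d .test/ecoli --cores 1 --use-conda --conda-create-envs-only
#   for prefix in $(find_env_prefix .test/ecoli/.snakemake/conda pyd4); do
#     "$prefix/bin/python" -c "import pyd4; print(pyd4.__file__)"
#   done
```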

After editing this and purging all other modules, I was able to get the test run to complete fully. However, there are messages about missing output files in my Slurm output file. Here is an example:

benchmark: benchmarks/GCA_003018455.1/download_ref/benchmark.txt
reason: Missing output files: results/GCA_003018455.1/data/genome/GCA_003018455.1.fna

Also, when I look at some of the .log files, they are empty. For example, the log file at this path is empty: /project/sedmands_1143/ekirsch/V2_savannahsparrow/snp_archer/snpArcher/.test/ecoli/logs/GCA_000008865.2/compute_d4

Are the missing output file messages / empty log files expected when running the ecoli test?

Thank you!
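To see how many of the test run's logs are empty, and which rules they belong to, a find over the logs directory is enough. Whether an empty log is expected depends on the tool each rule runs, and the thread does not settle that, so this minimal sketch only lists the files. The function name list_empty_logs is hypothetical, and -empty assumes GNU or BSD find.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list empty regular files under a logs directory, sorted.
list_empty_logs() {
  local logs_dir="${1:-logs}"
  find "$logs_dir" -type f -empty | sort
}

# Usage from the snpArcher checkout:
#   list_empty_logs .test/ecoli/logs
```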

cademirch commented 2 months ago

Interesting fix. It would still be great to understand which of your environment specs were preventing the regular pip install, though.

As for:

reason: Missing output files: results/GCA_003018455.1/data/genome/GCA_003018455.1.fna

This is Snakemake telling you the reason it wants to run the given rule. In this case, the reference genome file is missing, so the download_ref rule will be run.
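Since a "reason:" line like the one quoted appears for each scheduled job, tallying them in the Slurm output gives a quick overview of why jobs ran. This is a minimal sketch that assumes the "reason: <category>: <details>" format shown in this thread and groups by the text before the second colon. The function name count_reasons and the example file name are hypothetical. Separately, a dry run (snakemake -n) lists the planned jobs without executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Snakemake "reason:" lines in a log by their
# leading category (text up to the next colon), most frequent first.
count_reasons() {
  sed -n 's/^[[:space:]]*reason: \([^:]*\).*$/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Usage on a Slurm output file (file name is illustrative):
#   count_reasons slurm-12345.out
```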

elizakirsch0 commented 2 months ago

I ran this command to try to get the full environment definition (not sure if this is right, so let me know if you want me to do something else):

conda env export --from-history > pyd4-testing.yml

These are the contents of the output file:

name: pyd4-testing
channels:
  - conda-forge
dependencies:
  - python=3.9
  - rust
  - ca-certificates
  - openssl
prefix: /home1/ekirsch/.conda/envs/pyd4-testing

Got it for the missing output files message, thank you for clarifying!
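The export above was made with --from-history, which builds the spec only from explicitly requested packages. That is why it lists python=3.9, rust, ca-certificates and openssl without resolved versions. Running conda env export without that flag should list every package with its version and build string, which is closer to the "full environment definition with all versions" requested earlier. This is a minimal sketch that bundles that export with the compiler versions relevant to the Rust build error. The function name collect_env_report and the choice of tools to report are assumptions, not something the maintainers asked for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a diagnostics report for one conda environment.
# Each tool is only called if it is on PATH, so the function also runs elsewhere.
collect_env_report() {
  local env_name="$1" out="$2" tool
  {
    echo "# environment report for ${env_name}"
    if command -v conda >/dev/null 2>&1; then
      # Full export: resolved versions and build strings, plus a pip: section
      # for any pip-installed packages.
      conda env export -n "$env_name"
    else
      echo "conda not found on PATH"
    fi
    # Compilers that a source build of pyd4 may rely on.
    for tool in rustc cargo gcc; do
      if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
      else
        echo "$tool: not found"
      fi
    done
  } > "$out" 2>&1
  return 0
}

# Usage (with the environment activated so its compilers are on PATH):
#   mamba activate pyd4-testing
#   collect_env_report pyd4-testing pyd4-testing-report.txt
```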