opain / GenoPred

Genotype-based Prediction (GenoPred)
https://opain.github.io/GenoPred/
GNU General Public License v3.0

Error in downloading PRS file I think. #116

Closed samreenzafer closed 3 weeks ago

samreenzafer commented 1 month ago

Hi, I am a first-time user of GenoPred and have been able to download it successfully.

I am working on an HPC cluster, and I get the error below while running the test case. I get the same error on an interactive node as well as on a login node; the run log attached here is from the attempt on the interactive node.


[zafers02@li03c04 ~]$  bsub -P acc_rareADRs -q interactive  -n 4 -W 4:00  -R rusage[mem=5000]  -Is /bin/bash
<<Starting on lc02a30.hpc.XXXX.edu>>

[zafers02@lc02a30 ~]$ cd ~/softwares/GenoPred/GenoPred/pipeline/
[zafers02@lc02a30 pipeline]$ module load anaconda3/latest
[zafers02@lc02a30 pipeline]$ source activate genopred
(genopred) [zafers02@lc02a30 pipeline]$ conda config --set channel_priority strict
(genopred) [zafers02@lc02a30 pipeline]$ ml git
(genopred) [zafers02@lc02a30 pipeline]$ ml proxies
(genopred) [zafers02@lc02a30 pipeline]$  snakemake -j1 --configfile=example_input/config.yaml --use-conda output_all
Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                          count
-------------------------  -------
ancestry_inference_i             1
ancestry_reporter                1
download_pgs_external            2
download_pgscatalog_utils        1
format_target_all                1
format_target_i                  1
indiv_report_all                 1
output_all                       1
prep_pgs_external_i              2
sample_report_i                  1
score_reporter                   1
sumstat_prep_i                   1
target_pgs_all                   1
total                           15

Select jobs to execute...

[Tue Sep  3 12:38:49 2024]
rule download_pgscatalog_utils:
    output: resources/software/pgscatalog_utils/download_pgscatalog_utils.done
    log: resources/data/logs/download_pgscatalog_utils.log
    jobid: 15
    benchmark: resources/data/benchmarks/download_pgscatalog_utils.txt
    reason: Missing output files: resources/software/pgscatalog_utils/download_pgscatalog_utils.done
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/a1ca3d0129086fba197335c750a581c8_
[Tue Sep  3 12:39:02 2024]
Error in rule download_pgscatalog_utils:
    jobid: 15
    output: resources/software/pgscatalog_utils/download_pgscatalog_utils.done
    log: resources/data/logs/download_pgscatalog_utils.log (check log file(s) for error details)
    conda-env: /xxx/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_
    shell:

    {
      rm -r -f resources/software/pgscatalog_utils;       git clone https://github.com/PGScatalog/pgscatalog_utils.git resources/software/pgscatalog_utils;       cd resources/software/pgscatalog_utils;       git reset --hard 6da7eb0e157ba4e73f941233ee8d8ae4fb5e3926;       poetry install;       poetry build;       pip3 install --user dist/*.whl;       download_scorefiles -h > download_pgscatalog_utils.done
    } > resources/data/logs/download_pgscatalog_utils.log 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job download_pgscatalog_utils since they might be corrupted:
resources/software/pgscatalog_utils/download_pgscatalog_utils.done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-03T123839.147454.snakemake.log

Can you help me figure out what could be going wrong?

Thank you.

opain commented 1 month ago

Hi,

Interesting. Thanks for reaching out. Can you send me the contents of the file

resources/data/logs/download_pgscatalog_utils.log

This should show the error message.

Many thanks,

Ollie
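
For anyone following along, here is a minimal sketch of pulling the likely error lines out of a rule log like the one requested above. The function name `show_log_errors` and the keyword list are illustrative assumptions, not part of GenoPred or Snakemake.

```shell
# Print lines of a rule log that look like errors, prefixed with line numbers.
# Hypothetical helper; usage: show_log_errors path/to/rule.log
show_log_errors() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read log file: $log_file" >&2
    return 1
  fi
  # -n prints line numbers, -i ignores case, -E enables the alternation pattern.
  grep -n -i -E 'error|denied|not found|traceback' "$log_file"
}
```

For the failing rule in this thread, the call would be `show_log_errors resources/data/logs/download_pgscatalog_utils.log`, run from the pipeline directory.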

samreenzafer commented 1 month ago

Sure, here it is.


[zafers02@li03c04 pipeline]$ cat    resources/data/logs/download_pgscatalog_utils.log 
Cloning into 'resources/software/pgscatalog_utils'...
HEAD is now at 6da7eb0 Merge pull request #68 from PGScatalog/dev
ERROR setuptools_scm._integration.setuptools pyproject.toml does not contain a tool.setuptools_scm section
Traceback (most recent call last):
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 53, in read_pyproject
    section = defn.get("tool", {})[tool_name]
KeyError: 'setuptools_scm'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/setuptools.py", line 121, in infer_version
    config = _config.Configuration.from_file(dist_name=dist_name)
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_config.py", line 128, in from_file
    pyproject_data = _read_pyproject(name, _load_toml=_load_toml)
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 55, in read_pyproject
    raise LookupError(f"{name} does not contain a tool.{tool_name} section") from e
LookupError: pyproject.toml does not contain a tool.setuptools_scm section
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: pgscatalog_utils (0.4.3)
ERROR setuptools_scm._integration.setuptools pyproject.toml does not contain a tool.setuptools_scm section
Traceback (most recent call last):
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 53, in read_pyproject
    section = defn.get("tool", {})[tool_name]
KeyError: 'setuptools_scm'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/setuptools.py", line 121, in infer_version
    config = _config.Configuration.from_file(dist_name=dist_name)
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_config.py", line 128, in from_file
    pyproject_data = _read_pyproject(name, _load_toml=_load_toml)
  File "/sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 55, in read_pyproject
    raise LookupError(f"{name} does not contain a tool.{tool_name} section") from e
LookupError: pyproject.toml does not contain a tool.setuptools_scm section
Building pgscatalog_utils (0.4.3)
  - Building sdist
  - Built pgscatalog_utils-0.4.3.tar.gz
  - Building wheel
  - Built pgscatalog_utils-0.4.3-py3-none-any.whl
Processing ./dist/pgscatalog_utils-0.4.3-py3-none-any.whl
Requirement already satisfied: jq<2.0.0,>=1.2.2 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (1.6.0)
Requirement already satisfied: numpy<2.0.0,>=1.23.3 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (1.25.2)
Requirement already satisfied: pandas<2.0.0,>=1.4.3 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (1.5.3)
Requirement already satisfied: pandas-schema<0.4.0,>=0.3.6 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (0.3.6)
Requirement already satisfied: pgzip<0.4.0,>=0.3.2 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (0.3.5)
Requirement already satisfied: polars<0.16.0,>=0.15.0 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (0.15.18)
Requirement already satisfied: pyliftover<0.5,>=0.4 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (0.4)
Requirement already satisfied: requests<3.0.0,>=2.28.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (2.31.0)
Requirement already satisfied: scikit-learn<2.0.0,>=1.2.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (1.3.1)
Requirement already satisfied: zstandard<0.19.0,>=0.18.0 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pgscatalog-utils==0.4.3) (0.18.0)
Requirement already satisfied: python-dateutil>=2.8.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pandas<2.0.0,>=1.4.3->pgscatalog-utils==0.4.3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pandas<2.0.0,>=1.4.3->pgscatalog-utils==0.4.3) (2023.3.post1)
Requirement already satisfied: packaging in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from pandas-schema<0.4.0,>=0.3.6->pgscatalog-utils==0.4.3) (23.1)
Requirement already satisfied: charset-normalizer<4,>=2 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from requests<3.0.0,>=2.28.1->pgscatalog-utils==0.4.3) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from requests<3.0.0,>=2.28.1->pgscatalog-utils==0.4.3) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from requests<3.0.0,>=2.28.1->pgscatalog-utils==0.4.3) (2.0.5)
Requirement already satisfied: certifi>=2017.4.17 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from requests<3.0.0,>=2.28.1->pgscatalog-utils==0.4.3) (2023.7.22)
Requirement already satisfied: scipy>=1.5.0 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from scikit-learn<2.0.0,>=1.2.1->pgscatalog-utils==0.4.3) (1.9.3)
Requirement already satisfied: joblib>=1.1.1 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from scikit-learn<2.0.0,>=1.2.1->pgscatalog-utils==0.4.3) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from scikit-learn<2.0.0,>=1.2.1->pgscatalog-utils==0.4.3) (3.2.0)
Requirement already satisfied: six>=1.5 in /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas<2.0.0,>=1.4.3->pgscatalog-utils==0.4.3) (1.16.0)
pgscatalog-utils is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
/usr/bin/bash: /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakemake/conda/a1ca3d0129086fba197335c750a581c8_/bin/download_scorefiles: /sc/arion/projects/CranioProject/softwares/GenoPred/GenoPred/pipeline/.snakema: bad interpreter: Permission denied
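
In the log above, the setuptools_scm tracebacks are followed by successful build and install steps, so under bash strict mode the command that stops the rule appears to be the final `download_scorefiles -h` call, which fails with `bad interpreter: Permission denied`. A minimal sketch for inspecting the interpreter line of such a script follows. The function name `check_shebang` is a hypothetical helper. The interpreter path in the error message looks cut short, so the sketch also prints the length of the shebang line; whether length is the cause here is not established in this thread.

```shell
# Report the interpreter named on a script's shebang line and whether it is executable.
# Hypothetical helper; usage: check_shebang path/to/script
check_shebang() {
  script="$1"
  first_line=$(head -n 1 "$script") || return 1
  case "$first_line" in
    '#!'*) ;;
    *) echo "no shebang line in $script"; return 1 ;;
  esac
  # Drop the leading "#!", then any leading spaces, then keep only the first word.
  interpreter=${first_line#'#!'}
  interpreter=${interpreter#"${interpreter%%[! ]*}"}
  interpreter=${interpreter%% *}
  echo "interpreter: $interpreter"
  echo "shebang length: ${#first_line}"
  if [ -x "$interpreter" ]; then
    echo "status: executable"
  else
    echo "status: missing or not executable"
    return 1
  fi
}
```

Here the script to inspect would be the `bin/download_scorefiles` file inside the conda environment directory shown in the log.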

opain commented 3 weeks ago

Thank you very much for the log file. I was able to reproduce the error. The PGSC team have archived the pgscatalog_utils package and migrated to their new pygscatalog repo. I will update GenoPred to use the new package and post here when this is done.

opain commented 3 weeks ago

I have just pushed a new release which should resolve your issue (https://github.com/opain/GenoPred/releases/tag/v2.2.11). It now uses the new pygscatalog package.
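
For readers with an existing clone, a minimal sketch of moving it to a release tag such as v2.2.11 follows. It assumes GenoPred was installed with `git clone` and has no conflicting local changes. The function name `checkout_release` is a hypothetical helper, and the project's own update instructions take precedence if they differ.

```shell
# Switch an existing git clone to a given release tag.
# Hypothetical helper; usage: checkout_release path/to/GenoPred v2.2.11
checkout_release() {
  repo_dir="$1"
  tag="$2"
  # Fetch tags from the clone's default remote, then check out the requested tag.
  git -C "$repo_dir" fetch --tags --quiet || return 1
  git -C "$repo_dir" checkout --quiet "tags/$tag" || return 1
  # Print the tag that HEAD now points at, as a confirmation.
  git -C "$repo_dir" describe --tags --exact-match
}
```

Checking out a tag leaves the clone in a detached HEAD state, which is fine for running the pipeline but not for committing local work.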