sbslee / pypgx

A Python package for pharmacogenomics (PGx) research
https://pypgx.readthedocs.io
MIT License

run-ngs-pipeline error with CYP2D6 #140

Closed Cyaneiss closed 4 weeks ago

Cyaneiss commented 1 month ago

Hello,

I'm trying to use the run-ngs-pipeline tool on different genes. Genes without SVs work just fine, but I get this error when trying to run it on CYP2D6.

I'm on Ubuntu 24.04.1 LTS. I installed PyPGx using git clone. The two zip files were generated with the corresponding tools included in pypgx, with the right assembly. The .vcf.gz file was also generated using pypgx, with GRCh38 set as the assembly.

Here is what happens when running run-ngs-pipeline on TPMT; this seems like the expected behaviour (but maybe it's not?):

(base) user:~/Documents/JCB/pypgx/bamSophia$ pypgx run-ngs-pipeline TPMT TPMT_pipeline --variants AllSamples.vcf.gz --depth-of-coverage grch38_AllSamples_depth.zip --control-statistics grch38_AllSamples_stats.zip --platform Targeted --assembly GRCh38
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/pipeline.py:211: UserWarning: User provided CovFrame[DepthOfCoverage] even though the target gene does not have any star alleles defined by SVs. PyPGx will ignore it.
  warnings.warn(message)
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/pipeline.py:219: UserWarning: User provided SampleTable[Statistics] even though the target gene does not have any star alleles defined by SVs. PyPGx will ignore it.
  warnings.warn(message)
Saved VcfFrame[Imported] to: TPMT_pipeline/imported-variants.zip
Saved VcfFrame[Phased] to: TPMT_pipeline/phased-variants.zip
Saved VcfFrame[Consolidated] to: TPMT_pipeline/consolidated-variants.zip
Saved SampleTable[Alleles] to: TPMT_pipeline/alleles.zip
Saved SampleTable[Genotypes] to: TPMT_pipeline/genotypes.zip
Saved SampleTable[Phenotypes] to: TPMT_pipeline/phenotypes.zip
Saved SampleTable[Results] to: TPMT_pipeline/results.zip

And here is the error message when running it on CYP2D6:

(base) user:~/Documents/JCB/pypgx/bamSophia$ pypgx run-ngs-pipeline CYP2D6 CYP2D6_pipeline --variants AllSamples.vcf.gz --depth-of-coverage grch38_AllSamples_depth.zip --control-statistics grch38_AllSamples_stats.zip --platform Targeted --assembly GRCh38
Saved VcfFrame[Imported] to: CYP2D6_pipeline/imported-variants.zip
Saved VcfFrame[Phased] to: CYP2D6_pipeline/phased-variants.zip
Saved VcfFrame[Consolidated] to: CYP2D6_pipeline/consolidated-variants.zip
Saved SampleTable[Alleles] to: CYP2D6_pipeline/alleles.zip
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/fuc/api/pycov.py:460: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return cls(pd.read_table(fn))
Saved CovFrame[ReadDepth] to: CYP2D6_pipeline/read-depth.zip
Saved CovFrame[CopyNumber] to: CYP2D6_pipeline/copy-number.zip
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator SVC from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator LabelBinarizer from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator OneVsRestClassifier from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py:151: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  df = df.fillna(method='ffill')
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py:152: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  df = df.fillna(method='bfill')
Traceback (most recent call last):
  File "/home/bioinfo-bioch/miniconda3/bin/pypgx", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/__main__.py", line 33, in main
    commands[args.command].main(args)
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/cli/run_ngs_pipeline.py", line 159, in main
    pipeline.run_ngs_pipeline(
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/pipeline.py", line 293, in run_ngs_pipeline
    cnv_calls = utils.predict_cnv(copy_number, cnv_caller=cnv_caller)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py", line 1242, in predict_cnv
    copy_number = _process_copy_number(copy_number)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py", line 157, in _process_copy_number
    raise ValueError('Missing values detected')
ValueError: Missing values detected
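
The InconsistentVersionWarning lines in the log above each name an estimator, the scikit-learn version it was pickled with, and the version currently installed. As a minimal sketch for summarizing such lines when triaging a long log, the hypothetical helper `parse_version_mismatch` below (not part of PyPGx or scikit-learn) extracts those three fields with a regular expression. It only reads the warning text; it does not say whether the mismatch is harmful.

```python
import re

# Matches the wording of scikit-learn's InconsistentVersionWarning as it
# appears in the log above. The pattern is an assumption based on that text.
PATTERN = re.compile(
    r"Trying to unpickle estimator (?P<estimator>\w+) from version "
    r"(?P<pickled>[\w.]+) when using version (?P<installed>[\w.]+)\."
)


def parse_version_mismatch(line):
    """Return (estimator, pickled_version, installed_version) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match["estimator"], match["pickled"], match["installed"]


log_line = (
    "InconsistentVersionWarning: Trying to unpickle estimator SVC from "
    "version 0.24.2 when using version 1.5.2. This might lead to breaking "
    "code or invalid results."
)
mismatch = parse_version_mismatch(log_line)
print(mismatch)
```

A line that does not contain the warning text yields `None`, so the helper can be mapped over a whole log and the empty results filtered out.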

From what I see, my sklearn version may be a cause of trouble (I have 1.5.2). Do I have to downgrade it to 0.24.2? When I tried downgrading scikit-learn (and Python) in another conda env and ran the same command, I got the following error message:

(python-sandbox) user:~/Documents/JCB/pypgx/bamSophia$ pypgx run-ngs-pipeline TPMT TPMT_pipeline --variants AllSamples.vcf.gz --depth-of-coverage grch38_AllSamples_depth.zip --control-statistics grch38_AllSamples_stats.zip --platform Targeted --assembly GRCh38                                                       
Traceback (most recent call last):                                                                                                                                                      
  File "/home/bioinfo-bioch/miniconda3/envs/python-sandbox/bin/pypgx", line 6, in <module>                                                                                              
    from pypgx.__main__ import main                                                                                                                                                     
  File "/home/bioinfo-bioch/miniconda3/envs/python-sandbox/lib/python3.7/site-packages/pypgx/__main__.py", line 4, in <module>                                                          
    from .cli import commands                                                                                                                                                           
  File "/home/bioinfo-bioch/miniconda3/envs/python-sandbox/lib/python3.7/site-packages/pypgx/cli/__init__.py", line 9, in <module>                                                      
    commands[f.stem.replace('_', '-')] = import_module(f'.{f.stem}', __package__)                                                                                                       
  File "/home/bioinfo-bioch/miniconda3/envs/python-sandbox/lib/python3.7/importlib/__init__.py", line 127, in import_module                                                             
    return _bootstrap._gcd_import(name[level:], package, level)                                                                                                                         
  File "/home/bioinfo-bioch/miniconda3/envs/python-sandbox/lib/python3.7/site-packages/pypgx/cli/compute_control_statistics.py", line 29, in <module>                                   
    """                                                                                                                                                                                 
AttributeError: module 'fuc.api.common' has no attribute '_script_name'

If I can provide any further information, I will gladly do so.

Thank you in advance for your time and help. Best regards, Peter

sbslee commented 1 month ago

Hi @Cyaneiss,

  1. Please provide the list of installed packages and their versions from $ conda list.
  2. Can you confirm that you are able to run the GeT-RM tutorial for genes with SV? If you can't, that means something went wrong during the PyPGx installation.
  3. For the error AttributeError: module 'fuc.api.common' has no attribute '_script_name' please see #60.

Steven

Cyaneiss commented 1 month ago

Hello @sbslee, thanks for the fast answer,

First of all, here's the list of packages in my (base) env, where I installed PyPGx via git clone. I essentially use it as my sandbox, so there are quite a few.

# packages in environment at /home/bioinfo-bioch/miniconda3:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   2.1.0                    pypi_0    pypi
aldy                      4.6                      pypi_0    pypi
anaconda-anon-usage       0.4.4           py312hfc0e8ea_100  
annotated-types           0.7.0                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
archspec                  0.2.3              pyhd3eb1b0_0  
argparse-dataclass        2.0.0                    pypi_0    pypi
attrs                     24.2.0                   pypi_0    pypi
biopython                 1.84                     pypi_0    pypi
boltons                   23.0.0          py312h06a4308_0  
brotli-python             1.0.9           py312h6a678d5_8  
bzip2                     1.0.8                h5eee18b_6  
c-ares                    1.19.1               h5eee18b_0  
ca-certificates           2024.7.2             h06a4308_0  
certifi                   2024.7.4        py312h06a4308_0  
cffi                      1.16.0          py312h5eee18b_1  
charset-normalizer        3.3.2              pyhd3eb1b0_0  
click                     8.1.7                    pypi_0    pypi
coloredlogs               15.0.1                   pypi_0    pypi
colormath                 3.0.0                    pypi_0    pypi
conda                     24.7.1          py312h06a4308_0  
conda-content-trust       0.2.0           py312h06a4308_1  
conda-inject              1.3.2                    pypi_0    pypi
conda-libmamba-solver     24.7.0             pyhd3eb1b0_0  
conda-package-handling    2.3.0           py312h06a4308_0  
conda-package-streaming   0.10.0          py312h06a4308_0  
configargparse            1.7                      pypi_0    pypi
connection-pool           0.0.3                    pypi_0    pypi
contourpy                 1.3.0                    pypi_0    pypi
cryptography              42.0.5          py312hdda0065_1  
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.11                   pypi_0    pypi
datrie                    0.8.2                    pypi_0    pypi
distro                    1.9.0           py312h06a4308_0  
docutils                  0.21.2                   pypi_0    pypi
dpath                     2.2.0                    pypi_0    pypi
expat                     2.6.2                h6a678d5_0  
fastjsonschema            2.20.0                   pypi_0    pypi
fmt                       9.1.0                hdb19cb5_1  
fonttools                 4.54.1                   pypi_0    pypi
frozendict                2.4.2           py312h06a4308_0  
fuc                       0.38.0                   pypi_0    pypi
gitdb                     4.0.11                   pypi_0    pypi
gitpython                 3.1.43                   pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
humanize                  4.10.0                   pypi_0    pypi
icu                       73.1                 h6a678d5_0  
idna                      3.7             py312h06a4308_0  
immutabledict             4.2.0                    pypi_0    pypi
immutables                0.21                     pypi_0    pypi
importlib-metadata        8.5.0                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
inotify-simple            1.3.5                    pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
jsonpatch                 1.33            py312h06a4308_1  
jsonpointer               2.1                pyhd3eb1b0_0  
jsonschema                4.23.0                   pypi_0    pypi
jsonschema-specifications 2024.10.1                pypi_0    pypi
jupyter-core              5.7.2                    pypi_0    pypi
kaleido                   0.2.1                    pypi_0    pypi
kiwisolver                1.4.7                    pypi_0    pypi
kmerexplor                1.1.0                    pypi_0    pypi
krb5                      1.20.1               h143b758_1  
ld_impl_linux-64          2.38                 h1181459_1  
libarchive                3.6.2                hfab0078_4  
libcurl                   8.7.1                h251f7ec_0  
libedit                   3.1.20230828         h5eee18b_0  
libev                     4.33                 h7f8727e_1  
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libmamba                  1.5.8                hfe524e5_2  
libmambapy                1.5.8           py312h2dafd23_2  
libnghttp2                1.57.0               h2d74bed_0  
libsolv                   0.7.24               he621ea3_1  
libssh2                   1.11.0               h251f7ec_0  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
libxml2                   2.13.1               hfdd30dd_2  
logbook                   1.7.0.post0              pypi_0    pypi
lxml                      5.3.0                    pypi_0    pypi
lz4-c                     1.9.4                h6a678d5_1  
mappy                     2.28                     pypi_0    pypi
markdown                  3.7                      pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.9.2                    pypi_0    pypi
matplotlib-venn           1.1.1                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
menuinst                  2.1.2           py312h06a4308_0  
multiqc                   1.25                     pypi_0    pypi
natsort                   8.4.0                    pypi_0    pypi
nbformat                  5.10.4                   pypi_0    pypi
ncls                      0.0.68                   pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.3                      pypi_0    pypi
numpy                     2.1.1                    pypi_0    pypi
openssl                   3.0.14               h5eee18b_0  
ortools                   9.11.4210                pypi_0    pypi
packaging                 24.1            py312h06a4308_0  
pandas                    2.2.3                    pypi_0    pypi
patsy                     0.5.6                    pypi_0    pypi
pcre2                     10.42                hebb0a14_1  
pillow                    10.4.0                   pypi_0    pypi
pip                       24.2            py312h06a4308_0  
plac                      1.4.3                    pypi_0    pypi
platformdirs              3.10.0          py312h06a4308_0  
plotly                    5.24.1                   pypi_0    pypi
pluggy                    1.5.0                    pypi_0    pypi
protobuf                  5.26.1                   pypi_0    pypi
psutil                    6.0.0                    pypi_0    pypi
pulp                      2.9.0                    pypi_0    pypi
pybind11-abi              5                    hd3eb1b0_0  
pycosat                   0.6.6           py312h5eee18b_1  
pycparser                 2.21               pyhd3eb1b0_0  
pydantic                  2.9.2                    pypi_0    pypi
pydantic-core             2.23.4                   pypi_0    pypi
pygments                  2.18.0                   pypi_0    pypi
pyparsing                 3.1.4                    pypi_0    pypi
pypgx                     0.25.0                   pypi_0    pypi
pyranges                  0.1.2                    pypi_0    pypi
pysam                     0.22.1                   pypi_0    pypi
pysocks                   1.7.1           py312h06a4308_0  
pytest                    8.3.3                    pypi_0    pypi
python                    3.12.4               h5148396_1  
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.2                   pypi_0    pypi
pyyaml                    6.0.2                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
referencing               0.35.1                   pypi_0    pypi
reproc                    14.2.4               h6a678d5_2  
reproc-cpp                14.2.4               h6a678d5_2  
requests                  2.32.3          py312h06a4308_0  
reretry                   0.11.8                   pypi_0    pypi
rich                      13.8.1                   pypi_0    pypi
rich-click                1.8.3                    pypi_0    pypi
rpds-py                   0.20.0                   pypi_0    pypi
ruamel.yaml               0.17.21         py312h5eee18b_0  
scikit-learn              1.5.2                    pypi_0    pypi
scipy                     1.14.1                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
setuptools                72.1.0          py312h06a4308_0  
six                       1.16.0                   pypi_0    pypi
smart-open                7.0.5                    pypi_0    pypi
smmap                     5.0.1                    pypi_0    pypi
snakemake                 8.23.0                   pypi_0    pypi
snakemake-interface-common 1.17.4                   pypi_0    pypi
snakemake-interface-executor-plugins 9.3.2                    pypi_0    pypi
snakemake-interface-report-plugins 1.1.0                    pypi_0    pypi
snakemake-interface-storage-plugins 3.3.0                    pypi_0    pypi
sorted-nearest            0.0.39                   pypi_0    pypi
spectra                   0.0.11                   pypi_0    pypi
sqlite                    3.45.3               h5eee18b_0  
statsmodels               0.14.4                   pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tenacity                  9.0.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
throttler                 1.2.2                    pypi_0    pypi
tk                        8.6.14               h39e8969_0  
tqdm                      4.66.4          py312he106c6f_0  
traitlets                 5.14.3                   pypi_0    pypi
truststore                0.8.0           py312h06a4308_0  
typeguard                 4.3.0                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
tzdata                    2024.2                   pypi_0    pypi
urllib3                   2.2.2           py312h06a4308_0  
wheel                     0.43.0          py312h06a4308_0  
wrapt                     1.16.0                   pypi_0    pypi
xz                        5.4.6                h5eee18b_1  
yaml-cpp                  0.8.0                h6a678d5_1  
yte                       1.5.4                    pypi_0    pypi
zipp                      3.20.2                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_1  
zstandard                 0.22.0          py312h2c38b39_0  
zstd                      1.5.5                hc292b87_2  

Secondly, I forgot to include the error I get during the GeT-RM tutorial, my bad! Here it is:

(base) user:~/Documents/JCB/pypgx/getrm-wgs-tutorial$ pypgx run-ngs-pipeline \
CYP2D6 \
grch37-CYP2D6-pipeline \
--variants grch37-variants.vcf.gz \
--depth-of-coverage grch37-depth-of-coverage.zip \
--control-statistics grch37-control-statistics-VDR.zip
Saved VcfFrame[Imported] to: grch37-CYP2D6-pipeline/imported-variants.zip
Saved VcfFrame[Phased] to: grch37-CYP2D6-pipeline/phased-variants.zip
Saved VcfFrame[Consolidated] to: grch37-CYP2D6-pipeline/consolidated-variants.zip
Saved SampleTable[Alleles] to: grch37-CYP2D6-pipeline/alleles.zip
Saved CovFrame[ReadDepth] to: grch37-CYP2D6-pipeline/read-depth.zip
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py:456: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '0        2.0000
1        2.0625
2        2.0625
3        2.0625
4        2.0625
          ...  
39379    0.5625
39380    0.5625
39381    0.5625
39382    0.5625
39383    0.5625
Name: NA19143_PyPGx, Length: 39384, dtype: float64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.iloc[:, 2:] = df.iloc[:, 2:] / medians * 2
Saved CovFrame[CopyNumber] to: grch37-CYP2D6-pipeline/copy-number.zip
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator SVC from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator LabelBinarizer from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator OneVsRestClassifier from version 0.24.2 when using version 1.5.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py:151: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  df = df.fillna(method='ffill')
/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py:152: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  df = df.fillna(method='bfill')
Saved SampleTable[CNVCalls] to: grch37-CYP2D6-pipeline/cnv-calls.zip
Saved SampleTable[Genotypes] to: grch37-CYP2D6-pipeline/genotypes.zip
Saved SampleTable[Phenotypes] to: grch37-CYP2D6-pipeline/phenotypes.zip
Saved SampleTable[Results] to: grch37-CYP2D6-pipeline/results.zip

The pandas error part repeats many times, so I cut the repeats and pasted only one occurrence.

I will get back to you about running pypgx from a conda install once I have tested it further.
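
The pandas FutureWarning in the log above is raised when float results are written back into integer columns in place. As a hedged illustration only (toy data, not the PyPGx implementation), the sketch below computes the same kind of median-normalized copy number after casting the depth columns to float explicitly, which avoids the in-place dtype change. The column names and the median value are assumptions chosen for the example.

```python
import pandas as pd

# Toy read-depth table: two coordinate columns followed by one sample column
# holding integer depths (hypothetical values).
df = pd.DataFrame({
    "Chromosome": ["chr22", "chr22", "chr22"],
    "Position": [1, 2, 3],
    "SampleA": [32, 33, 9],
})

# Hypothetical per-sample median depth from a control region.
medians = pd.Series({"SampleA": 32})

# Cast to float first, then normalize; build a new frame instead of
# assigning floats into the existing int64 columns.
depth = df.iloc[:, 2:].astype("float64")
copy_number = depth / medians * 2
result = pd.concat([df.iloc[:, :2], copy_number], axis=1)

print(result)
```

Building a new frame with `pd.concat` keeps the coordinate columns untouched and leaves the sample columns as `float64` from the start.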

Best regards, Peter

sbslee commented 1 month ago

@Cyaneiss,

Please note that the "error" messages you received from the GeT-RM tutorial are actually warnings. They are fine. The fact that you obtained the SampleTable[Results] file means everything worked well.

sbslee commented 1 month ago

You can follow this section in the tutorial to make sure the results you got are accurate:

$ wget https://raw.githubusercontent.com/sbslee/pypgx-data/main/getrm-wgs-tutorial/grch37-CYP2D6-results.zip
$ pypgx compare-genotypes grch37-CYP2D6-pipeline/results.zip grch37-CYP2D6-results.zip
# Genotype
Total: 70
Compared: 70
Concordance: 1.000 (70/70)
# CNV
Total: 70
Compared: 70
Concordance: 1.000 (70/70)

Cyaneiss commented 1 month ago

Thank you for the clarification. I thought these warnings were part of what caused my initial error. I get 100% concordance after using compare-genotypes, as you said. So the problem isn't caused by the installation itself. Do you have any idea what causes the error?

Concerning my attempts to install and run PyPGx with conda, I tried the solution you linked (with both v0.15.0 and v0.25.0 of PyPGx), but got the following error:

(base) user:~/Documents/JCB/pypgx/bamSophia$ conda create -n pypgx -c bioconda conda-forge pypgx=0.15.0 fuc=0.33.1
Channels:
 - bioconda
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides matplotlib-venn needed by fuc-0.33.1-pyh5e36f6f_0
  - nothing provides requested conda-forge

Could not solve for environment specs
The following packages are incompatible
├─ conda-forge does not exist (perhaps a typo or a missing channel);
└─ fuc 0.33.1**  is not installable because it requires
   └─ matplotlib-venn, which does not exist (perhaps a missing channel).

Including conda-forge as a channel seemed to solve this error, but later, when trying to run run-ngs-pipeline on CYP2D6 (I also tried with TPMT just in case), I got another error, this time from matplotlib:

(pypgx) user:~/Documents/JCB/pypgx/bamSophia$ pypgx run-ngs-pipeline CYP2D6 grch37-CYP2D6-pipeline --variants grch37-variants.vcf.gz --depth-of-coverage grch37-depth-of-coverage.zip --control-statistics grch37-control-statistics-VDR.zip                                                                               
Traceback (most recent call last):                                                                                                                                                      
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/bin/pypgx", line 6, in <module>                                                                                                       
    from pypgx.__main__ import main                                                                                                                                                     
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/pypgx/__init__.py", line 1, in <module>                                                                  
    from .api.core import (                                                                                                                                                             
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/pypgx/api/core.py", line 10, in <module>                                                                 
    from .. import sdk                                                                                                                                                                  
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/pypgx/sdk/__init__.py", line 1, in <module>                                                              
    from .utils import (Archive, add_cn_samples, compare_metadata, simulate_copy_number)                                                                                                
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/pypgx/sdk/utils.py", line 10, in <module>                                                                
    from fuc import pyvcf, pycov, common, pybam                                                                                                                                         
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/fuc/__init__.py", line 1, in <module>                                                                    
    from .api import *                                                                                                                                                                  
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/fuc/api/pyvcf.py", line 145, in <module>                                                                 
    from . import pybed, common, pymaf, pybam                                                                                                                                           
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/fuc/api/pybed.py", line 43, in <module>                                                                  
    from . import common                                                                                                                                                                
  File "/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/fuc/api/common.py", line 25, in <module>                                                                 
    from matplotlib.collections import BrokenBarHCollection                                                                                                                             
ImportError: cannot import name 'BrokenBarHCollection' from 'matplotlib.collections' (/home/bioinfo-bioch/miniconda3/envs/pypgx/lib/python3.12/site-packages/matplotlib/collections.py)
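
The ImportError above means the matplotlib installed in that environment does not expose `BrokenBarHCollection`, which this fuc version imports at load time. A quick way to check this kind of incompatibility before running the pipeline is to test whether a module exports a given name. The helper `module_exports` below is a hypothetical sketch using only the standard library, demonstrated on the built-in `collections` module so that it runs without matplotlib.

```python
import importlib


def module_exports(module_name, attribute):
    """Return True or False if the module imports, or None if it cannot."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


# Demonstration on a standard-library module.
print(module_exports("collections", "OrderedDict"))
print(module_exports("collections", "BrokenBarHCollection"))
```

In the affected environment, calling `module_exports("matplotlib.collections", "BrokenBarHCollection")` would be expected to return `False`, matching the traceback. Which matplotlib release to pin instead is not stated in this thread.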

Best regards, Peter

Cyaneiss commented 1 month ago

I did more checking about the error I got initially.

The error message points to the following call:

  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/pipeline.py", line 293, in run_ngs_pipeline
    cnv_calls = utils.predict_cnv(copy_number, cnv_caller=cnv_caller)

I checked the values of copy_number (got False) and cnv_caller (got None). Next I checked this file :

  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py", line 1242, in predict_cnv
    copy_number = _process_copy_number(copy_number)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bioinfo-bioch/miniconda3/lib/python3.12/site-packages/pypgx/api/utils.py", line 157, in _process_copy_number
    raise ValueError('Missing values detected')

The error comes from this function :

def _process_copy_number(copy_number):

    df = copy_number.data.copy_df()
    region = core.get_region(copy_number.metadata['Gene'], assembly=copy_number.metadata['Assembly'])
    chrom, start, end = common.parse_region(region)

    if (end - start + 1) > copy_number.data.shape[0]:
        temp = pd.DataFrame.from_dict({'Temp': range(int(df.Position.iat[0]-1), int(df.Position.iat[-1])+1)})
        temp = temp.merge(df, left_on='Temp', right_on='Position', how='outer')
        df = temp.drop(columns='Temp')

    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')

    df.iloc[:, 2:] = df.iloc[:, 2:].apply(lambda c: median_filter(c, size=1000), axis=0)

    if df.isnull().values.any():
        raise ValueError('Missing values detected')

    return sdk.Archive(copy_number.copy_metadata(), pycov.CovFrame(df))

where df contains only NaN values.
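
This matches how forward and backward filling behave: they copy neighbouring values into gaps, so a column with no valid value at all stays entirely NaN and the final check raises. The sketch below reproduces just that condition on toy data. `has_unfillable_gaps` is a hypothetical helper, not a PyPGx function, and it uses `ffill()` and `bfill()`, the replacements that the FutureWarning in the log suggests for `fillna(method=...)`.

```python
import numpy as np
import pandas as pd


def has_unfillable_gaps(df):
    """Return True if NaNs remain after forward- then backward-filling."""
    filled = df.ffill().bfill()
    return bool(filled.isnull().values.any())


# A column with at least one valid value can be filled completely.
partial = pd.DataFrame({
    "Position": [1, 2, 3],
    "SampleA": [np.nan, 2.0, np.nan],
})

# A column with no valid value has nothing to propagate.
empty = pd.DataFrame({
    "Position": [1, 2, 3],
    "SampleA": [np.nan, np.nan, np.nan],
})

partial_has_gaps = has_unfillable_gaps(partial)
empty_has_gaps = has_unfillable_gaps(empty)
print(partial_has_gaps, empty_has_gaps)
```

So the ValueError is a symptom: the question is why the copy-number columns arrive empty in the first place, which the function itself cannot answer.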

I'm not sure why. Maybe it has something to do with the resource bundle, but I installed it as the documentation shows. I also added it to my PATH just in case.

Hope that this can help you.

Cyaneiss commented 4 weeks ago

I recreated every file needed to run run-ngs-pipeline. This time I used a .txt file listing every .bam file I want to use. I also set the ID and SM of the .bam files to the whole file name (including the extension) rather than just the sample name. This way, the tool works fine even on CYP2D6 and other genes that have SVs.

I still don't understand the error I got when creating this issue, but at least I have a workaround that works.

Thank you for your help and your time, @sbslee.
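
The thread does not establish the root cause. Since the workaround involved changing the sample names stored in the BAM files, one cheap sanity check (an assumption about a possible cause, not a confirmed diagnosis) is to compare the sample names across the input files before running the pipeline. The helper `compare_sample_names` below is hypothetical and works on plain lists of names, however they were extracted.

```python
def compare_sample_names(vcf_samples, depth_samples):
    """Report names that appear in only one of the two inputs."""
    vcf_set, depth_set = set(vcf_samples), set(depth_samples)
    return {
        "only_in_vcf": sorted(vcf_set - depth_set),
        "only_in_depth": sorted(depth_set - vcf_set),
    }


# Hypothetical names: the VCF uses bare sample names while the depth file
# uses full BAM file names.
report = compare_sample_names(["S1", "S2"], ["S1.bam", "S2.bam"])
print(report)
```

Two empty lists mean the names agree; anything else points at samples that one input knows about and the other does not.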

sbslee commented 4 weeks ago

@Cyaneiss Thanks for the update! I'm glad you found the solution on your own. Please feel free to open another Issue if you encounter any problems.