yeeus / GCI

A program for assessing the T2T genome continuity
MIT License

dealing with pacbiohifi #9

Closed ghost closed 1 month ago

ghost commented 4 months ago

Hello, thank you for GCI. I am trying to use it in an analysis and it is giving this error.

Used arguments:{'reference': 'maternalpaternal.asm.dip.hap1.p_ctg.fa', 'hifi': ['ERR10930363.hifi.hap1.bam', 'ERR10930363.hifi.hap1.paf'], 'nano': None, 'chrs': None, 'regions': None, 'threshold': 0, 'dist_percent': 0.005, 'directory': '.', 'prefix': 'hap1', 'map_qual': 30, 'mq_cutoff': 50, 'iden_percent': 0.9, 'ovlp_percent': 0.9, 'clip_percent': 0.1, 'flank_len': 15, 'plot': True, 'depth_min': 0.1, 'depth_max': 4.0, 'window_size': 50000, 'image_type': 'png', 'force': False}
Finding gaps ...
Finding gaps done!!! Awesome! No gaps were found!

Filtering HiFi alignment files ...
Traceback (most recent call last):
  File "GCI.py", line 1058, in <module>
    GCI(**args)
  File "GCI.py", line 941, in GCI
    depths, targets_length = filter(hifi_paf, hifi_bam, prefix, map_qual, mq_cutoff, iden_percent, clip_percent, ovlp_percent, flank_len, directory, force, 'HiFi', chrs_list)
  File "GCI.py", line 189, in filter
    if (segment.is_mapped == True) and (segment.is_secondary == False) and (segment.is_supplementary == False) and (segment.mapping_quality >= map_qual):
AttributeError: 'pysam.libcalignedsegment.AlignedSegment' object has no attribute 'is_mapped'
Used arguments:{'reference': 'maternalpaternal.asm.dip.hap2.p_ctg.fa', 'hifi': ['ERR10930363.hifi.hap2.bam', 'ERR10930363.hifi.hap2.paf'], 'nano': None, 'chrs': None, 'regions': None, 'threshold': 0, 'dist_percent': 0.005, 'directory': '.', 'prefix': 'hap1', 'map_qual': 30, 'mq_cutoff': 50, 'iden_percent': 0.9, 'ovlp_percent': 0.9, 'clip_percent': 0.1, 'flank_len': 15, 'plot': True, 'depth_min': 0.1, 'depth_max': 4.0, 'window_size': 50000, 'image_type': 'png', 'force': False}
Finding gaps ...
Finding gaps done!!! Awesome! No gaps were found!

Filtering HiFi alignment files ...
Traceback (most recent call last):
  File "GCI.py", line 1058, in <module>
    GCI(**args)
  File "GCI.py", line 941, in GCI
    depths, targets_length = filter(hifi_paf, hifi_bam, prefix, map_qual, mq_cutoff, iden_percent, clip_percent, ovlp_percent, flank_len, directory, force, 'HiFi', chrs_list)
  File "GCI.py", line 189, in filter
    if (segment.is_mapped == True) and (segment.is_secondary == False) and (segment.is_supplementary == False) and (segment.mapping_quality >= map_qual):
AttributeError: 'pysam.libcalignedsegment.AlignedSegment' object has no attribute 'is_mapped'

The BAM files are sorted and indexed, and as far as I can see from the code, the attribute is present there. Let me know if you have seen this issue before and how to address it for GCI.

Thank you, Gaurav

yeeus commented 4 months ago

Thanks for trying this tool! I'm sorry, I haven't met this issue before. Could you check your version of pysam? It seems the version you are using may be outdated.
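If the pysam that actually gets imported turns out to lack is_mapped, one possible workaround is to fall back on the older is_unmapped flag. The following is only a minimal sketch: the helper names segment_is_mapped and passes_primary_filter are hypothetical and are not part of GCI or pysam, and the condition simply mirrors the one shown at line 189 of the traceback.

```python
def segment_is_mapped(segment):
    """Return True if the alignment is mapped.

    Uses the is_mapped property when the installed pysam provides it,
    otherwise negates is_unmapped.
    """
    is_mapped = getattr(segment, "is_mapped", None)
    if is_mapped is not None:
        return bool(is_mapped)
    return not segment.is_unmapped


def passes_primary_filter(segment, map_qual):
    """Hypothetical stand-in for the check in GCI.py's filter():
    mapped, primary, not supplementary, and MAPQ at or above map_qual."""
    return (
        segment_is_mapped(segment)
        and not segment.is_secondary
        and not segment.is_supplementary
        and segment.mapping_quality >= map_qual
    )
```

With map_qual set to 30, as in the arguments printed above, a primary alignment with mapping quality 60 would pass this check whether or not the object exposes is_mapped.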

ghost commented 4 months ago

Thank you @yeeus. Please find below the complete YAML for the conda environment for GCI.

name: genomehificontiguity
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - alsa-lib=1.2.11=hd590300_1
  - atk-1.0=2.38.0=hd4edc92_1
  - attr=2.5.1=h166bdaf_1
  - bamsnap=0.2.19=py_0
  - biopython=1.83=py310h2372a71_0
  - bzip2=1.0.8=hd590300_5
  - c-ares=1.28.1=hd590300_0
  - ca-certificates=2024.2.2=hbcca054_0
  - cairo=1.18.0=h3faef2a_0
  - canu=2.2=ha47f30e_0
  - chrpath=0.16=h7f98852_1002
  - dbus=1.13.6=h5008d03_3
  - expat=2.6.2=h59595ed_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=h77eed37_1
  - fontconfig=2.14.2=h14ed4e7_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.12.1=h267a509_2
  - fribidi=1.0.10=h36c2ea0_0
  - gdk-pixbuf=2.42.10=h6c15284_3
  - gettext=0.22.5=h59595ed_2
  - gettext-tools=0.22.5=h59595ed_2
  - giflib=5.2.2=hd590300_0
  - glib=2.80.0=hf2295e7_6
  - glib-tools=2.80.0=hde27a5a_6
  - gnuplot=5.4.8=h142138f_0
  - graphite2=1.3.13=h59595ed_1003
  - gst-plugins-base=1.22.9=hfa15dee_1
  - gstreamer=1.22.9=h98fc4e7_1
  - gtk2=2.24.33=h280cfa0_4
  - harfbuzz=8.3.0=h3d44ed6_0
  - htslib=1.20=h81da01d_0
  - icu=73.2=h59595ed_0
  - importlib-metadata=7.1.0=pyha770c72_0
  - k8=0.2.5=hdcf5f25_4
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.21.2=h659d440_0
  - lame=3.100=h166bdaf_1003
  - lcms2=2.15=h7f713cb_2
  - ld_impl_linux-64=2.40=h55db66e_0
  - lerc=4.0.0=h27087fc_0
  - libasprintf=0.22.5=h661eb56_2
  - libasprintf-devel=0.22.5=h661eb56_2
  - libblas=3.9.0=22_linux64_openblas
  - libcap=2.69=h0f662aa_0
  - libcblas=3.9.0=22_linux64_openblas
  - libclang=15.0.7=default_h127d8a8_5
  - libclang-cpp15=15.0.7=default_h127d8a8_5
  - libclang13=15.0.7=default_h5d6823c_5
  - libcups=2.3.3=h4637d8d_4
  - libcurl=8.7.1=hca28451_0
  - libdeflate=1.18=h0b41bf4_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=hd590300_2
  - libevent=2.1.12=hf998b51_1
  - libexpat=2.6.2=h59595ed_0
  - libffi=3.4.2=h7f98852_5
  - libflac=1.4.3=h59595ed_0
  - libgcc-ng=13.2.0=hc881cc4_6
  - libgcrypt=1.10.3=hd590300_0
  - libgd=2.3.3=he9388d3_8
  - libgettextpo=0.22.5=h59595ed_2
  - libgettextpo-devel=0.22.5=h59595ed_2
  - libgfortran-ng=13.2.0=h69a702a_6
  - libgfortran5=13.2.0=h43f5ff8_6
  - libglib=2.80.0=hf2295e7_6
  - libgomp=13.2.0=hc881cc4_6
  - libgpg-error=1.48=h71f35ed_0
  - libiconv=1.17=hd590300_2
  - libjpeg-turbo=2.1.5.1=hd590300_1
  - liblapack=3.9.0=22_linux64_openblas
  - libllvm15=15.0.7=hb3ce162_4
  - libllvm18=18.1.3=h2448989_0
  - libnghttp2=1.58.0=h47da74e_1
  - libnsl=2.0.1=hd590300_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.27=pthreads_h413a1c8_0
  - libopus=1.3.1=h7f98852_1
  - libpng=1.6.43=h2797004_0
  - libpq=15.6=h088ca5b_0
  - libsndfile=1.2.2=hc60ed4a_1
  - libsqlite=3.45.3=h2797004_0
  - libssh2=1.11.0=h0841786_0
  - libstdcxx-ng=13.2.0=h95c4c6d_6
  - libsystemd0=255=h3516f8a_1
  - libtiff=4.6.0=h8b53f26_0
  - libuuid=2.38.1=h0b41bf4_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libwebp=1.3.2=hdffd6e0_0
  - libwebp-base=1.3.2=hd590300_1
  - libxcb=1.15=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libxkbcommon=1.7.0=h662e7e4_0
  - libxml2=2.12.6=h232c23b_2
  - libzlib=1.2.13=hd590300_5
  - lz4-c=1.9.4=hcb278e6_0
  - meryl=1.4.1=h4ac6f70_0
  - minimap2=2.28=he4a0461_0
  - mpg123=1.32.6=h59595ed_0
  - mysql-common=8.0.33=hf1915f5_6
  - mysql-libs=8.0.33=hca2cd23_6
  - ncurses=6.4.20240210=h59595ed_0
  - nspr=4.35=h27087fc_0
  - nss=3.98=h1d7d5a4_0
  - numpy=1.26.4=py310hb13e2d6_0
  - openjdk=20.0.2=hfea2f88_1
  - openjpeg=2.5.2=h488ebb8_0
  - openssl=3.2.1=hd590300_1
  - packaging=24.0=pyhd8ed1ab_0
  - pango=1.52.2=ha41ecd1_0
  - pcre2=10.43=hcad00b1_0
  - perl=5.32.1=7_hd590300_perl5
  - perl-filesys-df=0.92=pl5321h031d066_7
  - pillow=10.0.1=py310h29da1c1_1
  - pip=24.0=pyhd8ed1ab_0
  - pixman=0.43.2=h59595ed_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - pulseaudio-client=16.1=hb77b528_5
  - pyfaidx=0.8.1.1=pyhdfd78af_0
  - pysam=0.22.0=py310h41dec4a_1
  - pytabix=0.1=py310h6cc9453_5
  - python=3.10.0=h543edf9_3_cpython
  - python_abi=3.10=4_cp310
  - pyvcf3=1.0.3=pyhdfd78af_0
  - qt-main=5.15.8=hc47bfe8_16
  - readline=8.2=h8228510_1
  - samtools=1.20=h50ea8bc_0
  - setuptools=69.5.1=pyhd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.45.3=h2c6b66d_0
  - tk=8.6.13=noxft_h4845f30_101
  - tzdata=2024a=h0c530f3_0
  - wheel=0.43.0=pyhd8ed1ab_1
  - winnowmap=2.03=h43eeafb_2
  - xcb-util=0.4.0=hd590300_1
  - xcb-util-image=0.4.0=h8ee46fc_1
  - xcb-util-keysyms=0.4.0=h8ee46fc_1
  - xcb-util-renderutil=0.3.9=hd590300_1
  - xcb-util-wm=0.4.1=h8ee46fc_1
  - xkeyboard-config=2.41=hd590300_0
  - xorg-fixesproto=5.0=h7f98852_1002
  - xorg-inputproto=2.3.2=h7f98852_1002
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.1.1=hd590300_0
  - xorg-libsm=1.2.4=h7391055_0
  - xorg-libx11=1.8.9=h8ee46fc_0
  - xorg-libxau=1.0.11=hd590300_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h0b41bf4_2
  - xorg-libxfixes=5.0.3=h7f98852_1004
  - xorg-libxi=1.7.10=h7f98852_0
  - xorg-libxrender=0.9.11=hd590300_0
  - xorg-libxt=1.3.0=hd590300_1
  - xorg-libxtst=1.2.3=h7f98852_1002
  - xorg-recordproto=1.14.2=h7f98852_1002
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h0b41bf4_1003
  - xorg-xf86vidmodeproto=2.3.1=h7f98852_1002
  - xorg-xproto=7.0.31=h7f98852_1007
  - xz=5.2.6=h166bdaf_0
  - zipp=3.17.0=pyhd8ed1ab_0
  - zlib=1.2.13=hd590300_5
  - zstd=1.5.5=hfc55251_0
prefix: /home/gauravsablok/miniconda3/envs/genomehificontiguity

Let me know.

yeeus commented 4 months ago

The latest version of pysam is v0.22.1 and yours is v0.22.0. However, I found the attribute is_mapped in the documentation of v0.22.0, so maybe I should check your environment. Can you simply check your version of pysam by typing pip show pysam?
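pip show reports what pip sees, which is not always what the interpreter imports (for example, when a copy under ~/.local shadows the one in the conda environment). It can therefore help to ask Python directly. Below is a small sketch using only the standard library; describe_module is a hypothetical helper name, not something GCI provides.

```python
import importlib.metadata
import importlib.util


def describe_module(name):
    """Report where `name` would be imported from and its installed version."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this interpreter
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # importable, but no distribution metadata was found
    return {"location": spec.origin, "version": version}


if __name__ == "__main__":
    # Run this with the same python that runs GCI.py.
    print(describe_module("pysam"))
```

If the reported location is outside the conda environment's site-packages directory, the interpreter is picking up a different pysam than the one listed in the YAML.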

ghost commented 4 months ago

@yeeus OK, I can solve that easily: $pip install auto-review and then pip update, and it will update the packages. I will get back to you after doing this.

thank you, Gaurav

yeeus commented 4 months ago

I have tried your environment and ran GCI with my test data, and it ran properly. I also found that in another environment of mine, with an older version of pysam than yours, everything runs without any errors.

$pip show pysam
Name: pysam
Version: 0.21.0
Summary: pysam - a python module for reading, manipulating and writing genomic data sets.
Home-page: https://github.com/pysam-developers/pysam
Author: Andreas Heger
Author-email: Andreas Heger <andreas.heger@gmail.com>
License: MIT License
Location: /path/to/mambaforge/envs/bioinfo/lib/python3.10/site-packages
Requires: cython
Required-by: bamsnap

So I think you should check your environment first.

ghost commented 4 months ago

@yeeus thank you for trying my environment and running the test data. I am also glad to hear that it worked well. For those who use this environment, it will work well if you have the .conda/env/share loaded into your environment path. If you want to avoid that, then simply change your shell with chsh and select the appropriate one.

I will write if there is any other problem.

thank you, Gaurav

yeeus commented 4 months ago

Oh, actually, there are some dependency problems in your environment: there is no matplotlib! And when I installed matplotlib I got

(gci_issue) [chenquanyu@login2 MH63]$ pip install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting matplotlib
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d6/07/061f97211f942101070a46fecd813a6b1bd83590ed7b07c473cabd707fe7/matplotlib-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.6/11.6 MB 5.8 MB/s eta 0:00:00
Collecting contourpy>=1.0.1 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/67/0f/6e5b4879594cd1cbb6a2754d9230937be444f404cf07c360c07a10b36aac/contourpy-1.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (305 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 305.2/305.2 kB 2.9 MB/s eta 0:00:00
Collecting cycler>=0.10 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl (8.3 kB)
Collecting fonttools>=4.22.0 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/67/09/e09ee013d9d6f2f006147e5fc2b4d807eb2931f4f890c2d4f711e10391d7/fonttools-4.51.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 6.5 MB/s eta 0:00:00
Collecting kiwisolver>=1.3.1 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6f/40/4ab1fdb57fced80ce5903f04ae1aed7c1d5939dda4fd0c0aa526c12fe28a/kiwisolver-1.4.5-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 5.2 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.21 in /share/home/zhanglab/user/chenquanyu/.local/lib/python3.10/site-packages (from matplotlib) (1.26.1)
Requirement already satisfied: packaging>=20.0 in /share/home/zhanglab/user/chenquanyu/mambaforge/envs/gci_issue/lib/python3.10/site-packages (from matplotlib) (24.0)
Requirement already satisfied: pillow>=8 in /share/home/zhanglab/user/chenquanyu/mambaforge/envs/gci_issue/lib/python3.10/site-packages (from matplotlib) (10.0.1)
Collecting pyparsing>=2.3.1 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9d/ea/6d76df31432a0e6fdf81681a895f009a4bb47b3c39036db3e1b528191d52/pyparsing-3.1.2-py3-none-any.whl (103 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.2/103.2 kB 817.6 kB/s eta 0:00:00
Collecting python-dateutil>=2.7 (from matplotlib)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 kB 931.2 kB/s eta 0:00:00
Requirement already satisfied: six>=1.5 in /share/home/zhanglab/user/chenquanyu/mambaforge/envs/gci_issue/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Installing collected packages: python-dateutil, pyparsing, kiwisolver, fonttools, cycler, contourpy, matplotlib
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas 2.1.2 requires pytz>=2020.1, which is not installed.
pandas 2.1.2 requires tzdata>=2022.1, which is not installed.
Successfully installed contourpy-1.2.1 cycler-0.12.1 fonttools-4.51.0 kiwisolver-1.4.5 matplotlib-3.8.4 pyparsing-3.1.2 python-dateutil-2.9.0.post0

But after installing pytz>=2020.1 and tzdata>=2022.1, it worked well. Thanks for using my tool again!

ghost commented 3 months ago

Hi, thank you. Can you upload a sample dataset with the BAM files and the corresponding files? Gaurav

yeeus commented 3 months ago

I'm sorry that I can't provide the test files because of their large size. You can just download the sequencing reads of CHM13v2.0 and align them to the assembly. Be careful: you should remove chromosome Y and the mitochondrial sequence before aligning. After getting the alignment files, you can run GCI with them, and I think you will get results similar to those in the benchmark folder.
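The removal step could be sketched as below. This is a plain-Python illustration, not the author's procedure; the sequence names chrY and chrM are assumptions about how the assembly labels those records, and drop_records and filter_fasta are hypothetical helpers.

```python
def drop_records(fasta_lines, excluded):
    """Yield FASTA lines, skipping every record whose name is in `excluded`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = name not in excluded
        if keep:
            yield line


def filter_fasta(src_path, dst_path, excluded=("chrY", "chrM")):
    """Copy src_path to dst_path without the excluded records (assumed names)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_records(src, set(excluded)))
```

The filtered FASTA would then be the reference used for aligning the reads and for running GCI.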

ghost commented 3 months ago

@yeeus thank you. I have only just got access to the computer. In the benchmark folder you don't have the BAM files; I think you have the BED files. So does GCI use the BAM or the BED files? Can you add a subset of the aligned BAM as a single test file? You can easily subset a BAM file, and this way I will be able to see what is not producing the alignment map.

Thank you, Gaurav

yeeus commented 1 month ago

Sorry for the delayed reply. You can now test GCI on the downsampled files from Zenodo. Please look at the latest README.

ghost commented 1 month ago

Thank you. I will have a look when I am working on this again. Right now I am not, but thank you for taking the time to add the sample files.