nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Medaka variant issue #365

Closed jpn2021 closed 2 years ago

jpn2021 commented 2 years ago

After successfully running mini_align (with -m -f -a -A) and medaka consensus (--model r941_min_hac_variant_g507), which produced .bam, .bam.bai, and .hdf files, I get the error below when running medaka variant:

[13:58:41 - DataIndx] Loaded 1/1 (100.00%) sample files.
Traceback (most recent call last):
  File "/home/unix/miniconda3/envs/medaka/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/unix/miniconda3/envs/medaka/lib/python3.8/site-packages/medaka/medaka.py", line 720, in main
    args.func(args)
  File "/home/unix/miniconda3/envs/medaka/lib/python3.8/site-packages/medaka/variant.py", line 217, in variants_from_hdf
    contigs=['{},length={}'.format(r.ref_name, lengths[r.ref_name])
  File "/home/unix/miniconda3/envs/medaka/lib/python3.8/site-packages/medaka/variant.py", line 217, in <listcomp>
    contigs=['{},length={}'.format(r.ref_name, lengths[r.ref_name])
KeyError: 'contig_1'

Sometimes the KeyError is contig_10 instead of contig_1.


Additional context

# packages in environment at /home/unix/miniconda3/envs/medaka:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_tflow_select             2.3.0                       mkl
absl-py                   1.0.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.1            py38h0a891b7_1    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
astor                     0.8.1              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
bcftools                  1.15                 h0ea216a_2    bioconda
biopython                 1.79             py38h497a2fe_1    conda-forge
blinker                   1.4                        py_1    conda-forge
brotli                    1.0.9                h166bdaf_7    conda-forge
brotli-bin                1.0.9                h166bdaf_7    conda-forge
brotlipy                  0.7.0           py38h0a891b7_1004    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cachetools                5.0.0              pyhd8ed1ab_0    conda-forge
certifi                   2021.10.8        py38h578d9bd_2    conda-forge
cffi                      1.15.0           py38h3931269_0    conda-forge
chardet                   4.0.0            py38h578d9bd_3    conda-forge
charset-normalizer        2.0.12             pyhd8ed1ab_0    conda-forge
click                     8.1.2            py38h578d9bd_0    conda-forge
cryptography              36.0.2           py38h2b5fc30_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
fonttools                 4.31.2           py38h0a891b7_1    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
frozenlist                1.3.0            py38h0a891b7_1    conda-forge
gast                      0.3.3                      py_0    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
google-auth               2.6.2              pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpcio                    1.45.0           py38ha0cdfde_0    conda-forge
gsl                       2.7                  he838d99_0    conda-forge
h5py                      2.10.0          nompi_py38h9915d05_106    conda-forge
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
htslib                    1.15                 h9753748_0    bioconda
idna                      3.3                pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.3           py38h578d9bd_1    conda-forge
intervaltree              3.0.2                      py_0    conda-forge
isa-l                     2.30.0               ha770c72_4    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jpeg                      9e                   h7f98852_0    conda-forge
k8                        0.2.5                hd03093a_2    bioconda
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.2            py38h43d8883_1    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libblas                   3.9.0           14_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
libbrotlidec              1.0.9                h166bdaf_7    conda-forge
libbrotlienc              1.0.9                h166bdaf_7    conda-forge
libcblas                  3.9.0           14_linux64_openblas    conda-forge
libcurl                   7.82.0               h7bff187_0    conda-forge
libdeflate                1.10                 h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_14    conda-forge
libgfortran-ng            11.2.0              h69a702a_14    conda-forge
libgfortran5              11.2.0              h5c6108e_14    conda-forge
libgomp                   11.2.0              h1d223b6_14    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0           14_linux64_openblas    conda-forge
libnghttp2                1.47.0               h727a467_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.20          pthreads_h78a6416_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libprotobuf               3.20.0               h6239696_0    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_14    conda-forge
libtiff                   4.3.0                h542a066_3    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp                   1.2.2                h3452ae3_0    conda-forge
libwebp-base              1.2.2                h7f98852_1    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libzlib                   1.2.11            h166bdaf_1014    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
mappy                     2.24             py38h4c6a040_1    bioconda
markdown                  3.3.6              pyhd8ed1ab_0    conda-forge
matplotlib-base           3.5.1            py38hf4fb855_0    conda-forge
medaka                    1.6.0            py38h84d2cc8_0    bioconda
minimap2                  2.24                 h7132678_1    bioconda
multidict                 6.0.2            py38h0a891b7_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.3                  h9c3ff4c_0    conda-forge
networkx                  2.7.1              pyhd8ed1ab_1    conda-forge
numpy                     1.19.5           py38h9894fe3_2    conda-forge
oauthlib                  3.2.0              pyhd8ed1ab_0    conda-forge
ont-fast5-api             4.0.2              pyhdfd78af_0    bioconda
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1n               h166bdaf_0    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.4.2            py38h47df419_0    conda-forge
parasail-python           1.2.4            py38h3b68952_2    bioconda
pbzip2                    1.1.13                        0    conda-forge
perl                      5.32.1          2_h7f98852_perl5    conda-forge
pigz                      2.6                  h27826a3_0    conda-forge
pillow                    9.1.0            py38h0ee0e06_0    conda-forge
pip                       22.0.4             pyhd8ed1ab_0    conda-forge
progressbar33             2.4                        py_0    conda-forge
protobuf                  3.20.0           py38hfa26641_3    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyfaidx                   0.6.4              pyh5e36f6f_0    bioconda
pyjwt                     2.3.0              pyhd8ed1ab_1    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pysam                     0.19.0           py38h8bf8b8d_0    bioconda
pysocks                   1.7.1            py38h578d9bd_5    conda-forge
pyspoa                    0.0.3            py38h8ded8fe_3    bioconda
python                    3.8.13          h582c2e5_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-edlib              1.3.9            py38h4a32c8e_1    bioconda
python-isal               0.11.1           py38h497a2fe_1    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
racon                     1.5.0                h7ff8a90_0    bioconda
readline                  8.1                  h46c0cb4_0    conda-forge
requests                  2.27.1             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.8                pyhd8ed1ab_0    conda-forge
samtools                  1.15                 h1170115_1    bioconda
scipy                     1.8.0            py38h56a6a73_1    conda-forge
setuptools                62.0.0           py38h578d9bd_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
sqlite                    3.37.1               h4ff8645_0    conda-forge
tar                       1.34                 ha1f6473_0    conda-forge
tensorboard               2.8.0              pyhd8ed1ab_1    conda-forge
tensorboard-data-server   0.6.0            py38h3e25421_1    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.2.0           mkl_py38h6d3daf0_0
tensorflow-base           2.2.0           mkl_py38h5059a2d_0
tensorflow-estimator      2.6.0            py38h709712a_0    conda-forge
termcolor                 1.1.0                      py_2    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
typing-extensions         4.1.1                hd8ed1ab_0    conda-forge
typing_extensions         4.1.1              pyha770c72_0    conda-forge
unicodedata2              14.0.0           py38h0a891b7_1    conda-forge
urllib3                   1.26.9             pyhd8ed1ab_0    conda-forge
werkzeug                  2.1.1              pyhd8ed1ab_0    conda-forge
whatshap                  1.3              py38h4a32c8e_1    bioconda
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.0           py38h0a891b7_1    conda-forge
xopen                     1.5.0            py38h578d9bd_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yarl                      1.7.2            py38h0a891b7_2    conda-forge
zipp                      3.8.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h166bdaf_1014    conda-forge
zstd                      1.5.2                ha95c52a_0    conda-forge

cjw85 commented 2 years ago

The code is failing to look up the length of the sequence contig_1 in the provided input file. I suspect this is because the wrong file has been provided: it must be the same file that was given as the reference sequence during the initial alignment step.
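One way to confirm this diagnosis before rerunning medaka variant is to compare the contig names the alignment refers to against the names in the FASTA being passed in. The following is only a minimal sketch, not part of medaka: `fasta_lengths` and `missing_contigs` are hypothetical helper names, the FASTA is assumed to be uncompressed plain text, and the list of names from the alignment is assumed to be gathered by hand (for example from the @SQ lines printed by `samtools view -H`).

```python
from typing import Dict, Iterable, List


def fasta_lengths(path: str) -> Dict[str, int]:
    """Return {contig_name: sequence_length} for an uncompressed FASTA file."""
    lengths: Dict[str, int] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The contig name is the header text up to the first whitespace.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def missing_contigs(needed: Iterable[str], lengths: Dict[str, int]) -> List[str]:
    """Return the names in `needed` that the FASTA does not contain."""
    return [name for name in needed if name not in lengths]


# Hypothetical usage: the names below stand in for those seen in the BAM header.
# lengths = fasta_lengths("reference.fasta")
# print(missing_contigs(["contig_1", "contig_10"], lengths))
```

Any name this reports as missing would be expected to trigger the same KeyError, since the traceback shows the failure is a dictionary lookup of `r.ref_name` in `lengths`. Names such as contig_1 and contig_10 look like assembler output, which fits the alignment having been made against an assembly rather than the known reference.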

jpn2021 commented 2 years ago

Thank you. When running mini_align, should the input reference FASTA be the known reference file or the assembly from Flye? I may have been confused about when and how to use the Flye assembly. There is a known reference file for this genome.

cjw85 commented 2 years ago

If your aim is to assess variants in your sample compared to a known sequence then you should use the reference sequence throughout the whole process. You do not need to create an assembly with flye.

The medaka_haploid_variant helper script automates the process I think you are trying to accomplish.
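As a rough illustration of using a single reference throughout, the helper can be driven with one reads file and one reference file. This is a sketch under assumptions: the `-i`, `-r`, and `-o` options are taken to be the reads, reference, and output-folder options of `medaka_haploid_variant` (check `medaka_haploid_variant -h` for your installed version), the file names are placeholders, and `haploid_variant_command` is a hypothetical wrapper that only builds the command line.

```python
import shlex
from typing import List


def haploid_variant_command(reads: str, reference: str, out_dir: str = "medaka") -> List[str]:
    """Build the argument list for the helper script (option names are assumed)."""
    return ["medaka_haploid_variant", "-i", reads, "-r", reference, "-o", out_dir]


# Placeholder file names; the same reference is the only FASTA involved.
cmd = haploid_variant_command("reads.fastq", "reference.fasta")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Because the reference appears exactly once in the command, there is no later step at which a different FASTA (such as a Flye assembly) can be supplied by mistake.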