pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

Hotfix/obo encode issue and psi-ms.obo from 4.1.28- 4.1.33 #203

Closed ZhixuNi closed 4 years ago

ZhixuNi commented 4 years ago

obo encode issue:

A UnicodeDecodeError occurs under Windows 10 1909 (64-bit). The Windows 10 VM uses EN-US as its default language, with DE-DE and ZH-CN also installed. For some mzML files, the following error is raised:

Traceback (most recent call last):
  File "C:\Users\winadmin\.conda\envs\envpyside2\lib\site-packages\pymzml\spec.py", line 220, in _get_encoding_parameters
    Acc=self.calling_instance.OT["32-bit float"]["id"],
  File "C:\Users\winadmin\.conda\envs\envpyside2\lib\site-packages\pymzml\obo.py", line 115, in __getitem__
    self.parseOBO()
  File "C:\Users\winadmin\.conda\envs\envpyside2\lib\site-packages\pymzml\obo.py", line 198, in parseOBO
    for line in obo:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5771: illegal multibyte sequence

Alternatively, any access to spectrum.mz can cause the program to freeze.
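The 'gbk' codec named in the traceback suggests that the file was decoded with the locale's preferred encoding rather than UTF-8. Below is a minimal sketch of that failure mode. It assumes the OBO file contains a UTF-8 encoded en dash; the sample line is made up for illustration and is not taken from psi-ms.obo.

```python
import locale

# On the Python 3.7 used here, text-mode open() without an explicit encoding
# falls back to the locale's preferred encoding. On a ZH-CN Windows setup this
# is typically cp936, which Python reports as 'gbk'.
print(locale.getpreferredencoding(False))

# Hypothetical OBO-like line containing an en dash (U+2013).
# UTF-8 encodes this character as the bytes E2 80 93.
raw = "def: \"mass \u2013 charge\"\n".encode("utf-8")

try:
    raw.decode("gbk")
    decoded_ok = True
except UnicodeDecodeError as err:
    # 0x93 followed by a space is not a valid GBK byte pair.
    decoded_ok = False
    print(err)

# The same bytes decode cleanly as UTF-8.
text = raw.decode("utf-8")
```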

The hotfix changes line 195 in obo.py from

with open_func(obo_file, "rt") as obo:

to

with open_func(obo_file, "rt", encoding='utf-8') as obo:

This solved the problem, and all tests pass.
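The effect of the one-line change can be checked without pymzML. The following sketch assumes that open_func resolves to gzip.open for .gz files. The file content and the name sample.obo.gz are hypothetical, and the GBK default is simulated by passing encoding="gbk" explicitly.

```python
import gzip
import os
import tempfile

# Hypothetical OBO-like content with one non-ASCII character (en dash).
content = "[Term]\nid: EXAMPLE:0000001\ndef: \"mass \u2013 charge\"\n"

with tempfile.TemporaryDirectory() as tmp:
    obo_file = os.path.join(tmp, "sample.obo.gz")
    with gzip.open(obo_file, "wt", encoding="utf-8") as handle:
        handle.write(content)

    # Stand-in for the open_func variable in obo.py (an assumption).
    open_func = gzip.open

    # Simulate a machine whose default text encoding is GBK.
    try:
        with open_func(obo_file, "rt", encoding="gbk") as obo:
            lines_gbk = list(obo)
    except UnicodeDecodeError:
        lines_gbk = None

    # The hotfix: always decode as UTF-8, independent of the locale.
    with open_func(obo_file, "rt", encoding="utf-8") as obo:
        lines_utf8 = list(obo)
```

Passing the encoding explicitly makes the parse result independent of the language packs installed on the machine.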

Add latest psi-ms.obo files:

psi-ms-4.1.28.obo.gz
psi-ms-4.1.29.obo.gz
psi-ms-4.1.30.obo.gz
psi-ms-4.1.31.obo.gz
psi-ms-4.1.32.obo.gz
psi-ms-4.1.33.obo.gz

mzML files from 2020 should now work when using psi-ms-4.1.33.obo.gz (release date: 21:01:2020).
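When several OBO releases sit side by side, a plain string sort is not a reliable way to find the newest one (for example, "4.1.9" sorts after "4.1.28"). The sketch below compares the version numbers numerically. The helper obo_version is hypothetical, and this is not necessarily how pymzML itself selects an OBO file.

```python
import re

names = [
    "psi-ms-4.1.28.obo.gz",
    "psi-ms-4.1.29.obo.gz",
    "psi-ms-4.1.30.obo.gz",
    "psi-ms-4.1.31.obo.gz",
    "psi-ms-4.1.32.obo.gz",
    "psi-ms-4.1.33.obo.gz",
]


def obo_version(name):
    """Return the version in a psi-ms file name as a tuple of ints."""
    match = re.fullmatch(r"psi-ms-(\d+(?:\.\d+)*)\.obo\.gz", name)
    if match is None:
        raise ValueError(f"unexpected OBO file name: {name}")
    return tuple(int(part) for part in match.group(1).split("."))


# Numeric comparison of version tuples, not string comparison.
latest = max(names, key=obo_version)
print(latest)
```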

MKoesters commented 4 years ago

Hi,

thank you very much for reporting and fixing this issue! I'll wait for the CI/CD to finish and then merge it. I'll also try to upload the new version to pip as soon as possible.

Best, Manuel

codecov[bot] commented 4 years ago

Codecov Report

Merging #203 into dev will increase coverage by 63.96%. The diff coverage is 84%.


@@            Coverage Diff             @@
##              dev    #203       +/-   ##
==========================================
+ Coverage   15.94%   79.9%   +63.96%     
==========================================
  Files          13      33       +20     
  Lines        1844    3951     +2107     
==========================================
+ Hits          294    3157     +2863     
+ Misses       1550     794      -756
Impacted Files                               Coverage Δ
tests/file_io_indexed_gzip_reader_test.py    96.55% <ø> (ø)
pymzml/file_classes/bytesMzml.py             40% <0%> (ø)
pymzml/file_classes/indexedGzip.py           96.96% <100%> (+54.54%)
tests/utils_decoder_test.py                  66.66% <100%> (ø)
pymzml/regex_patterns.py                     100% <100%> (ø)
pymzml/file_classes/standardGzip.py          97.14% <100%> (+60%)
tests/main_chromatogram_test.py              89.47% <100%> (ø)
tests/file_io_indexed_gzip_writer_test.py    99% <100%> (ø)
tests/test_file_paths.py                     100% <100%> (ø)
tests/plot_spec_test.py                      98.38% <100%> (ø)
... and 52 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 3bdfe8c...5f8af48.

ZhixuNi commented 4 years ago

Hi!

Thanks for the response! There are some problems accessing the ndarrays in pymzml.spec.Spectrum. This only happens on some Windows 10 machines with Anaconda. I don't know whether it is caused by this UnicodeDecodeError or by a numpy issue. It is currently solved on our PCs with this fix, but I do not know yet whether the fix works for other people.

Best,

MKoesters commented 4 years ago

Could you tell me the numpy and anaconda versions which gave you that error? I'm not too familiar with anaconda, but I'll try to have a closer look as soon as I can

ZhixuNi commented 4 years ago

I'm using conda/4.7.12 requests/2.22.0 CPython/3.7.4 Windows/10 Windows/10.0.18362

The development environment (Python 3.7.6) is as follows:

# packages in environment at C:\Users\winadmin\.conda\envs\envpy37:
#
# Name                    Version                   Build  Channel
atomicwrites              1.3.0                    pypi_0    pypi
attrs                     19.3.0                   pypi_0    pypi
ca-certificates           2019.11.27                    0
certifi                   2019.11.28               py37_0
chardet                   3.0.4                    pypi_0    pypi
codecov                   2.0.15                   pypi_0    pypi
colorama                  0.4.3                    pypi_0    pypi
coverage                  5.0.3                    pypi_0    pypi
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.14                  pypi_0    pypi
et-xmlfile                1.0.1                    pypi_0    pypi
idna                      2.8                      pypi_0    pypi
importlib-metadata        1.4.0                    pypi_0    pypi
jdcal                     1.4.1                    pypi_0    pypi
kiwisolver                1.1.0                    pypi_0    pypi
llvmlite                  0.31.0                   pypi_0    pypi
matplotlib                3.1.2                    pypi_0    pypi
more-itertools            8.1.0                    pypi_0    pypi
natsort                   7.0.0                    pypi_0    pypi
numba                     0.47.0                   pypi_0    pypi
numpy                     1.18.1                   pypi_0    pypi
openpyxl                  3.0.3                    pypi_0    pypi
openssl                   1.1.1d               he774522_3
packaging                 20.0                     pypi_0    pypi
pandas                    0.25.3                   pypi_0    pypi
pip                       19.3.1                   py37_0
plotly                    4.4.1                    pypi_0    pypi
pluggy                    0.13.1                   pypi_0    pypi
py                        1.8.1                    pypi_0    pypi
pymzml                    2.4.5                    pypi_0    pypi
pyparsing                 2.4.6                    pypi_0    pypi
pyside2                   5.14.0                   pypi_0    pypi
pytest                    5.3.4                    pypi_0    pypi
pytest-cov                2.8.1                    pypi_0    pypi
python                    3.7.6                h60c2a47_2
python-dateutil           2.8.1                    pypi_0    pypi
pytz                      2019.3                   pypi_0    pypi
regex                     2020.1.8                 pypi_0    pypi
requests                  2.22.0                   pypi_0    pypi
retrying                  1.3.3                    pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
setuptools                44.0.0                   py37_0
shiboken2                 5.14.0                   pypi_0    pypi
six                       1.14.0                   pypi_0    pypi
sqlite                    3.30.1               he774522_0
urllib3                   1.25.8                   pypi_0    pypi
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.16.27012          hf0eaf9b_1
wcwidth                   0.1.8                    pypi_0    pypi
wheel                     0.33.6                   py37_0
wincertstore              0.2                      py37_0
xlrd                      1.2.0                    pypi_0    pypi
xlwt                      1.3.0                    pypi_0    pypi
zipp                      2.0.0                    pypi_0    pypi

I tried numpy versions 1.18.0 and 1.18.1 from both conda and pip. The pip version of 1.18.1 together with this fix solved the issue.

Hope this info helps.
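For anyone else who wants to report their setup, the relevant details can be collected in one go. This is a minimal sketch using only the standard library. The function name environment_report is hypothetical, and importlib.metadata requires Python 3.8 or newer, so it would not run unchanged on the Python 3.7 environment listed above.

```python
import locale
import platform
from importlib import metadata  # standard library from Python 3.8 on


def environment_report(packages=("numpy", "pymzml")):
    """Collect interpreter, platform, encoding and package versions."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # The encoding that text-mode open() may fall back to.
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The preferred encoding is included because it is the setting that differs between machines that show the UnicodeDecodeError and those that do not.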