pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

Support for VITEK MS (aka. Shimadzu Assurance) #216

Closed frederic-foucault closed 4 years ago

frederic-foucault commented 4 years ago

Dear all,

I'm having problems reading SOME, but not all, mzML files.

I have Python version 3.7.4 installed on macOS Catalina, and I installed pymzML version 2.4.6.


import os
import pymzml

# data_in is the directory that contains the mzML files
for my_file in os.listdir(data_in):
    if my_file.endswith('mzml'):
        # parse mzml
        run = pymzml.run.Reader(os.path.join(data_in, my_file))

The error is:

Traceback (most recent call last):
  File "/usr/src/app/src/mzml converter v0.001.py", line 34, in <module>
    run = pymzml.run.Reader(os.path.join(data_in,my_file))
  File "/usr/local/lib/python3.7/dist-packages/pymzml/run.py", line 116, in __init__
    self.info["file_object"] = self._open_file(path_or_file)
  File "/usr/local/lib/python3.7/dist-packages/pymzml/run.py", line 218, in _open_file
    build_index_from_scratch=self.build_index_from_scratch,
  File "/usr/local/lib/python3.7/dist-packages/pymzml/file_interface.py", line 27, in __init__
    self.file_handler = self._open(path)
  File "/usr/local/lib/python3.7/dist-packages/pymzml/file_interface.py", line 58, in _open
    path_or_file, self.encoding, self.build_index_from_scratch
  File "/usr/local/lib/python3.7/dist-packages/pymzml/file_classes/standardMzml.py", line 59, in __init__
    self.seek_list = self._read_extremes()
  File "/usr/local/lib/python3.7/dist-packages/pymzml/file_classes/standardMzml.py", line 669, in _read_extremes
    re.search(b"[0-9]*$", id_match.group("id")).group()
ValueError: invalid literal for int() with base 10: b''

Thank you for your help. Best regards, Frederic Foucault
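Since only some of the files fail, one way to find out which ones is to collect the failures instead of stopping at the first. The following is a minimal sketch and not part of pymzML: `open_all` and its `opener` parameter are hypothetical names, and it assumes the failing files raise `ValueError` as in the traceback above.

```python
import os


def open_all(data_in, opener):
    """Try to open every .mzml file in data_in and record which ones fail.

    opener is any callable that takes a path, e.g. pymzml.run.Reader.
    Returns (opened, failed): dicts keyed by file name.
    """
    opened, failed = {}, {}
    for name in sorted(os.listdir(data_in)):
        if not name.lower().endswith(".mzml"):
            continue  # skip anything that is not an mzML file
        path = os.path.join(data_in, name)
        try:
            opened[name] = opener(path)
        except ValueError as err:
            # Keep the message so failing files can be compared later
            failed[name] = str(err)
    return opened, failed
```

With pymzML installed, this could be called as `opened, failed = open_all(data_in, pymzml.run.Reader)`, and the keys of `failed` would then list the files worth inspecting.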

MKoesters commented 4 years ago

Hi Frederic, thanks for reporting this issue. Would it be possible to upload your mzML file? Also, is there anything different about the mzML files which won't load, e.g. the runs were measured on a different machine or something like that?

If you can't upload your file, could you open your mzML file and look for a line similar to this <spectrum index="0" id="controllerType=0 controllerNumber=1 scan=1" defaultArrayLength="917"> and tell me the content of the id part, or just give me a sample of what your <spectrum> tags look like?

Best, Manuel
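The spectrum ids can also be pulled out of a file without opening it in an editor. This is a rough sketch using only the standard library: `list_spectrum_ids` is a hypothetical helper, and it assumes each `<spectrum ...>` opening tag sits on a single line, which may not hold for every converter.

```python
import re

# Matches the id attribute of a <spectrum ...> opening tag (not <spectrumList>)
SPECTRUM_ID = re.compile(rb'<spectrum\b[^>]*?\bid="([^"]*)"')


def list_spectrum_ids(path, limit=5):
    """Return up to `limit` spectrum ids found in the mzML file at `path`."""
    ids = []
    with open(path, "rb") as handle:
        for line in handle:
            for match in SPECTRUM_ID.finditer(line):
                ids.append(match.group(1).decode("utf-8", errors="replace"))
                if len(ids) >= limit:
                    return ids
    return ids
```

Comparing the ids of a file that loads with those of a file that fails is usually enough to spot a difference in naming scheme.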

frederic-foucault commented 4 years ago

Hi Manuel,

Here is the line I found in the mzML file: spectrum index="0" id="v1v6232d" defaultArrayLength="117"

I need more information to be able to find the content of the id or a sample. Alternatively, tell me where I can upload the mzML file. Best, Frederic

MKoesters commented 4 years ago

What you posted is enough for me, I can see why it's not working. Just out of curiosity, can you also send me the same line from a file that is working? And do you know on what instrument the run was measured and which conversion software you used?

I'll report back to you as soon as I have a fix for your problem.
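The failure can be reproduced outside pymzML with the expression from the last frame of the traceback. The sketch below is only an illustration: `trailing_int` is a hypothetical wrapper around that expression, not a pymzML function. An id that ends in digits yields a number, while `v1v6232d` ends in a letter, so the pattern matches the empty string and `int()` raises.

```python
import re


def trailing_int(spectrum_id: bytes) -> int:
    """Take the digits at the very end of the id and convert them to int."""
    return int(re.search(b"[0-9]*$", spectrum_id).group())


# An id that ends in a scan number works
print(trailing_int(b"controllerType=0 controllerNumber=1 scan=1"))  # prints 1

# The VITEK MS id ends in "d", so the match is b"" and int(b"") fails
try:
    trailing_int(b"v1v6232d")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: b''
```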

frederic-foucault commented 4 years ago

Hello Manuel

Attached is the mzml file

Thank you very much for your help

Best Frederic

frederic-foucault commented 4 years ago

Hello Manuel

It was a VITEK MS (aka Shimadzu Assurance). The software used is the bioMérieux VITEK MS software. The files contain mass lists and processed data.

Best Frederic

MKoesters commented 4 years ago

Hi Frederic,

I started to implement a fix for your problem. Could you install pymzML using: pip install "git+https://github.com/pymzml/pymzml.git@fix_216" and tell me if this works for you?
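One way a reader could tolerate ids without a trailing number is to fall back to the id string itself as the lookup key. The sketch below only illustrates that idea; `spectrum_key` is a hypothetical helper, and this is not necessarily how the `fix_216` branch handles it.

```python
import re

# Require at least one digit, so a missing number is "no match" rather than b""
TRAILING_DIGITS = re.compile(rb"[0-9]+$")


def spectrum_key(spectrum_id: bytes):
    """Return the trailing number as int, or the whole id as str if there is none."""
    match = TRAILING_DIGITS.search(spectrum_id)
    if match:
        return int(match.group())
    return spectrum_id.decode("utf-8")


print(spectrum_key(b"controllerType=0 controllerNumber=1 scan=1"))  # prints 1
print(spectrum_key(b"v1v6232d"))  # prints v1v6232d
```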

frederic-foucault commented 4 years ago

Hello Manuel

I ran the command and got the same error as before ☹. Here is the output of the installation of the pymzML fix. I'm not sure that it was really installed on top of pymzML 2.4.6?

Best, Frederic [screenshot]

MKoesters commented 4 years ago

Hmm, okay, I tried to create a similar mzML to the one you have; maybe I missed something ... It seems that attaching files to your emails when answering here does not work. Would you be able to upload the mzML either to gdrive and share the link with me, or use a file hoster like https://wetransfer.com/?

Also, are you installing pymzML globally or are you using virtual environments (https://docs.python.org/3/tutorial/venv.html)? If you are using virtual environments, could you test the command above with a fresh environment?

frederic-foucault commented 4 years ago

Please give me an email address, because this one (Manuel notifications@github.com) does not seem to work in WeTransfer.

frederic-foucault commented 4 years ago

Concerning the installation of pymzML: I have created a virtual environment using Anaconda. This virtual environment has Python version 3.6.

MKoesters commented 4 years ago

You can send the file to manuel.koesters@dcb.unibe.ch. Sorry for the inconvenience; I have never worked with files from the instrument you are working with and thus never tested pymzML with these kinds of files.

If you still want to check whether you installed the correct version:

$ pip install "git+https://github.com/pymzml/pymzml.git@fix_216"
$ python
>>> import pymzml
>>> pymzml.__version__

output should be '2.4.7a1'
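When the version check gives an unexpected result, it can help to see which interpreter is running and where the package is imported from. This is a minimal sketch with assumptions: `describe_install` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or newer, so it would not run on the 3.6 and 3.7 interpreters mentioned in this thread.

```python
import sys
from importlib import metadata, util


def describe_install(package):
    """Report which interpreter is running and where `package` comes from."""
    spec = util.find_spec(package)
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,
        # True for venv/virtualenv only; a conda environment reports False here
        "in_venv": sys.prefix != sys.base_prefix,
        "version": version,
        "location": spec.origin if spec else None,
    }


for key, value in describe_install("pymzml").items():
    print(f"{key}: {value}")
```

If `location` points outside the environment where `pip install` was run, the fix was installed for a different interpreter than the one executing the script.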

MKoesters commented 4 years ago

Thanks, I received and tested your file. However, I can open it with the fixed version and access the spectra, including m/z and intensity values. Did you test the code above, and can you verify that the output is as expected? I also created a fresh virtual environment (I'm not using conda though) and ran the following code:

$ pip install "git+https://github.com/pymzml/pymzml.git@fix_216"
$ ipython
In [1]: import pymzml
In [2]: run = pymzml.run.Reader("/home/manuel/Downloads/1585556427714_19-03-28_Triemli_v1v6232d_1_DS184019882_K4_6393_1.mzml", build_index_from_scratch=True)
[ Warning ] Found 1 spectra and 0 chromatograms
[ Warning ] However Spectrum index list shows 0 and Chromatogram index list shows 0 entries
[ Warning ] Updating offset dict with found offsets but some might be still missing
[ Warning ] This may happen because your is file truncated

In [3]: s = run['v1v6232d']

In [4]: s.peaks('raw')

which worked for me

frederic-foucault commented 4 years ago

Hello Manuel

It works for me too (created a venv, installed the updated version of pymzML, read the mzML file).

Thank you very much Manuel. I may have a few more questions later on. Best Frederic Foucault

MKoesters commented 4 years ago

Great to hear that! I'll do some further testing, then merge this branch to dev and upload a version to PyPI. When this happens, I'll let you know.

Btw, I also implemented a property for you to access the signal_to_noise array: run['v1v6232d'].signal_to_noise

If you have any further questions, either open a new issue or just write me an email.

frederic-foucault commented 4 years ago

Thank you very much Manuel. This is great. I really appreciate that you give us access to the signal-to-noise ratio on top of it. I knew that these mzML files were not raw but processed data. Could you tell me what was specific to these files compared to others? I'm looking forward to installing this new release.

Best Frederic Foucault

frederic-foucault commented 4 years ago

Thank you, Manuel!

Closed #216 (https://github.com/pymzml/pymzML/issues/216) via #224 (https://github.com/pymzml/pymzML/pull/224).