michalsta / opentims

Open-source C++ and Python module for opening binary timsTOF data files.

for loop over D.query_iter(D.ms1_frames, columns=('intensity',)) crashed #7

Open animesh opened 2 years ago

animesh commented 2 years ago

I am running it on a Windows Server 2016 machine with 128 GB RAM, where the following works:

path = Path(pathFiles)
D = OpenTIMS(path) # get data handle
print(pathFiles,D) #print(len(D)) # The number of peaks.
OpenTIMS(1634400182)
pprint(D.query(frames=[1], columns=all_columns))
{'frame': array([1, 1, 1, ..., 1, 1, 1], dtype=uint32),
'intensity': array([21, 54, 54, ..., 22, 61, 81], dtype=uint32),
'inv_ion_mobility': array([1.59926563, 1.59485905, 1.59045181, ..., 0.60113621, 0.60113621,
       0.60113621]),
'mz': array([1377.64799752, 1436.23477832, 1613.74566582, ...,  515.79462555,
        629.21372085,  801.32747883]),
'retention_time': array([0.456886, 0.456886, 0.456886, ..., 0.456886, 0.456886, 0.456886]),
'scan': array([ 35,  39,  43, ..., 926, 926, 926], dtype=uint32),
'tof': array([343878, 353782, 382616, ..., 161198, 191291, 232170], dtype=uint32)}

but the following crashes after a few iterations:

for fr in D.query_iter(D.ms1_frames, columns=('intensity',)):
    print(fr['intensity'])

Is there a way to find out what the underlying issue is?

MatteoLacki commented 2 years ago

Hello,

Not if you don't supply error messages :) And the version of opentims!

Best,

Mateusz Krzysztof Łącki
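For anyone reporting a similar crash, the details requested above can be gathered with the standard library alone. The following is a minimal sketch, not part of opentims: `environment_report` is a hypothetical helper name, and it assumes the package is installed under the distribution name `opentimspy`. `faulthandler` is included because a crash inside a C++ extension may kill the interpreter without an ordinary Python traceback.

```python
import faulthandler
import platform
import sys
from importlib import metadata


def environment_report(package="opentimspy"):
    """Hypothetical helper: collect version details for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "package": package,
        "version": version,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Dump a low-level traceback to stderr if the process dies hard,
    # for example on a segmentation fault in native code.
    faulthandler.enable()
    print(environment_report())
```

Running the failing loop in the same script after `faulthandler.enable()` should at least show which Python line was executing when the process died.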

liquidcarbon commented 2 years ago

The issue could be rooted here:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_28411/1257180327.py in <module>
----> 1 D.query(frames=73, columns=('intensity'))

~ ... python3.9/site-packages/opentimspy/opentims.py in query(self, frames, columns)
    140             dict: columns to numpy array mapping.
    141         """
--> 142         assert all(c in self.all_columns for c in columns), f"Accepted column names: {self.all_columns}"
    143 
    144         try:

AssertionError: Accepted column names: ('frame', 'scan', 'tof', 'intensity', 'mz', 'inv_ion_mobility', 'retention_time')

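Using columns=('intensity',) should have worked, but ('intensity') without the trailing comma fails: Python treats it as a plain string, so the assertion checks each character instead of the column name. Perhaps this finickiness could be addressed.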
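The difference can be reproduced without opentims at all. The sketch below copies the column names from the assertion message and repeats the same `all(...)` check. `normalize_columns` is a hypothetical helper showing one way a bare string could be accepted; it is not necessarily how the library handles it.

```python
# Column names copied from the AssertionError message above.
ALL_COLUMNS = ('frame', 'scan', 'tof', 'intensity', 'mz',
               'inv_ion_mobility', 'retention_time')

with_comma = ('intensity',)    # a one-element tuple
without_comma = ('intensity')  # parentheses only: still the string 'intensity'

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # str

# Iterating over a bare string yields single characters ('i', 'n', ...),
# none of which is a column name, so the check fails.
print(all(c in ALL_COLUMNS for c in with_comma))     # True
print(all(c in ALL_COLUMNS for c in without_comma))  # False


def normalize_columns(columns):
    """Hypothetical helper: wrap a bare string in a one-element tuple."""
    if isinstance(columns, str):
        return (columns,)
    return tuple(columns)


print(normalize_columns('intensity'))  # ('intensity',)
```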

alima82 commented 2 years ago

This code is fake! I can't plot_intensity_given_mz_inv_ion_mobility. It does NOT work at all. I wrote to J Prot Res to inform them that they have been publishing unfinished/fake bioinformatics work.

MatteoLacki commented 2 years ago

While you're at it, please write to the Pope that they have some really bad Catholics.

MatteoLacki commented 2 years ago

Hi, this is now fixed on the main branch of OpenTIMS. We will push it to PyPI once we have a critical mass of changes and more tests. Sorry that it took so long, but I have a small baby and simply could not get anything done recently.

Best wishes,

MatteoLacki commented 2 years ago

matteo@pinguin:~/Projects/MIDIA$ make py
veMIDIA/bin/ipython
Python 3.8.10 (default, Mar 15 2022, 12:22:08) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import opentimspy
   ...: import timspy
   ...: 
   ...: path = "data/raw/G2111/G211125_007_Slot1-1_1_3264.d"
   ...: ot = opentimspy.OpenTIMS(path)
   ...: 
   ...: list(ot.query_iter(frames=range(1,10), columns='intensity'))
Out[1]: 
[{'intensity': array([ 93, 122,  96, ...,  73, 107,  72], dtype=uint32)},
 {'intensity': array([59, 36, 23, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([90, 33, 82, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([28, 63, 10, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([74, 50, 10, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([37, 33, 50, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([68, 47, 56, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([20, 10, 79, ..., 10, 10, 10], dtype=uint32)},
 {'intensity': array([61, 57, 72, ..., 10, 11, 11], dtype=uint32)}]

Works on my machine.

MatteoLacki commented 2 years ago

@michalsta I will close the issue once we push it to PyPI.

MatteoLacki commented 2 years ago

Wait, @liquidcarbon, this cannot be the problem: @animesh was using ("intensity",) in the first place, and that has always worked and still works for me. @animesh, we need more information from your side. I have used code that queries only intensity in a few hundred projects where I needed the intensity distribution, and it worked well on Linux. Please put together a reproducible example with data, ideally as some sort of Linux container, and maybe send in some data to check.
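One way to narrow such a report down is to query the frames one at a time and note where it stops. The sketch below is only an illustration: `find_failing_frame` and `fake_query` are hypothetical names, and `fake_query` stands in for `D.query` so that the code runs without any data. A hard crash in native code will not be caught by `try`/`except`, which is why each frame number is printed and flushed before the call.

```python
def find_failing_frame(query, frames, columns=("intensity",)):
    """Query frames one by one; return (frame, exception) for the
    first failure, or None if every frame could be read."""
    for frame in frames:
        # Flush so the last printed frame is visible even if the
        # process dies without raising a Python exception.
        print(f"querying frame {frame}", flush=True)
        try:
            query(frames=[frame], columns=columns)
        except Exception as exc:
            return frame, exc
    return None


def fake_query(frames, columns):
    """Hypothetical stand-in for D.query: fails on frame 3."""
    if frames[0] == 3:
        raise RuntimeError("simulated failure")
    return {c: [] for c in columns}


result = find_failing_frame(fake_query, range(1, 6))
print(result)
```

With a real dataset, the call would presumably look like `find_failing_frame(D.query, D.ms1_frames)`, using the `D.query` and `D.ms1_frames` names from the original report.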