kushalkolar / MESmerize

Platform for Calcium Imaging analysis. DEPRECATED.
GNU General Public License v3.0

KShape bug #61

Closed pr4deepr closed 2 years ago

pr4deepr commented 3 years ago

Describe the bug

I am using k-Shape clustering to determine whether I can find clusters in my traces based on the shape of the peaks. I get the following error when I click Start.

D:\Anaconda3\envs\mesmerize\lib\site-packages\sklearn\utils\deprecation.py:143: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Anaconda3\envs\mesmerize\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\kshape_process.py", line 16, in <module>
    from mesmerize.common.configuration import HAS_TSLEARN, get_sys_config
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\__init__.py", line 1, in <module>
    from .analysis import *
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\__init__.py", line 3, in <module>
    from .math import cross_correlation, drfft_dtw, tvregdiff
  File "D:\Anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\math\drfft_dtw.py", line 17, in <module>
    raw_curve: np.ndarray = None, rf_curve: np.ndarray = None) -> list:
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\context.py", line 56, in Manager
    m.start()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\managers.py", line 513, in start
    self._process.start()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\Anaconda3\envs\mesmerize\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

===== 2021.07.21 10:07:11 =====

I followed your video to perform Peak detection and got to the point where I got peak features. I can successfully view different peak parameters using the BeeSwarm Plot. When I go back to the flowchart and connect K-shape to the Peak Detect node, and try the clustering with default options I get the error above. This happens for any peak parameter I select in the data column.

On a related note for K-shape, when I try _pf_peak_curve in the data column, I get this error:

Traceback (most recent call last):
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\common\qdialogs.py", line 52, in fn
    return func(self, *args, **kwargs)
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\widget.py", line 713, in start_process
    padded = self.pad_input_data(self.input_arrays, method='fill-size')
  File "d:\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\plotting\widgets\kshape\widget.py", line 664, in pad_input_data
    s = c.size
AttributeError: 'float' object has no attribute 'size'

Operating System & specs (CPU, RAM etc.). Please complete the following information:

Details about your Mesmerize install

Thanks for developing and supporting Mesmerize.

Cheers Pradeep

kushalkolar commented 3 years ago

@pr4deepr thanks for providing details!

The freeze issue has to do with a peculiarity of Windows w.r.t. forking processes. I have to figure out which module(s) to protect with an if __name__ == '__main__' guard to solve this. If you know what I'm talking about, you can try protecting mesmerize\analysis\math\drfft_dtw.py with an if __name__ == '__main__'. I'm currently travelling, so I could try in a few days.

I suspect that your other issue ('float' object has no attribute 'size') is because some of your peak curves are NaNs for some reason. If you open the peak editor GUI make sure that each peak is flanked by two bases on either side. The DropNa node will let you drop NaNs: http://docs.mesmerizelab.org/en/master/user_guides/flowchart/nodes.html#dropna

I think that if you choose set axis as _pf_peak_curve and how as any it should remove the NaNs from the _pf_peak_curve data. Play around with the settings because I don't remember exactly.
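If, as suspected above, a missing peak curve shows up as a bare float NaN among the curves, that would explain the AttributeError: a float has no size attribute, whereas an array does. A minimal sketch in plain Python, with lists standing in for NumPy arrays and a hypothetical drop_nan_curves helper (this is not the DropNa node's implementation):

```python
import math


def drop_nan_curves(curves):
    """Keep real curves; skip entries that are a bare float NaN."""
    kept = []
    for c in curves:
        if isinstance(c, float) and math.isnan(c):
            continue  # placeholder for a missing curve, nothing to pad
        kept.append(c)
    return kept


# Made-up data: two curves and one missing entry.
curves = [[0.1, 0.9, 0.2], float("nan"), [0.0, 1.2, 0.3, 0.1]]
clean = drop_nan_curves(curves)
print(len(curves), len(clean))
```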

For the installation, yea pandas, h5py, numpy and tensorflow are in a bit of a compatibility mess at the moment w.r.t. versions.

mamba might help reduce the installation time, see: https://github.com/kushalkolar/MESmerize/issues/53#issuecomment-845247138

pr4deepr commented 3 years ago

Thanks heaps for the feedback.

Happy Travels. Glad you can at least travel, don't think we can do that anytime soon.. !! 😷

pr4deepr commented 3 years ago

So, I went to mesmerize\analysis\__init__.py and added this part:

if __name__ == '__main__':
    from .math import drfft_dtw

It seemed to fix the issue.

One of my peaks didn't have a base. Is there an easy way to detect which cell/trace has it? In the console, I saw this line:

\anaconda3\envs\mesmerize\lib\site-packages\mesmerize\analysis\compute_peak_features.py:106: UserWarning: Peak at curve index: <0> is not flanked by bases on both sides, ignoring
  warn(f"Peak at curve index: <{p_ix}> is not flanked by bases on both sides, ignoring")

The dropna node suggestion worked well.

kushalkolar commented 3 years ago

I'd try protecting the drfft_dtw module instead, if it works you can make a PR :) . Changing imports in an __init__.py like that is weird and could cause other issues.

For the peak missing the bases, I think that the progress bar's index will denote the curve that it stopped at and the warning is the index location of the peak within the curve that is denoted by the progress bar.
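If you would rather find the affected curves programmatically, one generic option is to record warnings per curve with the standard warnings module. This is only a sketch: compute_features below is a hypothetical stand-in that emits the same message, not the real Mesmerize function, and each curve is reduced to a made-up list of "peak"/"base" labels:

```python
import warnings


def compute_features(events):
    """Hypothetical stand-in: warn for any peak not flanked by bases."""
    for p_ix, kind in enumerate(events):
        if kind != "peak":
            continue
        left_ok = "base" in events[:p_ix]
        right_ok = "base" in events[p_ix + 1:]
        if not (left_ok and right_ok):
            warnings.warn(
                f"Peak at curve index: <{p_ix}> is not flanked by bases on both sides, ignoring"
            )


def find_unflanked(curves):
    """Return the indices of curves that triggered at least one warning."""
    flagged = []
    for curve_ix, events in enumerate(curves):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # make sure nothing is suppressed
            compute_features(events)
        if caught:
            flagged.append(curve_ix)
    return flagged


curves = [
    ["base", "peak", "base"],
    ["peak", "base"],
    ["base", "peak", "base", "peak"],
]
flagged = find_unflanked(curves)
print(flagged)
```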

Glad that DropNa worked for you.

pr4deepr commented 3 years ago

Actually, scrap that previous comment, that doesn't make sense..

pr4deepr commented 3 years ago

So, is this good practice? Wrap the drfft_dtw code in a main() function and then add this at the end of the module?

if __name__ == '__main__':
    main()

kushalkolar commented 3 years ago

I just took a look at that module and I think it'd be better placed in the mesmerize_manuscripts_repo than within mesmerize, since it doesn't exactly fit anywhere with mesmerize itself. I'll reorganize it in a few days and maybe make a new release, but for now you should be able to just safely remove that module's import from mesmerize.analysis.__init__.

pr4deepr commented 3 years ago

Thanks for that. Also, is there updated documentation or video available for KShapes with the new options? Thanks for all the tutorial videos you posted, they were extremely helpful! If it helps, I attended your I2K tutorial last year which is how I got onto using MESmerize..

kushalkolar commented 3 years ago

@pr4deepr I haven't updated the kshape docs with the gridsearch options. Here's an overview from an internal email:

I added a gridsearch feature to the kshape clustering GUI in mesmerize. It allows you to select a partition range, npart rng, & a combination number, ncombs.

npart rng is a range of partition values for the “search space”. In each iteration of the gridsearch, it will sort the data (by either peak width or amplitude, see the sortby param), and randomly select cluster seeds from each partition.
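A rough sketch of that seeding idea, assuming one seed is drawn per partition after sorting; pick_seeds and the amplitude values are hypothetical and this is not the Mesmerize code:

```python
import random


def pick_seeds(values, n_partitions, rng):
    """Sort indices by value, split into contiguous partitions, draw one index per partition.

    Assumes n_partitions <= len(values) so that no partition is empty.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    seeds = []
    for p in range(n_partitions):
        start = p * len(order) // n_partitions
        stop = (p + 1) * len(order) // n_partitions
        seeds.append(rng.choice(order[start:stop]))
    return seeds


# Made-up peak amplitudes, one per trace.
amplitudes = [0.4, 2.1, 0.9, 3.3, 1.5, 0.2]
seeds = pick_seeds(amplitudes, n_partitions=3, rng=random.Random(0))
print(seeds)
```

Each returned index points at one trace that would serve as the initial centre of one cluster, so low, medium, and high values are all represented among the seeds.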

ncombs is the number of random cluster seed combinations to try for each partition value.

Note that ncombs is an exponent of 10 (i.e. 10^ncombs), so the default value of 2 will do 100 combinations.

Unlike single kShape, the gridsearch is multithreaded so it will simultaneously perform npartitions * ncombs kshape-iterations, as per the number of threads that you’ve set in your system configuration.
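As a quick worked example of how many k-Shape fits a gridsearch implies (the range below is hypothetical, and whether the GUI treats the upper bound as inclusive is an assumption not checked here):

```python
# Hypothetical settings, chosen only for illustration.
npart_range = range(2, 6)  # partition values 2, 3, 4, 5
ncombs = 2                 # exponent: 10 ** ncombs seed combinations

combos_per_partition = 10 ** ncombs
total_fits = len(npart_range) * combos_per_partition
print(combos_per_partition, total_fits)  # 100 400
```

Raising ncombs by one multiplies the total by ten, so it is worth checking this number against your thread count before starting.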

When it’s done you’ll get a heatmap like this (but bigger). n_clusters (i.e. npartitions) are along the y-axis (labels are color coded), and seeded combinations are along the x-axis. The heatmap visualizes the inertia from each k-Shape model. The inertia is within cluster sum of squares, so the smaller the inertia value the tighter the clusters are. You can click on the squares in the heatmap to visualize the model in the rest of the kshape GUI.

Note: If you close the heatmap you’ll have to call this.kga_inertia_heatmap.show() in the console to get it back, haven’t made a button yet.

You shouldn't necessarily pick the model with the lowest inertia, but it will help narrow down on a suitable model. Some models with very low inertia might have empty cluster(s) which skews the inertia value and these models should be avoided. => I think that current state of the heatmap is that it will indicate models that have empty clusters with a specific color so you can easily avoid them (I think it makes them white?).
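The same selection logic can be sketched outside the GUI, assuming each fitted model exposes its requested number of clusters, its label assignments, and its inertia. The dict layout and the helper names are hypothetical:

```python
def has_empty_cluster(labels, n_clusters):
    """True if fewer distinct labels were assigned than clusters requested."""
    return len(set(labels)) < n_clusters


def rank_models(models):
    """Drop models with empty clusters, then sort the rest by inertia (ascending)."""
    usable = [
        m for m in models
        if not has_empty_cluster(m["labels"], m["n_clusters"])
    ]
    return sorted(usable, key=lambda m: m["inertia"])


# Made-up results for three fitted models.
models = [
    {"name": "a", "n_clusters": 3, "labels": [0, 0, 1, 1, 1], "inertia": 0.8},
    {"name": "b", "n_clusters": 2, "labels": [0, 1, 1, 0, 1], "inertia": 1.4},
    {"name": "c", "n_clusters": 3, "labels": [0, 1, 2, 2, 1], "inertia": 1.1},
]
ranked = rank_models(models)
print([m["name"] for m in ranked])  # ['c', 'b']
```

Model "a" has the lowest inertia but is excluded because one of its three clusters is empty. The ranking is a shortlist to inspect visually, not an automatic choice.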

You could google "gridsearch parameters" to learn about what a gridsearch is.

Glad to hear you found the tutorials and I2K workshop helpful! :)

pr4deepr commented 3 years ago

Thanks for this detailed comment. I'll be testing it out and will post here or in gitter chat if anything is unclear!!

kushalkolar commented 2 years ago

Closing due to inactivity.