emmalala123 opened this issue 5 months ago
For reference, the QA step is optional and doesn't impact downstream processing.
But it is weird that it can't find spike trains. Could you check whether your HDF5 file has spike trains? You could use either vitables or hdfview, or if you have h5dump you could try `h5dump -n <path to file> | egrep spike_train` and check whether you get any output.
If you don't have spike trains, try rerunning blech_make_arrays.py and check for errors.
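If it's easier to check from Python, here's a minimal sketch using PyTables (the file path is a placeholder; `/spike_trains` is the group name drift_check.py looks for):

```python
import tables

# Placeholder path -- point this at your experiment's HDF5 file
h5_path = '/path/to/your_experiment.h5'

with tables.open_file(h5_path, mode='r') as hf5:
    if '/spike_trains' in hf5:
        # One child group per dig-in is expected under /spike_trains
        children = hf5.list_nodes('/spike_trains')
        print('Found /spike_trains with children:',
              [node._v_name for node in children])
    else:
        print('No /spike_trains group found -- try rerunning blech_make_arrays.py')
```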
Ran it after blech_make_arrays.py, and got the following error:
==============================
Running QA tests on Blech data
Directory: /media/cmazzio/storage/eb_ephys/EB18_behandephys_5_21_cue_align
Running Similarity test
Processing : /media/cmazzio/storage/eb_ephys/EB18_behandephys_5_21_cue_align/
==================
Similarity calculation starting
Similarity cutoff ::: 50
32it [00:06, 4.96it/s]
Similarity calculation complete, results being saved to file
==================
Running Drift test
Processing : /media/cmazzio/storage/eb_ephys/EB18_behandephys_5_21_cue_align/
/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order, subok=True)
Traceback (most recent call last):
File "utils/qa_utils/drift_check.py", line 122, in <module>
zscore_binned_spike_trains = [zscore(x, axis=-1) for x in plot_spike_trains]
File "utils/qa_utils/drift_check.py", line 122, in <listcomp>
zscore_binned_spike_trains = [zscore(x, axis=-1) for x in plot_spike_trains]
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/scipy/stats/_stats_py.py", line 2730, in zscore
return zmap(a, a, axis=axis, ddof=ddof, nan_policy=nan_policy)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/scipy/stats/_stats_py.py", line 2876, in zmap
contains_nan, nan_policy = _contains_nan(a, nan_policy)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/scipy/stats/_stats_py.py", line 97, in _contains_nan
contains_nan = np.isnan(np.sum(a))
File "<__array_function__ internals>", line 5, in sum
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2241, in sum
return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: operands could not be broadcast together with shapes (413,) (427,)
Finished QA tests
==============================
Thanks for pointing out these bugs. The QA tests need to be moved to after make_arrays in the flow-chart. And it seems like there's another issue with drift_check from your latest commit. I'll have a look.
Is there a definitive sketch of the flowchart, or an otherwise detailed walk-through? I might be able to figure out the nomnoml.com code to make a new one, but I'm not sure I'd have the right outline.
The flowchart was supposed to be definitive 😅 I've updated it to have QA after make arrays in the above branch. I'll go ahead and merge it. Please reopen the issue if the problem persists or if I missed something. Thank you.
I've just gotten back around to this part of the pipeline, and I'm having the same problem as the 2nd issue @emmalala123 describes:
Running Drift test
Processing : /home/ramartin/Documents/MAR_Data/MR03/MR03_BAT_Tastes_Day6_240526_131433/
/home/ramartin/anaconda3/envs/blech_test/lib/python3.8/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order, subok=True)
Traceback (most recent call last):
File "utils/qa_utils/drift_check.py", line 122, in <module>
zscore_binned_spike_trains = [zscore(x, axis=-1) for x in plot_spike_trains]
File "utils/qa_utils/drift_check.py", line 122, in <listcomp>
zscore_binned_spike_trains = [zscore(x, axis=-1) for x in plot_spike_trains]
File "/home/ramartin/anaconda3/envs/blech_test/lib/python3.8/site-packages/scipy/stats/stats.py", line 2410, in zscore
contains_nan, nan_policy = _contains_nan(a, nan_policy)
File "/home/ramartin/anaconda3/envs/blech_test/lib/python3.8/site-packages/scipy/stats/stats.py", line 257, in _contains_nan
contains_nan = np.isnan(np.sum(a))
File "<__array_function__ internals>", line 5, in sum
File "/home/ramartin/anaconda3/envs/blech_test/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2241, in sum
return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
File "/home/ramartin/anaconda3/envs/blech_test/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: operands could not be broadcast together with shapes (210,) (630,)
So I'm reopening this, and I'll start trying to dig into that.
I think I'm starting to get the shape of the problem.
plot_spike_trains is a list, each element of that list seems to be a tuple, and each tuple contains two (always? sometimes?) ndarrays. When zscore() is called on one of those tuples, it tries to combine the two arrays into a single array in order to zscore them together, but they're (always? sometimes?) different sizes, so it fails.
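Here's a minimal reproduction of that failure mode outside the pipeline (array sizes are made up for illustration; the exact exception text depends on the numpy/scipy versions, but it fails either way once the arrays in the tuple have different lengths):

```python
import numpy as np
from scipy.stats import zscore

# Two binned spike-train arrays of unequal length, standing in for one
# tuple out of plot_spike_trains
train_a = np.random.rand(210)
train_b = np.random.rand(630)

# z-scoring each array on its own is fine
ok = [zscore(t, axis=-1) for t in (train_a, train_b)]

# z-scoring the whole tuple at once is not: numpy/scipy try to stack the
# ragged pair into a single array, and the shapes can't be reconciled
zscore((train_a, train_b), axis=-1)
```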
For me, at least in the data set I'm handling, the length of plot_spike_trains is 18, which is the same as the # of saved units I have, so I'm guessing that each position of plot_spike_trains contains a tuple that corresponds to 1 saved unit. Furthermore, at least in my data, all 18 of those tuples contain exactly 2 ndarrays. While I feel somewhat comfortable guessing that each tuple corresponds to a saved unit, I have no idea what the two ndarrays correspond to, or how I should be processing them.
My instinct is to sort of match the form of the source: plot_spike_trains is a list of tuples of arrays, so I'm inclined to break each tuple down into its component arrays, zscore each array individually, and then pack them back up into the same structure, so that zscore_binned_spike_trains is a list of 18 tuples, each of which contains 2 ndarrays, each of which has been individually zscored.
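In code, the idea would be something like this (just a sketch against the names in drift_check.py, not a tested patch):

```python
from scipy.stats import zscore

# Sketch of a replacement for line 122 of utils/qa_utils/drift_check.py:
# z-score each array inside each tuple on its own, and keep the
# list-of-tuples layout of plot_spike_trains
zscore_binned_spike_trains = [
    tuple(zscore(train, axis=-1) for train in unit_tuple)
    for unit_tuple in plot_spike_trains
]
```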
But I'm really not confident that's the right answer. I think there are two essential questions I have about the data, though:
1. Is each tuple supposed to have 2 arrays in it, or is one of them an accident?
2. If there are supposed to be 2 arrays, do we want to zscore and store both of them back in zscore_binned_spike_trains, or only one of them? I could imagine that the first array in each tuple is data of type A and the second is data of type B, and we only want type A to go into zscore_binned_spike_trains. But I have no idea.
Okay, I looked a little closer, and I think I've actually figured it out! The structure of plot_spike_trains is: A list of length [# of saved units], where each element of the list is a tuple of length [# of stimuli], and each element of the tuple is an array that contains spike train data for one stimulus for one saved unit.
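For reference, this is a quick way to confirm that nesting (names as in drift_check.py):

```python
# Inspect the nesting of plot_spike_trains interactively
print(len(plot_spike_trains))                        # number of saved units
print(len(plot_spike_trains[0]))                     # number of stimuli
print([arr.shape for arr in plot_spike_trains[0]])   # one binned array per stimulus
```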
Now that I think I understand the data, at least somewhat, I think I can move forward on fixing the issue.
Even further characterization of the problem: at least in my case, part of the problem is that I don't have an equal number of stimulus presentations; one of the arrays is 210 (30 trials x 7 samples per trial post binning), while the other is 630 (90 trials x 7 samples per trial post binning). I'm guessing the error @emmalala123 got had exactly the same root; 59 trials of 1 stimulus vs 61 trials of the other, giving 413 and 427 bins.
This explains why we're the only ones who've run into it; for IOC data, it makes sense that every stimulus would (usually) have the same # of trials, though I can imagine scenarios where that might not be the case. In behavioral data, animals can skip trials, so inevitably there will be uneven #s of stimulus presentations.
So now I'm 100% sure I know what the problem is, but I'm not sure what the most appropriate solution is, in terms of reconciling the nature of the data with the intended behavior of the analysis.
Since QA testing is not "necessary" for pipeline function, I'm going to first focus on #82 so it's easier to troubleshoot uneven trial related issues in the future. Right now, I'm going to fish for a dataset with uneven trials (I think I have one, but if I can't find it, I'll ask you for yours). Once that is in place, I can work through this error. Thanks for figuring out where the problem is!
I did also figure out a fix for this particular problem. I think there are probably a lot of different solutions, but I just had it pad out uneven trial numbers with NA values to square the arrays up for plotting, etc. See: #207
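For concreteness, the padding idea looks roughly like this (a sketch of the approach, not the actual code in #207; the helper name and shapes are made up for illustration):

```python
import numpy as np
from scipy.stats import zscore

def pad_trains(trains, fill=np.nan):
    """Pad 1-D binned spike-train arrays with NaN so they all share the
    longest length (illustration only)."""
    max_len = max(len(t) for t in trains)
    return [np.pad(t.astype(float), (0, max_len - len(t)), constant_values=fill)
            for t in trains]

# e.g. 30 trials of one stimulus vs 90 of another, 7 bins per trial,
# flattened as in the traceback above (210 vs 630 samples)
uneven = (np.random.rand(210), np.random.rand(630))
even = pad_trains(uneven)

# nan_policy='omit' keeps the padding from affecting the z-scores
z = [zscore(t, axis=-1, nan_policy='omit') for t in even]
```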
Not sure if this is normal or a me thing:
Running Drift test
Processing : /media/cmazzio/storage/eb_ephys/EB18_behandephys_5_21_cue_align/
Traceback (most recent call last):
File "utils/qa_utils/drift_check.py", line 92, in <module>
spike_trains = get_spike_trains(metadata_handler.hdf5_name)
File "utils/qa_utils/drift_check.py", line 42, in get_spike_trains
dig_ins = hf5.list_nodes('/spike_trains')
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/file.py", line 1962, in list_nodes
group = self.get_node(where) # Does the parent exist?
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/file.py", line 1607, in get_node
node = self._get_node(nodepath)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/file.py", line 1556, in _get_node
node = self._node_manager.get_node(nodepath)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/file.py", line 417, in get_node
node = self.node_factory(key)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/group.py", line 1137, in _g_load_child
node_type = self._g_check_has_child(childname)
File "/home/cmazzio/miniconda3/envs/blech_clust/lib/python3.8/site-packages/tables/group.py", line 375, in _g_check_has_child
raise NoSuchNodeError(
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``spike_trains``