pynapple-org / pynapple

PYthon Neural Analysis Package :pineapple:
https://pynapple.org/
MIT License

Dealing differently with missing data and short intervals #352

Open vigji opened 2 weeks ago

vigji commented 2 weeks ago

I will describe my (potentially very niche) issue, as it makes the request easier to understand:

I currently have a trace with strong artefacts from optogenetic stimulation. The stimulation consists of 40 Hz pulses, 10 ms each, delivered for 6 seconds in total. During spike sorting, I blanked all times where the laser is on (that is, only the 10 ms windows, not the whole 6-second trials). In this way I avoid the artefacts, but I mess up my spike detection, as I can't find correct spikes in the 10 ms windows. This forces me to always compensate for the reduced effective bin size when computing firing rates during the trials.
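To make the compensation concrete: at 40 Hz a pulse starts every 25 ms, so blanking 10 ms per pulse removes 40% of each cycle. Below is a minimal sketch of the correction, assuming each bin covers a whole number of pulse cycles. The function name and the spike count are illustrative only and are not part of pynapple.

```python
def compensated_rate(n_spikes, bin_size, pulse_freq=40.0, pulse_width=0.010):
    """Firing rate (Hz) corrected for the blanked time inside a bin.

    Assumes the bin spans a whole number of pulse cycles, so the
    blanked fraction is simply pulse_freq * pulse_width.
    """
    blanked_fraction = pulse_freq * pulse_width       # 40 Hz * 10 ms = 0.4
    valid_time = bin_size * (1.0 - blanked_fraction)  # seconds of usable signal
    return n_spikes / valid_time


# Hypothetical example: 6 spikes detected in a 100 ms bin during stimulation.
naive = 6 / 0.1                       # divides by the full bin: about 60 Hz
corrected = compensated_rate(6, 0.1)  # divides by 60 ms of valid time: about 100 Hz
```

The uncorrected estimate is low by a factor of 1 / (1 - 0.4), which is why the bin size has to be adjusted by hand for every analysis.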

I wanted to implement a nice solution using the time_support property in pynapple. Basically, I would like to set only the laser_off times as the time_support, so that a call to spike_times.count(bin_size=x) would give me correct firing rates in the bins, ignoring the small 10 ms epochs that are outside the time support.

Currently, however, this is not supported: if I perform the count with a large bin size that spans multiple intervals in and out of the time support (e.g., 100 ms), the bins are all set to nan and interpolated over by count(). Example plot (blue trace: laser; orange trace: counts with bin_size=0.1; green trace: counts after restricting with suitable time_intervals):

[Image: Screenshot 2024-10-15 at 15 38 47]

Would you be willing to support a change in the behavior of time_support that corrects this? I am happy to try implementing it, but not if it breaks other use cases that I am not seeing.

gviejo commented 2 weeks ago

I might understand what you mean. Can you describe the behavior of time_support you would need?

vigji commented 2 weeks ago

I think the issue is in pynapple.core._jitted_functions.jitcount. If I read it properly, the while loop only allows bins that are entirely within the interval. I would have to change this behavior to keep all bins and keep track of how much of each bin lies outside the valid restricted interval, in order to normalize the count.
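A rough sketch of what I have in mind, in plain Python rather than the jitted code: keep every bin on an evenly spaced grid, and alongside each count store how much of that bin falls inside the support. The function names, arguments and example values below are hypothetical. This is not the current jitcount implementation.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between [a_start, a_end) and [b_start, b_end)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def count_with_valid_time(spikes, support, t_start, t_end, bin_size):
    """Return (counts, valid_time) for evenly spaced bins over [t_start, t_end).

    spikes  : iterable of spike times in seconds
    support : list of non-overlapping (start, end) tuples in seconds
    """
    n_bins = int(round((t_end - t_start) / bin_size))
    counts = [0] * n_bins
    valid_time = [0.0] * n_bins

    # Time inside the support for each bin, summed over all support epochs.
    for i in range(n_bins):
        lo = t_start + i * bin_size
        hi = lo + bin_size
        valid_time[i] = sum(overlap(lo, hi, s, e) for s, e in support)

    # Count only spikes that fall inside the support.
    for t in spikes:
        if not any(s <= t < e for s, e in support):
            continue
        i = int((t - t_start) // bin_size)
        if 0 <= i < n_bins:
            counts[i] += 1

    return counts, valid_time


# Hypothetical example: four 10 ms pulses at 40 Hz, then laser off until 0.2 s.
laser_off = [(0.010, 0.025), (0.035, 0.050), (0.060, 0.075), (0.085, 0.2)]
spike_times = [0.012, 0.04, 0.07, 0.09, 0.15, 0.18]
counts, valid_time = count_with_valid_time(spike_times, laser_off, 0.0, 0.2, 0.1)
rates = [n / v for n, v in zip(counts, valid_time)]
```

Here the first 100 ms bin has only 60 ms of valid time, so its 4 spikes are divided by 0.06 s rather than 0.1 s. The second bin lies entirely inside the support and is unaffected.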

BalzaniEdoardo commented 1 week ago

If I understood correctly, you want the count process, for bins containing gaps in the support, to still return the number of events, but normalised by the actual time support.
If so,

vigji commented 1 week ago

Agreed, this would be a rate more than a count. For the time series support, I would actually use the span (min_valid_t, max_valid_t, bin_size), or even allow externally supplied t_min and t_max, to produce an array of now evenly spaced rate estimates. Bins spanning a time interval completely outside the support would be nan. We could add special handling for bins with only a tiny fraction within the time support (for example: if < 1% of the bin is within the time support, set it to nan).
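As a sketch of the thresholding only: assume the per-bin counts and the per-bin time inside the support have already been computed on the evenly spaced grid. The function name and the min_fraction parameter are hypothetical, not an existing pynapple option.

```python
import math


def rate_from_counts(counts, valid_time, bin_size, min_fraction=0.01):
    """Convert per-bin counts to rates, normalising by time inside the support.

    Bins with no valid time, or with less than min_fraction of the bin
    inside the support, are set to nan.
    """
    rates = []
    for n, v in zip(counts, valid_time):
        if v <= 0.0 or v < min_fraction * bin_size:
            rates.append(math.nan)
        else:
            rates.append(n / v)
    return rates


# Hypothetical 100 ms bins: partially valid, fully outside the support,
# only 0.5 ms valid (below the 1% threshold), and fully valid.
example_rates = rate_from_counts(
    counts=[4, 0, 1, 2],
    valid_time=[0.06, 0.0, 0.0005, 0.1],
    bin_size=0.1,
)
```

The third bin shows why a threshold matters: a single spike in 0.5 ms of valid time would otherwise give a 2000 Hz estimate.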