No problem at all, I'll get to it after lunch. The tests/examples were basically the next thing in line, and I think I should be able to finish the bulk of it today.
Hey @toni-neurosc, thanks so much for those changes. I saw you now already added some burst addition in this PR. Sorry for messing with that.
I now did a merge that resulted in conflicts, but I wanted to push my additions in nm_bursts to this branch as well.
There are some points I want to add. I looked at all the points that you marked with "TONI". There is one test currently failing: test_osc_features:test_fft_wrong_logtransform_param_init.
Is there a method in Pydantic to also check after initialization that fields will be validated? I thought that model_validator(mode="after") would relate to that point. And actually you set this as a decorator in NMSettings.validate_settings, so I am not sure why this one doesn't catch it...
Regarding your comment about zero divisions, there is actually a test that checks for all values being zero or NaN in test_nan_values.py. So in this case there might be warnings, especially when zero values are log-transformed (e.g. in nm_oscillatory), which results in -np.inf. But this should be expected behavior.
There was also a bug (I think) in NMSettings.enable_all_features; I changed it so that all features are enabled rather than disabled.
Apart from that, the changes are great. I would also start adapting the docstrings now so that they match the params for each function.
Hello Timon, thanks for taking the time to review. It's a huge change, and the more eyes we have on this the better. I'll answer all your questions, but I think most issues are already solved.
> I now did a merge that resulted in conflicts, but I wanted to push my additions in nm_bursts to this branch as well.
Fixed! It was just a merge that kept both old and new code
> There are some points I want to add. I looked at all the points that you marked with "TONI". There is one test currently failing: test_osc_features:test_fft_wrong_logtransform_param_init. Is there a method in Pydantic to also check after initialization that fields will be validated? I thought that model_validator(mode="after") would relate to that point. And actually you set this as a decorator in NMSettings.validate_settings, so I am not sure why this one doesn't catch it...
Good catch, I had some trouble with that too, and I was experimenting locally with a bunch of possibilities. I'll explain so you can weigh in:
- The failing test sets up the settings (e.g. via set_fast_compute()) and then modifies some specific values, expecting validation to fail. Right now it only fails when instantiating nm_oscillatory.FFT(settings, ch_names, sfreq), because I sneaked in a settings.validate() call in the OscillatoryFeature base class (among others).
- Another option was setting validate_assignment=True for the NMBaseModel base class. I discarded this solution because it made it very inconvenient to modify the settings in a step-wise manner (e.g. if you disable all the features with the intent of re-enabling just one, it would throw an error as soon as you disable all of them, because the model validator expects at least 1 feature to be enabled). So I thought this was a potential source of headaches for the end-user and did not do it. What do you think?

I think that we should be safe by adding a settings.validate() call in all NMFeature constructors and in Stream initialization. In fact, the stream alone should be enough, but since this is not a performance-sensitive part of the code, better safe than sorry I guess.
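For context, here is a minimal, hypothetical sketch (ToySettings stands in for the real NMSettings) of why validate_assignment=True clashes with step-wise edits when a mode="after" model validator enforces a cross-field constraint:

```python
from pydantic import BaseModel, ConfigDict, model_validator

class ToySettings(BaseModel):
    # validate_assignment=True re-runs validation on every attribute assignment
    model_config = ConfigDict(validate_assignment=True)
    fft: bool = True
    bursts: bool = True

    @model_validator(mode="after")
    def at_least_one_feature(self):
        if not (self.fft or self.bursts):
            raise ValueError("at least one feature must be enabled")
        return self

s = ToySettings()
s.fft = False     # fine, bursts is still enabled
s.bursts = False  # raises immediately, even if the next line would re-enable fft
```

With an explicit settings.validate() call instead, the cross-field check only runs once the settings are actually consumed (e.g. in the feature constructors or at Stream initialization).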
> Regarding your comment about zero divisions, there is actually a test that checks for all values being zero or NaN in test_nan_values.py. So in this case there might be warnings, especially when zero values are log-transformed (e.g. in nm_oscillatory), which results in -np.inf. But this should be expected behavior.
The warnings I was referring to were not in the tests but appear during normalization when I run the software on randomly generated data. I think they result from 0/0 divisions; maybe it's expected behavior, but I'm not sure.
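Both warnings are easy to reproduce with plain NumPy; a generic illustration, unrelated to the py_neuromodulation code itself:

```python
import numpy as np

x = np.zeros(4)

np.log(x)    # RuntimeWarning: divide by zero encountered in log  -> array of -inf
x / x.sum()  # 0/0: RuntimeWarning: invalid value encountered in divide -> array of nan
```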
> There was also a bug (I think) in NMSettings.enable_all_features; I changed it so that all features are enabled rather than disabled.
Yes, that was totally a bug on my side, I also fixed it on my local so now it's doubly fixed for good measure :P
> Apart from that, the changes are great. I would also start adapting the docstrings now so that they match the params for each function.
As soon as I push the "custom feature" feature I'm on it.
That's great! Thanks a lot for the explanations and for the fix. It makes sense to call the validation at a later point, e.g. when the run method is called to start feature estimation.
On a smaller note, I think the BaseModel validate method is now the model_validate method, right? https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel
Also, I think it would be great to use this PR to also get rid of the nm_settings.json and replace it with a nm_settings.yaml. I wanted to do that for quite some time, but now the format is finally interchangeable.
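Since the settings are a Pydantic model either way, the YAML round-trip is straightforward; a rough sketch with PyYAML, using a toy model as a stand-in for NMSettings (field names are illustrative):

```python
import yaml  # PyYAML
from pydantic import BaseModel

class ToySettings(BaseModel):  # stand-in for NMSettings, just for the round-trip
    sampling_rate_features_hz: float = 10.0
    features: dict[str, bool] = {"fft": True, "bursts": False}

settings = ToySettings()

# Write the validated model out as YAML instead of JSON
with open("nm_settings.yaml", "w") as f:
    yaml.safe_dump(settings.model_dump(), f)

# Read it back and re-run all Pydantic validators
with open("nm_settings.yaml") as f:
    settings = ToySettings.model_validate(yaml.safe_load(f))
```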
Hi Timon,
> That's great! Thanks a lot for the explanations and for the fix. It makes sense to call the validation at a later point, e.g. when the run method is called to start feature estimation.
Agree.
> On a smaller note, I think the BaseModel validate method is now the model_validate method, right? https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel
Yes, but the validate() that I'm calling is one I implemented myself in the NMBaseModel class, and it calls Pydantic's model_validate(). So it's not the deprecated validate method from Pydantic itself.
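Roughly, such a wrapper could look like the sketch below; this is a hypothetical illustration, not necessarily the actual NMBaseModel code:

```python
from pydantic import BaseModel

class NMBaseModel(BaseModel):
    def validate(self):  # intentionally shadows Pydantic's deprecated classmethod
        # Re-run every field and model validator on the current values by
        # round-tripping through model_validate; returns a freshly validated copy.
        return type(self).model_validate(self.model_dump())
```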
> Also, I think it would be great to use this PR to also get rid of the nm_settings.json and replace it with a nm_settings.yaml. I wanted to do that for quite some time, but now the format is finally interchangeable.
Done!
Additionally, I have a question about the parameter line_noise. I was trying to create a NotchSettings class the same way we have settings classes for everything else, and I'm not sure if line_noise relates to NotchSettings only, because I have seen that the read_BIDS_data function also takes it. So could you briefly explain what line_noise represents? If it's something specific to the experiment, we can have it in NMSettings with its current default of 50.
Hey @toni-neurosc, thanks for the explanations. I think I need to go through the nm_burst function in detail, but I get the overall concept and trust that the result is the same as the computation from before. Optimally, there would be a test with the simplest for-loop implementation: very easy to understand (and maintain), but also very inefficient.
Regarding line_noise, that is the power grid line noise, so 50 Hz in Europe and 60 Hz in the US. I think it needs to be passed to the run function as well, same as the sampling rate, since it depends on the specific recording. So I guess the tests could similarly be done with an assert like the sampling rate check here? https://github.com/neuromodulation/py_neuromodulation/blob/6777221a24566078ced3e44923888b1ae2d0caf8/py_neuromodulation/nm_stream_abc.py#L72
> Hey @toni-neurosc, thanks for the explanations. I think I need to go through the nm_burst function in detail, but I get the overall concept and trust that the result is the same as the computation from before. Optimally, there would be a test with the simplest for-loop implementation: very easy to understand (and maintain), but also very inefficient.
I did many checks; I think the only differences are caused by floating-point rounding errors. You can check the new and old results yourself.
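A reference test along the lines Timon suggested could look like this sketch; both function names are hypothetical, and burst_durations_vectorized only stands in for the optimized nm_bursts computation:

```python
import numpy as np

def burst_durations_reference(activity, threshold, sfreq):
    """Naive loop: duration (s) of every run of consecutive samples above threshold."""
    durations, run = [], 0
    for sample in activity:
        if sample > threshold:
            run += 1
        elif run:
            durations.append(run / sfreq)
            run = 0
    if run:
        durations.append(run / sfreq)
    return durations

def burst_durations_vectorized(activity, threshold, sfreq):
    """Stand-in for the optimized implementation under test."""
    above = np.concatenate(([0], (activity > threshold).astype(int), [0]))
    edges = np.diff(above)
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    return ((ends - starts) / sfreq).tolist()

def test_burst_durations_match_reference():
    activity = np.random.default_rng(42).normal(size=10_000)
    expected = burst_durations_reference(activity, threshold=1.0, sfreq=1000.0)
    assert burst_durations_vectorized(activity, threshold=1.0, sfreq=1000.0) == expected
```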
> Regarding line_noise, that is the power grid line noise, so 50 Hz in Europe and 60 Hz in the US. I think it needs to be passed to the run function as well, same as the sampling rate, since it depends on the specific recording. So I guess the tests could similarly be done with an assert like the sampling rate check here?
I see, so this is the baseline noise caused by the electrical supply... I was thinking about having it in the settings, but I guess it makes sense that you would want the settings to be agnostic to the specific machine and limited to data-processing settings. So it makes sense to have it somewhere else, and Stream initialization is fine I guess. But it's really only ever used by NotchFilter, so I found it annoying to have to toss it around in function parameters all over the project: it's only used in line 143 of nm_filter.py but appears in 38 places over 8 files, just being passed around non-stop... I guess it'll stay like that for now 🤷
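For reference, the only thing line_noise ultimately drives is a notch filter at the line frequency (and usually its harmonics); a generic SciPy sketch, not the actual NotchFilter implementation:

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

def notch_line_noise(data, sfreq, line_noise=50.0):
    """Remove the power-line component (and its harmonics below Nyquist) from data."""
    filtered = data
    for harmonic in np.arange(line_noise, sfreq / 2, line_noise):
        b, a = iirnotch(w0=harmonic, Q=30.0, fs=sfreq)
        filtered = filtfilt(b, a, filtered)  # filters along the last axis (time)
    return filtered

# e.g. notch_line_noise(np.random.randn(5, 4000), sfreq=1000, line_noise=50)
```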
I added a couple more commits that reduced the computation time for sharpwaves by a further 56% (from 3.52 s to 1.55 s on a 10k-sample x 5-channel artificial dataset).
Sadly, some of the refactoring I made broke Example 3. I could fix it with some dirty hacks, but first I wanted to discuss a bit what that example is about, because it uses internal private functions like _initialize_sw_features, injects data into sw_analyze, uses signal.find_peaks directly, pulls the filtered data out of the sw_analyzer... It seems to me that in this particular example PyNM is not helping much and everything is being done with other modules. If we want PyNM to have the capability to easily calculate Sharpwave features and plot them, we should provide the appropriate interface.
The example_3 test will remain broken for the time being; everything else is passing.
Also, look at this beauty, Sharpwaves and Bursts at the same level and on par with Hjorth:
Hey @toni-neurosc, the concept of temporal waveform shape is overall very little used in the invasive electrophysiology community. Therefore, with this example we aimed to present the overall concept behind waveform shape, what characteristics can be computed, how they relate to movement, and also how those can be used for decoding.
I know that it uses quite a "hacky" way of presenting the concept, and probably a better way is to do the convolution, peak finding, etc. in the example itself, without depending on the nm_sharpwave code.
So it would be great if you could also adapt the example to show similar functionality with the speed improvements. I really appreciate the performance increase, but I definitely want to keep the example in the documentation for explainability.
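As a possible starting point for the rework, the filtering and peak detection can be done directly in the example with SciPy, without touching sharpwave internals; a generic sketch with illustrative band limits and distances:

```python
import numpy as np
from scipy import signal

sfreq = 1000
raw = np.random.default_rng(0).normal(size=10 * sfreq)  # stand-in for one channel

# Band-pass the raw trace in the band of interest
sos = signal.butter(2, (5, 30), btype="bandpass", fs=sfreq, output="sos")
filtered = signal.sosfiltfilt(sos, raw)

# Peak/trough detection directly with scipy, as the example already does
peaks, _ = signal.find_peaks(filtered, distance=int(0.005 * sfreq))
troughs, _ = signal.find_peaks(-filtered, distance=int(0.005 * sfreq))

# One sharpwave-style characteristic: peak sharpness relative to neighbouring samples
sharpness = filtered[peaks] - (filtered[peaks - 1] + filtered[peaks + 1]) / 2
```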
Hey @toni-neurosc, I am getting an error with your optimized sharpwave module import:
from numpy.core._methods import _mean as np_mean
ModuleNotFoundError: No module named 'numpy.core._methods'
Could that be again an OS-specific thing?
> Hey @toni-neurosc, I am getting an error with your optimized sharpwave module import:
> from numpy.core._methods import _mean as np_mean
> ModuleNotFoundError: No module named 'numpy.core._methods'
> Could that be again an OS-specific thing?
Well no, what happened is that Numpy 2.0.0 was released just 4 hours ago, and that's what pip installed on the test server xD Weird coincidence. We probably wouldn't even have noticed if I hadn't decided to use a Numpy internal function to squeeze out some performance lol. I'm gonna update Numpy on my local and fix it.
EDIT: found the relevant change, this caused the error: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#numpy-core-namespace
Ok, @timonmerk, I fixed the Numpy issue. It was nothing; they just changed the name of the internal module files with the version change, no big deal. I basically added a couple of underscores and it's working again.
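Concretely, numpy.core became the private numpy._core in NumPy 2.0; a version-tolerant import could look like this sketch, assuming the private _mean helper is only a micro-optimization and np.mean is an acceptable fallback:

```python
import numpy as np

try:
    from numpy._core._methods import _mean as np_mean  # NumPy >= 2.0
except ImportError:
    try:
        from numpy.core._methods import _mean as np_mean  # NumPy < 2.0
    except ImportError:
        np_mean = np.mean  # public, slightly slower fallback
```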
Also a couple more changes:
- I also tried sklearn's StandardScaler, but it's actually twice as slow.

> Hey @toni-neurosc, the concept of temporal waveform shape is overall very little used in the invasive electrophysiology community. Therefore, with this example we aimed to present the overall concept behind waveform shape, what characteristics can be computed, how they relate to movement, and also how those can be used for decoding. I know that it uses quite a "hacky" way of presenting the concept, and probably a better way is to do the convolution, peak finding, etc. in the example itself, without depending on the nm_sharpwave code. So it would be great if you could also adapt the example to show similar functionality with the speed improvements. I really appreciate the performance increase, but I definitely want to keep the example in the documentation for explainability.
Ok, I will try to rework the example and maybe implement some functions to expose the result of calc_feature, without having to basically run the steps of calc_feature manually in the example itself.
@toni-neurosc, I've now fixed the sharpwave example. It was much simpler than expected.
I also had to adapt some dependencies so that the numpy.core functions could be imported.
I also fixed the add feature example so that it passes the automatic test, and added a function in nm_IO to read raw data. This fixes #304.
So feel free to add the method for adding a feature in a better way after stream initialization.
Otherwise you can merge this PR. Many thanks again for the crazy amount of work adapting basically the whole package!
@toni-neurosc Sorry for opening this PR so soon, but I had a look at this branch and for the most part I can understand it. Would it be possible to adapt one example from the examples folder so that I could follow along and also help adapt the other examples / tests?