neuromodulation / py_neuromodulation

Real-time analysis of intracranial neurophysiology recordings.
https://neuromodulation.github.io/py_neuromodulation/
MIT License
47 stars 11 forks source link

Fix docstrings #343

Closed timonmerk closed 5 months ago

timonmerk commented 5 months ago

Just started with this PR adressing #331

So far I've renamed the nm_stream_offline file into nm_stream, since it's currently used both for offline and online processing.

I've started to add some docstrings for the user interface functions. Currently I did not do that for the pydantic settings classes. Mainly since the user should use in future mainly the GUI. I hope you see that the same way @toni-neurosc.

In addition, I removed the nm_settings.json and replaces it with the nm_settings.yaml.

timonmerk commented 5 months ago

@toni-neurosc I think I found a potential issue with the settings.yaml. For the coherence channel pairs you've used (correctly) python tuples. When those are however serialized, they cannot be saved correctly since yaml (for some reason) doesn't support tuples.

So the serialized yaml coherence pairs look as follows:

coherence:
  channels:
  - !!python/tuple
    - STN_RIGHT_0
    - ECOG_RIGHT_0

So the python yaml package adds this !!python/tuple tag. I could fix that now by changing the yaml.safe_load() into yaml.load(). This approach works, but it's quite unsafe, since any Python code would be executed in this manner.

Therefore, it would probably be a good option to replace the tuple with a simple list?

toni-neurosc commented 5 months ago

A list of lists would be fine, Pydantic can help enforcing that each list is of length 2. There is also the possibility of customizing serialization for Pydantic models, which we could do specifically for the CoherenceSettings class: https://docs.pydantic.dev/latest/concepts/serialization/#custom-serializers So you can at serialization time convert it to a list. The problem with that is that I imagine when loading it back you would load a list, so then we need to convert it back to a tuple in a model_validator.

But this actually gives me a chance to ask a question I had about coherence default settings: where do the default channel names for coherence come from? Are they there just so that they work with the example dataset? Because then I think we should leave the default coherence channels empty, give a warning that coherence channels need to be set if coherence is enabled with no channels, and for the example set the channels explicitely

PS: btw what do you think of saving lists and dicts in abbreviated form in the YAML settings? Like this:

fruits: [Apple, Orange, Strawberry, Mango]

instead of this

fruits:
  - Apple
  - Orange
  - Strawberry
  - Mango

PS2: just a link to the YAML docs explaning the compact style of list https://www.yaml.info/learn/flowstyle.html

timonmerk commented 5 months ago

Yes, that's a good point. I think we should remove the example names, since they're only related to the example file, and also put a comment with the required syntax. Also, let's change the tuple type to list of string then. Would it be a lot of work to add it? If you're focused now on the GUI I can also give it a try.

I like the flow style a lot! It make the file much more clear.

toni-neurosc commented 5 months ago

I can take care of it.

But since the channel names are required to be in pairs, I think a list of lists of length 2 can be better if Pydantic supports it. I'll give it a go, and if it's too cumbersome we can enforce a list of strings to have an even number of members.

toni-neurosc commented 5 months ago

Hi @timonmerk , I fixed a couple issues with the Coherence settings, one is the YAML list of lists things, I used typing.Annotated to make a custom type ListOfTwoStr, kinda self-explanatory :P Works perfectly with Pydantic validation, tested it with 3 and 1 str and it doesn't let you do anything else than 2.

I also made coherence channels an empty list in the default settings, and added an example of how to add channel pairs.

Finally I fixed the issue we had with the coherence channels validation, when it was looking for exact matches between coherence channels and channel names. Now it ensures that each coherence channel matches exactly 1 channel that starts with the same sub-string (to prevent ambiguous channel names in coherence.channels)

BTW: we should remove assertions and replace them with raise RuntimeError or similar. Assertions are meant for testing purposes and with optimizations enabled they dissapear. This is something uncommon in Python as optimizations do almost nothing and no-one uses them, but for example in C all assertions are removed in release compilation mode. So we should get rid of them to future-proof ourselves, in case Python becomes a JIT-compiled language in the future as it's looking like with Python 3.13 and this becomes an issue.