neuropsychology / NeuroKit

NeuroKit2: The Python Toolbox for Neurophysiological Signal Processing
https://neuropsychology.github.io/NeuroKit
MIT License
1.59k stars 423 forks source link

Design #32

Closed DominiqueMakowski closed 4 years ago

DominiqueMakowski commented 4 years ago

Alright folks, now that we started to add some functions and all, the time has come to talk design 😏

One very good point was brought by @JohnDoenut, related to its great implementation of rsp processing (btw, thanks again for the effort you put in documenting the code, it was a joy to read 😌)

I guess the discussion ultimately comes down to how we want the API to look like. I prefer giving the user some more granularity in their choice as to what they want to get out of the signal. I.e., they instantiate the breathing object (Rsp) and can then choose to return only those attributes that they're interested in. I.e., a user might only want to look at the signal or only have the extrema returned. This would be in contrast to a rsp_process() function that returns all attributes at once in a nested dict like it was the case in Nk1. The latter feels less intuitive to me.

Basically this boils down to the question of whether we want most of the API to be functions or classes. Your comment made me think a lot, here's my current thoughts (might be subject to changes though 😁)

When I started python, I didn't understand classes (as I had no background in programming whatsoever). As I didn't understand them, I didn't like them and didn't use them. Eventually, I managed to understand them, and it felt so powerful and more like "legit programming" (it felt looked more sophisticated). As a result... I started using them everywhere. Now, after having widened my experience in other languages, I feel like I tend to come back to functions... besides the intuition and the feeling, here are some arguments that crossed my mind:

IMO functions are more intuitive, especially for new users without much programming background, in the sense that you apply a function on an input and you get an output, y = f(x), as opposed to creating an entity that both contains information and also functions to apply on itself.

Importantly, functions are more maintainable, meaning more easy to understand, debug, test and contribute to, due to their self-contained nature and because they can be nicely organise in different files or subfunctions. For example, incredible packages like mne for eeg are quite tough to understand and contribute to if you're not a core Dev,having their powerful (and complex) classes defined in verrrry long files (with a lot of internal methods and all).

All in all, I feel like this preference for functions or classes has strong interindividual variability, depending on our background and experience, and that the end both are equivalent in terms of functionalities. Most importantly, please do challenge my opinion and try to change my mind! Here's how two designs could look like (these are examples of names and functions):

class based API

# Create object
rsp_object = nk.Rsp(signal)  

# Call methods
rsp_object.stats()
rsp_object.findpeaks()
rsp_object.plot()

function based API

I think that part of the success of nk1 was due to the easy with which you get results. Anybody, without any understanding of ecg, could run process_ecg() and get tons and tons of stuff out (this famous dict with all the values). More that most of us understood, which had the benefit of making you happy (who doens't like having some results values) and hopefully motivate to learn more about the values that are spitted out.

While I don't want necessarily to reproduce nk1's design, having this "master" function that returns a lotta stuff is appealing, especially as a point of entry for new users. However, if something was to be changed, it would be the way this master function is defined. Indeed, I'd like nk2 to be more flexible and more easy to use for advanced users with personalised flows. This means, that the different steps that ecg_process does (preprocessing, segmentation, peaks finding, rate computing, features extraction etc.) could be available and accessible indenpendently and directly. So while a user would mainly start by using the master function to see how, he could then start controling more his pipeline with the individual functions as he gains expertise. Here's how it coud look like:

Beginner

# Get results
results = nk.rsp_process(signal)

# Extract results
results["Stats"]
results["Peaks"]
nk.rsp_plot(results)

Advanced

As rsp_process would be a high level function, calling other functions, an advanced user that desires more control over the processing flow could be able to call the other subfunctions separately.

filtered = nk.rsp_prepare(signal)
peaks = nk.rsp_findpeaks(filtered)
stats = nk.rsp_stats(peaks)

Critically, having the subfunctions individualised would also make it easier to document (as we could have specific documentation for subfunctions and refer to them from the master, instead of having one big docstring), debug, and improve. I might be repeating myself a lot here, but this is aspect cannot be neglected if we want our package to have high standards in terms of quality and safety (and we want to be "professional" 😁), as it it was ultimately pushed me to reimplement nk2 instead of trying to improve nk1.

Let me know your thoughts!!

JanCBrammer commented 4 years ago

Thanks for taking the time for putting together this overview and comparison, much appreciated :)

I'm happy to read that we agree on our API design goal: enabling quick and easy access to information in biosignals while at the same time providing a user with more low-level, fine-grained control over their workflow.

My first and so far only experience with user interfaces is a GUI. There, stuff goes sideways quite quickly without classes. Hence I might be biased towards using classes (plus being relatively new to Python and programming in general I identify with the "yay, I learned about classes, they're cool and sophisticated, let's use them everywhere" attitude that you mention :D).

I agree with your argument that functions can be easier to understand, implement and maintain which makes them the safer option for both users and contributors (at least at the present stage of the project).

So I guess as far as I'm concerned we can tear down that Rsp class :hammer: :D

I'd love to hear other opinions about this as well :)

Btw, as I'm trying to learn about API design, I came across the following article. Also this related blog post.

DominiqueMakowski commented 4 years ago

I started writing down some ideas for the design philosophy in the README. Let's see how this works and how we can further refine it toward the best API ☺️