lina-usc / pylossless

🧠 EEG Processing pipeline that annotates continuous data
https://pylossless.readthedocs.io/en/latest/
MIT License
20 stars 8 forks source link

Requirements and Tutorial Fixes #57

Closed Andesha closed 1 year ago

Andesha commented 1 year ago

This PR includes a few different fixes:

Thanks!

scott-huberty commented 1 year ago

LGTM - @christian-oreilly can you review? You've got a better eye on the setup file.

EDIT: thanks @Andesha !

scott-huberty commented 1 year ago

Leaving a note that I left a comment here as it could be easy to miss

scott-huberty commented 1 year ago

After some discussion with @christian-oreilly -

we think we can make some changes that drop the hard dependencies for Iprogress and Ipywidgets entirely, and remove any need for that notch_filter if-statement that is tripping things up.

@Andesha If you give me push privileges to your fork I can push some commits directly.

Andesha commented 1 year ago

@scott-huberty I think you should be allowed... if not, do you know where I can change it ?

scott-huberty commented 1 year ago

@Andesha Maybe you're right - I'll try to push a commit and see.

If not you would need to add me as a collaborator to your fork. but i'll test pushing a commit first.

Screen Shot 2023-03-15 at 11 24 25 AM

Andesha commented 1 year ago

invite should be sent

scott-huberty commented 1 year ago

Wow, there are a lot more issues with the readme than I realized.

  1. I'm hitting an error in mne.io.raw.set_montage for the example file used in the readme, because the tutorial does not instruct the user to provide a montage to the config. and mne needs a montage when it calls raw.set_montage. @Andesha I'm surprised the pipeline ran for you using this file.

Unless I'm missing something so obvious, the problem is this file used a biosemi system yet the montage in that file does not fit any of the biosemi montages provided by MNE. And I don't have access to a custom montage file...

  1. The EGI example is outdated. MNE bids supports reading EGI data and we don't need to use a convoluted work around with tempfile.

  2. We should only maintain our tutorials in docs/examples.. Because these are built off jupyter notebooks and we can utilize CI to make sure that they continue to run without errors.

@christian-oreilly - I believe you wrote this tutorial at the beginning of our dev. Given that I can't even get it to work. I suggest removing the open neuro example, and starting with a fresh example (ideally something already in mne.datasets.sample or mne.datasets.misc), and moving it to docs/examples.

I can help do that in a separate PR. The most important part of this PR IMO is the setup fixes that @Andesha made and i'd like to prioritize getting that merged.

scott-huberty commented 1 year ago

A couple changes -

I tested the pipeline on my local (without the aforementioned packages installed in my environment), and everything worked.

Everything else LGTM! Thank you @Andesha !!

I think we are good to merge.

christian-oreilly commented 1 year ago

Well, before merging this, I think we need to discuss the README. I am not so keen about you just deleting everything in there. Remember that the README is what is rendered as the homepage of the GitHub repo. Minimal instructions on installing and using the pipeline should be there. More advanced cases can be linked to the doc...

Also, did you move this doc somewhere else or you just deleted it? Yes, indeed, I wrote this. It was working fine, but it is no surprise that it would not work now since this was not part of a CI and the pipeline evolved significantly since then. It should be fixed (and maybe part of it moved elsewhere if we want to keep the readme shorter), but not just deleted.

christian-oreilly commented 1 year ago

(When I say "it should be fixed", of course I volunteer to do that since I wrote this originally so I'll have an easier time fixing it.)

scott-huberty commented 1 year ago

Well, before merging this, I think we need to discuss the README

@christian-oreilly the current tutorial exists in docs/examples and can be seen here.

I don't think it is such a big deal to not have a usage tutorial on the GitHub repository page if it exists on the packages website. There is a lot of precedent for this: MNE-Python / MNE-bids / mne-bids-pipeline etc etc.

It's much easier to maintain these tutorials when they can be integrated to CI. This way, failing tutorials don't sit for months until someone points it out to us. Tutorials in the docs can be integrated in the CI. Tutorials in the readme can't be.

christian-oreilly commented 1 year ago

@christian-oreilly the current tutorial exists in docs/examples and can be seen here.

This looks very nice, @scott-huberty Thanks for looking into that.

I don't think it is such a big deal to not have a usage tutorial on the GitHub repository page if it exists on the packages website. There is a lot of precedent for this: MNE-Python / MNE-bids / mne-bids-pipeline etc etc.

I also don't think it is required. But you more than that has been removed, e.g., the installation bit (which I definitely think should remain). Also, although I definitely don't think a "tutorial" is needed on the front page, an example (as short as possible) showing how to use the package is generally welcome. This can be a 3-4 lines examples. And this example can be added with the test for CI validation. I think making it ridiculously easy to install and run a basic example is what should be targeted; the documentation is for the long explanations with all the gory details once the users are hooked (by the simplicity of the first approach).

scott-huberty commented 1 year ago

No prob, let's circle back on a new Pr to fix that tutorial.

I'll return basic install instructions tomorrow, then let's merge?

P.s. Run the pipeline on a given file in 4 lines of code? I like your style, let's make that the target ;)

christian-oreilly commented 1 year ago

Yeah.. something like 1 - import 2 - get default config 3 - run pipeline given a paht 4 - start QC on the output from 3

scott-huberty commented 1 year ago

Yeah.. something like

1 - import

2 - get default config

3 - run pipeline given a paht

4 - start QC on the output from

I don't know if doing that in 4 lines is realistic right now, feel free to push something up though (I would love to be wrong!).

Btw I think we always require the user to assign a montage to the config. we can probably handle this better, in some cases the channel locs may already be in the info structure of the raw object. I'll bring it up in another ticket.

christian-oreilly commented 1 year ago

I don't know if doing that in 4 lines is realistic right now, feel free to push something up though (I would love to be wrong!).

This is a target, not a current status. The reason why the example was long is because the code was not there yet.

Btw I think we always require the user to assign a montage to the config. we can probably handle this better, in some cases the channel locs may already be in the info structure of the raw object. I'll bring it up in another ticket.

I am not sure this statement is accurate. I think that in the case of a BIDS dataset, the info about the montage is loaded along with the raw. There is no reason to have anything about the montage in the config in such a case.

scott-huberty commented 1 year ago

This is a target, not a current status.

got it - then we are on the same page!

I am not sure this statement is accurate.

You are right, I was mistaken. We first look for a montage in raw.

https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L774-L784

I returned the pip install instructions. @christian-oreilly Are you okay with merging?

Andesha commented 1 year ago
  • I don't think think the pipeline uses scikit-learn so I removed the dependency.

We need it for the fastica call I believe...

scott-huberty commented 1 year ago
  • I don't think think the pipeline uses scikit-learn so I removed the dependency.

We need it for the fastica call I believe...

You are right - Sorry for undoing that.

christian-oreilly commented 1 year ago

@Andesha @scott-huberty Not a big issue (i.e., probably not necessary to act on this at this time), but this is not really a dependency of pyLossless, but rather a dependency of MNE (and it is part of the requirements for MNE: https://github.com/mne-tools/mne-python/blob/main/requirements.txt). I think it is better in general to let packages we use solve their own dependecies than percolating these into the requirements of pylossless.

scott-huberty commented 1 year ago

@christian-oreilly yeah... but not everything in that requirements.txt gets installed with pip install mne. For example pandas is in there but it is not a dependency of mne.

I think this is one of those cases where MNE uses the package for some functions but they don't include it in their own dependencies (to reduce technical debt). If you don't have it installed when you call their MNE function that uses it they return an error to tell you to install.

We can file a ticket with them but i doubt they will budge. I don't think sci-kit is a dependency of mne.

christian-oreilly commented 1 year ago

Sure. I still don't think we should trickle all sub-dependencies directly in pylossless, just the packages we use directly. The packages that gets call by other packages should be managed by these packages. E.g., fastica is not a hard dependency of the pipeline; it depends on what is specified in the config file. Alternatively, if you like the idea to have a very inclusive install strategy, maybe using a pip install pylossless[all] and pip install pylossless[minimal]?

scott-huberty commented 1 year ago

fastica is not a hard dependency of the pipeline;

That's a very good point. But then again we specify fastica in the default config.

EDIT: for the first ICA (used to flag epochs).

christian-oreilly commented 1 year ago

Yes, indeed. As I said, it is not a big issue. Maybe we can keep this dependency for this specific case because of what we set as default (or change the default config). I just wanted to make the point that I am not sure that, in general, we should take upon ourselves to manage dependencies from external packages, first because it is not our responsibility, second because these external packages may change their dependencies with time which means that we then need to keep up with future changes in dependencies of external packages.

That being said, I am not so clear about the up-to-date best practices in package install in python. Up to recently, I was not distinguishing the requirements.txt vs what packages get installed... and arguments I read for distinguishing these did not convinced me yet. Anyway, you are right that MNE does not install these by default, but only these: https://github.com/mne-tools/mne-python/blob/main/requirements_base.txt

scott-huberty commented 1 year ago

Maybe we can keep this dependency for this specific case because of what we set as default (or change the default config)

yeah, I agree, keep it for now. I think the important thing to keep in mind is that @Andesha tried to run the pipeline with the provided instructions and config and couldn't.

. I just wanted to make the point that I am not sure that, in general, we should take upon ourselves to manage dependencies from external packages,

I agree!

That being said, I am not so clear about the up-to-date best practices in package install in python. Up to recently, I was not distinguishing the requirements.txt vs what packages get installed... and arguments I read for distinguishing these did not convinced me yet. Anyway, you are right that MNE does not install these by default, but only these: https://github.com/mne-tools/mne-python/blob/main/requirements_base.txt

me neither... I think @Andesha gave us his understanding (something like requirements.txt for developers, setup.py for managing hard dependencies).