physiopy / phys2denoise

A toolbox and collection of scripts to prepare physiology data for fMRI denoising
https://phys2denoise.readthedocs.io/
Apache License 2.0

Add pydra tasks, workflow and update CLI #57

Open maestroque opened 2 months ago

maestroque commented 2 months ago

Closes #

Proposed Changes

Change Type

Checklist before review

maestroque commented 2 months ago

Note that the PR is currently stemming from integrate-physutils

maestroque commented 2 months ago

Also, I would like you to enlighten me on retroicor if possible. In the current workflow implementation, the metrics are exported as follows:

for metric in metrics:
    if metric == "retroicor_card":
        args = select_input_args(retroicor, kwargs)
        args["card"] = True
        retroicor_regrs = retroicor(physio, **args)
        for vslice in range(len(args["slice_timings"])):
            for harm in range(args["n_harm"]):
                key = f"rcor-card_s-{vslice}_hrm-{harm}"
                regr[f"{key}_cos"] = retroicor_regrs[vslice][:, harm * 2]
                regr[f"{key}_sin"] = retroicor_regrs[vslice][:, harm * 2 + 1]
    elif metric == "retroicor_resp":
        # etc. etc.

Shall I keep it this way or is more research needed? I am not familiar with how retroicor is used. @m-miedema @me-pic @smoia
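For background: RETROICOR (Glover et al., 2000) models cardiac and respiratory noise as a low-order Fourier expansion of the measured physiological phase, so each harmonic contributes one cosine and one sine regressor per slice, which is exactly the cos/sin interleaving in the loop above. A minimal sketch of how such a regressor matrix is built (the helper name and phase input are illustrative, not the phys2denoise API):

```python
import numpy as np

def retroicor_regressors(phase, n_harm):
    """Build a (n_timepoints, 2 * n_harm) RETROICOR regressor matrix from a
    phase time series in radians. Columns interleave cos/sin per harmonic,
    matching the harm * 2 / harm * 2 + 1 indexing in the workflow snippet.
    Hypothetical helper for illustration only."""
    regressors = np.empty((phase.size, 2 * n_harm))
    for harm in range(n_harm):
        regressors[:, harm * 2] = np.cos((harm + 1) * phase)
        regressors[:, harm * 2 + 1] = np.sin((harm + 1) * phase)
    return regressors

# one slice's cardiac phase, two harmonics -> 4 regressors
phase = np.linspace(0, 2 * np.pi, 100)
regrs = retroicor_regressors(phase, n_harm=2)
```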

m-miedema commented 2 months ago

Also I would like you to enlighten me on retroicor if possible. In the current workflow implementation, the metrics are exported as such: [...]

Shall I keep it this way or is more research needed? I am not familiar with how retroicor is used.

Keep it this way for now -- the handling of these derivatives is something we'll need to improve but we can circle back after the workflow is in place!

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 0% with 48 lines in your changes missing coverage. Please review.

Project coverage is 49.84%. Comparing base (f074cad) to head (d150e76). Report is 2 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| phys2denoise/tasks.py | 0.00% | 48 Missing :warning: |
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
- Coverage   53.85%   49.84%   -4.02%
==========================================
  Files           8        9       +1
  Lines         596      644      +48
==========================================
  Hits          321      321
- Misses        275      323      +48
```

| Files | Coverage Δ |
|---|---|
| phys2denoise/workflow.py | 0.00% <ø> (ø) |
| phys2denoise/tasks.py | 0.00% <0.00%> (ø) |
maestroque commented 1 month ago

I have discovered an issue with using loguru within the pydra tasks. E.g. when running this:

import pydra
from loguru import logger

@pydra.mark.task
def compute_metrics(phys, metrics):
    if isinstance(metrics, (list, str)):
        for metric in metrics:
            if metric not in _available_metrics:
                logger.warning(f"Metric {metric} not available. Skipping")
                continue

            args = select_input_args(metric, {})
            phys = globals()[metric](phys, **args)
            logger.info(f"Computed {metric}")
    return phys

When the loguru calls are included, defining the task as task2 = compute_metrics(phys=fake_physio, metrics=["respiratory_variance"]) throws the following error:

>       task2 = compute_metrics(phys=fake_physio, metrics=["respiratory_variance"])

phys2denoise/tests/test_tasks.py:15:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pydra-p2d/lib/python3.10/site-packages/pydra/mark/functions.py:47: in decorate
    return FunctionTask(func=func, **kwargs)
pydra-p2d/lib/python3.10/site-packages/pydra/engine/task.py:146: in __init__
    fields.append(("_func", attr.ib(default=cp.dumps(func), type=bytes)))
pydra-p2d/lib/python3.10/site-packages/cloudpickle/cloudpickle.py:1479: in dumps
    cp.dump(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <cloudpickle.cloudpickle.Pickler object at 0x7fe5ea17e140>, obj = <function compute_metrics at 0x7fe5e9fb3910>

    def dump(self, obj):
        try:
>           return super().dump(obj)
E           TypeError: cannot pickle 'EncodedFile' object

pydra-p2d/lib/python3.10/site-packages/cloudpickle/cloudpickle.py:1245: TypeError

When those calls are not included, the error is not raised. That might be related to how pydra handles tasks in parallel. So we could either not use logs at all within tasks (which tbh is not a viable workaround imo), or delve deeper. I'm trying to find online resources about pydra + loguru, but no luck so far.

The good thing is that this seems to be specific to loguru, because stdlib logging calls work. I'm planning to work with those for now and see how we'll move on. @smoia @me-pic @m-miedema
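For what it's worth, the stdlib workaround likely succeeds because logging.Logger defines __reduce__ and pickles down to its name, whereas loguru's logger holds references to open stream objects (here pytest's EncodedFile wrapper) that cloudpickle cannot serialize. A quick check of the stdlib behaviour:

```python
import logging
import pickle

# stdlib loggers pickle down to their name and are re-fetched with getLogger,
# so functions referencing them survive cloudpickle's round trip in pydra
logger = logging.getLogger("phys2denoise.tasks")
restored = pickle.loads(pickle.dumps(logger))
assert restored is logging.getLogger("phys2denoise.tasks")
```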

m-miedema commented 1 month ago

I currently needed to manually install nest_asyncio for testing, but after that the tests pass, with the exception of the caplog logging assertion (this is also true for me in https://github.com/physiopy/physutils/pull/7). Still taking a closer look at the rest, will have a full review for you tomorrow!

me-pic commented 1 month ago

Same import issue as @m-miedema with nest_asyncio

me-pic commented 1 month ago

Also @maestroque, can you please provide some docs about running the CLI?

maestroque commented 1 month ago

Yes, sorry, I forgot to add nest_asyncio as a dependency. It is used by the pydra examples, but I also need to verify that it is actually required, because currently things are working without it as well.

What I do to test is run the cli/run.py script directly; only once it's published will you be able to run it with phys2denoise as a command. To start off, I run commands from within the phys2denoise repo, like:

python3 phys2denoise/cli/run.py -in phys2denoise/tests/data/fake_phys.phys -out exports -cp -rv -hbi -rpv -rp -hr -hrv -md physio -tr 1 -nscans 4 -sl 1.2 3.4 5.6 -win 100  

Then you can try different arguments, etc.; you can use python3 phys2denoise/cli/run.py --help for that. It also works for BIDS files, but you need to be careful with the specifications. One major thing is that the stored fake_phys.phys used in the example above already has peaks and troughs loaded using peakdet, while for BIDS files that wouldn't be the case from the start, so you would have to process them first and then either pass the peaks manually or re-store the data as a pickled physio object. That issue will of course go away once the peakdet workflow is put into place alongside this one.

@m-miedema @me-pic
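The "re-store as a pickled physio object" step could be sketched like this, with a plain dict standing in for the peakdet-processed Physio object (the real class lives in physutils and is not shown here; the file name and keys are illustrative):

```python
import pickle
import tempfile

# Stand-in for a physutils Physio object that already has peaks attached;
# in the real workflow this would be built with peakdet, not a plain dict.
physio_with_peaks = {"data": [0.1, 0.5, 0.2], "fs": 40.0, "peaks": [1]}

# Store as a pickled .phys file so the CLI can load it as its -in argument
with tempfile.NamedTemporaryFile(suffix=".phys", delete=False) as f:
    pickle.dump(physio_with_peaks, f)
    phys_path = f.name

# Loading it back recovers the object with its peaks intact
with open(phys_path, "rb") as f:
    reloaded = pickle.load(f)
```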

maestroque commented 1 month ago

@m-miedema what failed tests are you getting? Everything should pass now

m-miedema commented 1 month ago

@m-miedema what failed tests are you getting? Everything should pass now

I am getting the attached failure:

[screenshot of the failing test]

Let me know if there's something you can point me to that's wrong on my end!

maestroque commented 1 month ago

@m-miedema what failed tests are you getting? Everything should pass now

I am getting the attached failure: [screenshot of the failing test]

Let me know if there's something you can point me to that's wrong on my end!

@m-miedema It seems that the code is the same version, and it passes locally; I cannot recreate it. Also, we cannot see the CI yet before https://github.com/physiopy/physutils/pull/7 is merged and released.

Could you try to add the following to test/__init__.py to see if this fixes it?

from loguru import logger
import sys

logger.add(sys.stderr)

me-pic commented 1 month ago

@m-miedema can't replicate the error you are getting either

me-pic commented 1 month ago

@maestroque Not sure if that should even be addressed in this PR, but in the chest_belt.py script we are using np.math, which I believe is deprecated and might eventually cause some issues...

maestroque commented 1 month ago

@maestroque Not sure if that should even be addressed in this PR, but in the chest_belt.py script we are using np.math, which I believe is deprecated and might eventually cause some issues...

Yes, you are right! There is an open issue about numpy v2 compatibility for that: #62
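For reference, np.math was just an accidental re-export of the stdlib math module and was removed in NumPy 2.0, so the forward-compatible fix is to import math directly (illustrative values below, not the actual chest_belt.py code):

```python
import math

# before (breaks on numpy >= 2.0): np.math.pi, np.math.factorial(...)
# after: use the stdlib math module directly
tr = 1.5
angular_freq = 2 * math.pi / tr  # illustrative computation
n_perms = math.factorial(4)
```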

maestroque commented 1 month ago

@maestroque Thanks for your work on that PR! Good job overall 🎉 🎉

Would it be possible to add some tests to cover the CLI? If you need some examples, you could refer to the tests in the giga_connectome package.

Sure, on it

m-miedema commented 1 month ago

@m-miedema what failed tests are you getting? Everything should pass now

I am getting the attached failure: [...]

@m-miedema It seems that the code is the same version, and it passes locally; I cannot recreate it. Also we cannot see the CI yet before physiopy/physutils#7 merges and releases

Could you try to add the following to test/__init__.py to see if this fixes it? [...]

This does not fix it - I'm not sure what's going on but since you and @me-pic are not able to replicate it, I continued on to test the rest of the CLI (see my next comment).

m-miedema commented 1 month ago

I'm having trouble using the CLI to calculate metrics when using a .phys object as my input. For example, I thought I'd run a quick test using the physio objects generated in the OHBM tutorial here, but the CLI returns e.g. "Metric rv not computed. Skipping" without more useful information (even when I run in --debug mode, which as a side note doesn't actually change the output for me). Has anyone else successfully output metrics in this case?

maestroque commented 1 month ago

I'm having trouble using the CLI to calculate metrics when using a .phys object as my input. For example, I thought I'd run a quick test using the physio objects generated in the OHBM tutorial here but the CLI returns e.g. "Metric rv not computed. Skipping" without more useful information (even when I run in --debug mode - which as a side-note, doesn't actually change the output for me). Has anyone else successfully output metrics in this case?

@m-miedema I need you to provide the precise logs you get in order to understand the problem. I haven't had this issue personally

m-miedema commented 1 month ago

I'm having trouble using the CLI to calculate metrics when using a .phys object as my input. [...]

@m-miedema I need you to provide the precise logs you get in order to understand the problem. I haven't had this issue personally

Certainly! If you've been able to get the CLI to run on a Physio object with peaks/troughs, could you share the object and the call with me and I can try it? So far I've tried a few different ways, but in general this is the type of call and output:

phys2denoise -in '.\sub-007_ses-05_task-rest_run-01_resp_peaks.phys' -e 'rv' -nscans 400 -tr 1.5 -sr 40 -lags 0 -win 6 --debug -out .

2024-09-04 09:32:47.905 | INFO     | phys2denoise.workflow:phys2denoise:208 - Running phys2denoise version: 0+untagged.276.gabd4c93.dirty
2024-09-04 09:32:47.914 | DEBUG    | phys2denoise.workflow:phys2denoise:233 - Metrics: []
2024-09-04 09:32:47.914 | DEBUG    | phys2denoise.workflow:phys2denoise:233 - Metrics: []
2024-09-04 09:32:48.866 | DEBUG    | physutils.tasks:wrapped_func:27 - Creating pydra task for transform_to_physio
2024-09-04 09:32:48.866 | DEBUG    | physutils.tasks:wrapped_func:27 - Creating pydra task for transform_to_physio
2024-09-04 09:32:51.227 | DEBUG    | physutils.io:load_physio:185 - Instantiating Physio object from a file
2024-09-04 09:32:51.227 | DEBUG    | physutils.physio:__init__:293 - Initializing new Physio object
Metric rv not computed. Skipping

m-miedema commented 1 month ago

One thing I think we should strongly consider as a future direction for the CLI is to set up a heuristic file with more metric-specific parameters. For example, here we can calculate different metrics, but not specify different window sizes for each in the same call. I'm putting this comment along with a new issue here so as not to lose track of it; if others think this is a useful idea I can follow up in the future :)
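If it helps the discussion, such a heuristic could be as simple as a per-metric parameter mapping; the format, keys, and metric names below are purely illustrative, not a proposed spec:

```python
# Hypothetical per-metric parameter heuristic: each metric carries its own
# window size (and any other metric-specific options) instead of sharing
# one global -win value across the whole call.
metric_params = {
    "respiratory_variance": {"window": 6},
    "heart_rate_variability": {"window": 10},
    "heart_beat_interval": {},  # no window needed for this one
}

def params_for(metric):
    # Fall back to an empty dict so metrics without entries still run
    # with their defaults.
    return metric_params.get(metric, {})
```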

m-miedema commented 1 month ago

I'm having trouble using the CLI to calculate metrics when using a .phys object as my input. [...]

@m-miedema I need you to provide the precise logs you get in order to understand the problem. I haven't had this issue personally

Certainly! If you've been able to get the CLI to run on a Physio object with peaks/troughs, could you share the object and the call with me and I can try it? [...]

@maestroque I think it would be very helpful to make the "Metric rv not computed. Skipping" message slightly more verbose, so that the user knows it is stemming from the export argument rather than the computational argument. As it stands it's quite confusing! E.g. "Metric X not computed, skipping the export of metric X.", or, even better, throw a warning when a metric is provided as an export argument but not a computational argument.
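A minimal sketch of that warning, with made-up function and argument names (the actual CLI internals differ):

```python
import logging

logger = logging.getLogger("phys2denoise")

def check_export_args(computed_metrics, export_metrics):
    """Warn about metrics requested for export but never computed.
    Sketch only; names do not match the actual phys2denoise CLI."""
    missing = [m for m in export_metrics if m not in computed_metrics]
    for metric in missing:
        logger.warning(
            f"Metric {metric} was requested for export but not for "
            f"computation; skipping the export of {metric}."
        )
    return missing
```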

m-miedema commented 1 month ago

As well, I opened a new issue to address this, but I'm finding that the number of time points in the exported resampled metric files doesn't match the -nscans argument in the CLI. I think users would expect this to be the case, so it's something we should dig into another time.
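For that follow-up, one straightforward way to guarantee that an exported regressor has exactly nscans time points is to interpolate it onto the scan-time grid; this is a sketch with hypothetical helper and parameter names, not the workflow's actual resampling code:

```python
import numpy as np

def resample_to_scans(regressor, fs, tr, nscans):
    """Resample a physiological regressor sampled at fs (Hz) onto the fMRI
    grid of nscans volumes spaced tr seconds apart (illustrative sketch)."""
    t_phys = np.arange(regressor.size) / fs  # sample times of the physio signal
    t_scan = np.arange(nscans) * tr          # acquisition time of each volume
    return np.interp(t_scan, t_phys, regressor)

# 600 s of a 40 Hz signal resampled onto 400 volumes at TR = 1.5 s
reg = np.sin(np.linspace(0, 10, 24000))
resampled = resample_to_scans(reg, fs=40.0, tr=1.5, nscans=400)
```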

maestroque commented 3 weeks ago

Cleaned up, this should be ready to merge once physutils is

m-miedema commented 2 weeks ago

@maestroque thanks for updating this one!