Closed: alex-treebeard closed this issue 1 year ago.
Thanks for the effort @alex-treebeard.

> nbmake is not clearly able to run all cells to see which pass (maybe allow errors should run all AND fail if cells fail if an option is set). (@jhlegarreta chime in if you have any thoughts here)

In our use case, I'd prefer a flag that applies to the whole notebook, much like the timeout one: I expect all cells to work correctly, and we are interested in finding ANY cell that does not work when running the CIs. We'd need to add the flag to each and every cell otherwise, which is not practical.

Not sure how complicated this is, but hope it provides useful information about our intended use case.
`allow_errors` is configured once for each notebook, not per-cell.

Two issues though:

1. You may want this to be a global CLI setting so you don't have to edit notebook metadata for new and existing nbs
2. Currently `allow_errors` means that failing notebooks are marked in pytest as successful.

Perhaps we should have a global flag `--nbmake-continue-on-error` which runs all cells, and fails if any fail.
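As a concrete sketch of what "configured once for each notebook" means: a notebook is plain JSON, and the option sits in the file's top-level metadata rather than in any cell. The `execution` key names below follow nbmake's convention as I understand it, so treat them as an assumption rather than a guarantee:

```python
import json

# A Jupyter notebook is plain JSON; notebook-wide options live in its
# *top-level* metadata, which is why `allow_errors` applies to every cell.
# (The "execution" key names are assumed from nbmake's docs.)
nb = {
    "cells": [],
    "metadata": {
        "execution": {
            "allow_errors": True,  # keep executing after a cell raises
            "timeout": 300,        # notebook-wide timeout, in seconds
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Round-trip through JSON, as a tool would read it from disk.
loaded = json.loads(json.dumps(nb))
print(loaded["metadata"]["execution"])  # {'allow_errors': True, 'timeout': 300}
```

Because the block lives once per file, there is nothing to repeat per cell; the open question in this thread is only whether a CLI flag should set it globally instead.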
> `allow_errors` is configured once for each notebook, not per-cell.

OK. Have not tried.

> Two issues though:
>
> 1. You may want this to be a global CLI setting so you don't have to edit notebook metadata for new and existing nbs

Sorry if my thinking was unclear: I meant that this would be the preferred option rather than configuring the flag for each cell/notebook.

> 2. Currently `allow_errors` means that failing notebooks are marked in pytest as successful.

OK. I understand the rationale. Then the CLI flag should have another name, as we want the failing notebook to be marked as such.

> Perhaps we should have a global flag `--nbmake-continue-on-error` which runs all cells, and fails if any fail.

Not sure about the naming. But otherwise the "runs all cells, and fails if any fail" part looks good.

BTW, given that in our use case each notebook was run using a separate command, I'm curious about the reason why the rest did not run past the preprocessing notebook.
> Not sure about the naming. But otherwise the "runs all cells, and fails if any fail" part looks good.

Noted! Will think about naming.

> BTW, given that in our use case each notebook was run using a separate command, I'm curious about the reason why the rest did not run past the preprocessing notebook.

If this line fails, GitHub Actions will stop and fail. I recommend running them all in one pytest invocation instead.
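The difference can be sketched with plain exit codes. The "notebook" commands below are hypothetical stand-ins (not real nbmake calls); they only mimic a failing and a passing test run:

```python
import subprocess
import sys

# Stand-ins for notebooks run under pytest: the first "notebook" fails,
# the second would pass. (These commands are hypothetical stand-ins.)
commands = {
    "preprocessing": [sys.executable, "-c", "raise SystemExit(1)"],
    "analysis": [sys.executable, "-c", "raise SystemExit(0)"],
}

# One command per notebook, GitHub-Actions style: the job stops at the
# first non-zero exit code, so later notebooks never run.
ran = []
for name, cmd in commands.items():
    ran.append(name)
    if subprocess.run(cmd).returncode != 0:
        print(f"{name} failed; remaining steps skipped")
        break
print("ran separately:", ran)  # only 'preprocessing'

# A single invocation (e.g. `pytest --nbmake a.ipynb b.ipynb`) executes
# every notebook and reports one combined exit status at the end.
codes = {name: subprocess.run(cmd).returncode for name, cmd in commands.items()}
print("combined exit:", 1 if any(codes.values()) else 0)
```

Either way the job ultimately fails, but the single invocation surfaces the status of every notebook instead of stopping at the first one.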
Noted!
Thanks for the effort @alex-treebeard.

> If this line fails, GitHub Actions will stop and fail. I recommend running them all in one pytest invocation instead.

Not sure if that behavior is consistent: in the current `main` branch, where we use nbval, a failing cell does not make GitHub Actions stop the job completely: https://github.com/carpentries-incubator/SDC-BIDS-dMRI/runs/5572292930?check_suite_focus=true#step:13:60

So I am not sure what triggers GitHub's stop here. Maybe I'm presuming too much, but in regular Python code, if a test fails (and here I am assuming that a different pytest command is run for each detected module), the rest of the tests within the module or outside it continue without trouble.
> Thanks for the effort @alex-treebeard

np!

> not sure what triggers GitHub's stop

GitHub Actions stops if a command exits with a non-zero exit code. pytest will exit with a non-zero code if a test fails -- this works well to notify the user of a failure.

nbval in your case does not seem to trigger a pytest failure for a cell timeout. IMO this is not desired behaviour because it increases the likelihood of the timeout not being noticed/fixed.

If you want to ignore a pytest failure you can do `pytest ... || true`, although I'm not sure why this would be desirable.
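The exit-code mechanics can be sketched as follows, assuming a POSIX shell for the `|| true` part (that trick is plain shell, not a pytest feature):

```python
import shlex
import subprocess
import sys

# pytest exits non-zero when a test fails; GitHub Actions fails the step
# (and stops the job) on any non-zero exit code. Simulate a failing run:
failing = subprocess.run([sys.executable, "-c", "raise SystemExit(1)"])
print(failing.returncode)  # 1 -> the Actions job would stop here

# `pytest ... || true` masks that: the compound shell command exits 0,
# so the job continues and the failure goes unnoticed.
py = shlex.quote(sys.executable)
masked = subprocess.run(f"{py} -c 'raise SystemExit(1)' || true", shell=True)
print(masked.returncode)  # 0 -> failure hidden
```

This is exactly why masking the exit code is rarely what you want: the failure signal never reaches CI.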
> GitHub Actions stops if a command exits with a non-zero exit code. pytest will exit with a non-zero code if a test fails -- this works well to notify the user of a failure.

OK. After some more thinking, I also realize that my assumptions about pytest behavior in regular Python code were wrong.

> nbval in your case does not seem to trigger a pytest failure for a cell timeout. IMO this is not desired behaviour because it increases the likelihood of the timeout not being noticed/fixed.

I agree that a timeout should be marked as a failure unless explicitly specified otherwise. Our main problem with nbval is that it does not care about the timeout, and that is why we are trying other solutions.

> If you want to ignore a pytest failure you can do `pytest ... || true`, although I'm not sure why this would be desirable.

We would not be interested in ignoring failures.

Closing this as I don't think further action is necessary for the time being.

https://github.com/carpentries-incubator/SDC-BIDS-dMRI/pull/207#issuecomment-1087878354