materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow
Other
90 stars 25 forks source link

Optional Flag to help Debugging #527

Closed jmmshn closed 6 months ago

jmmshn commented 6 months ago

Optional Flag to help Debugging

Currently, whenever any run_locally call fails. The stack trace is produced at the end and we only have access to the error via a logger even

if you set RAISE_IMMEDIATELY: true, the stack trace will be provided where the excution actually failed. This allows you to move around better within a debugger.

utf commented 6 months ago

Thanks @jmmshn, this is a useful addition.

I think this flag could be better as an argument to run_locally rather than a general setting, since it is specifically for that function and doesn't impact other managers. Although if there is a particular reason to have it is a general setting, let me know.

jmmshn commented 6 months ago

The flag is help with situations where the excution is already serialized like in a fireworks wf. Finding and modifying the exact run_locally command is not always trivial. So I think the trade off is:

In my usages cases the second problem seems more annoying since I'm pretty much dealing with a single run_locally call at any given time so being about to target is not super important.

utf commented 6 months ago

Hi @jmmshn, I'm not sure I follow. What do you mean by "finding the exact run_locally command"? Running jobflow with fireworks never calls run_locally it calls the Job.run() function. This change would only impact workflows that are run via a script with run_locally at the end, hence there should only be one call to that function per workflow.

I guess one potential benefit of the setting vs argument, is that you could change the setting once a job is already submitted and in the queue (but not running).

jmmshn commented 6 months ago

Hi @jmmshn, I'm not sure I follow. What do you mean by "finding the exact run_locally command"? Running jobflow with fireworks never calls run_locally it calls the Job.run() function. This change would only impact workflows that are run via a script with run_locally at the end, hence there should only be one call to that function per workflow.

OK, I missed that, you are correct, this will not influence the FW stuff only run_locally for now. I thought the FW manager calls the run_locally command but that's not true.

I guess one potential benefit of the setting vs argument, is that you could change the setting once a job is already submitted and in the queue (but not running).

Yeah, I think w/e manager we have in the future there might be situations where you add many jobs to a queue and one fails, it will be useful to just rerun the failed job with the flag turned on in some interactive environment and start exploring the stack trace. So even if this only affects the run_locally command for now this can be added to other ways of executing the wf's in the future depending on who you want them to deal with exceptions. I'm OK with either one so I'll leave the decision up to you.

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (8ae6ec9) 99.86% compared to head (a76586a) 99.86%. Report is 3 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #527 +/- ## ======================================= Coverage 99.86% 99.86% ======================================= Files 20 20 Lines 1512 1514 +2 Branches 416 417 +1 ======================================= + Hits 1510 1512 +2 Misses 2 2 ``` | [Files](https://app.codecov.io/gh/materialsproject/jobflow/pull/527?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject) | Coverage Δ | | |---|---|---| | [src/jobflow/managers/local.py](https://app.codecov.io/gh/materialsproject/jobflow/pull/527?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=materialsproject#diff-c3JjL2pvYmZsb3cvbWFuYWdlcnMvbG9jYWwucHk=) | `100.00% <100.00%> (ø)` | |
jmmshn commented 6 months ago

OK, I thought about this some more. I think for harder debugging stuff people are gonna be touching the code a lot anyway so it makes more sense to just have it be an argument.

utf commented 6 months ago

Thanks @jmmshn