Open sebbacon opened 4 years ago
`import pdb; pdb.set_trace()`

We have `breakpoint()` since 3.7! It is exactly the same when called without arguments, but ties in nicely to the `opensafely breakpoint` subcommand from a marketing/docs point of view.
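Since the proposal leans on this equivalence, a quick illustration may help: `breakpoint()` just dispatches through `sys.breakpointhook`, whose default is `pdb.set_trace()` — and that hook is also a seam tooling could use to redirect the call without touching user code. A minimal sketch (the hook swap here is only so the example doesn't block in pdb):

```python
# breakpoint() (Python 3.7+) calls sys.breakpointhook(), which defaults
# to pdb.set_trace() -- so bare breakpoint() and `import pdb;
# pdb.set_trace()` behave identically. The hook (or the PYTHONBREAKPOINT
# env var) lets tooling redirect it.
import sys

def count_breakpoints():
    calls = []
    # Swap the hook for a recorder so this example doesn't block on pdb.
    sys.breakpointhook = lambda *args, **kwargs: calls.append(args)
    breakpoint()  # under the default hook this would drop into pdb
    sys.breakpointhook = sys.__breakpointhook__  # restore the default
    return len(calls)
```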
This relates to #556.
Any updates?
We will probably be starting work on addressing at least some of this in a couple of weeks.
I don't see why creating graphs needs to happen inside the secure environment. Additionally, I would imagine that they're harder for output checkers to check.
I agree. The proposal was really about debugging problems, whether they're graphs or something else. It could be data-shaping scripts, for example.
Our happy path for OpenSAFELY users is for them never to have to access the L3 data directly: they write scripts that make the data useful and less disclosive, and these are saved as `medium_security` outputs to the L4 server. However, sometimes analysts may need interactive access. Examples from @krishnanbhaskaran:
The "huge plots" issue was an SVG which was absolutely massive. Debugging this meant interactively twiddling the real data.
The "useful scale" issue is that, for the problem "given this input data, make a plot with nice y-axis scales and x-axis ticks in pleasing locations", most graphing libraries require some trial and error.
I propose two things: first, documentation; second, rudimentary breakpoint/remote-hands support. (Point 1b would be to implement variable matrices.)
Documenting patterns for decomposing and retrying code
Model convergence problems can be addressed adequately in some cases, at least, using project pipelines. For example, in the households research we wrote the model so it could take a number of parameters, and then repeated this as an action 20 times (when we support `matrix` / variable looping this will get easier). The logging output of each of the 20 actions was designed to give useful information for further iterating the input parameters. I think this can be summarised as:
So for the plot idea, during development, you'd just output a bunch of versions all at once, with something like:
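A sketch of that "output every version at once" pattern — the candidate settings and file naming here are hypothetical, and a real action would call a plotting library (e.g. matplotlib's `fig.savefig()`) rather than writing placeholder SVGs:

```python
# Hypothetical parameter sweep: emit one output per plot configuration,
# then pick the best-looking one from the L4 outputs instead of
# interactively tweaking against L3 data.
from pathlib import Path

Y_SCALES = ["linear", "log"]   # candidate y-axis scales (illustrative)
TICK_COUNTS = [5, 10, 20]      # candidate x-axis tick counts (illustrative)

def render_all(output_dir):
    """Write one output file per parameter combination for later review."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for scale in Y_SCALES:
        for ticks in TICK_COUNTS:
            # A real action would do fig.savefig(path) here.
            path = out / f"plot_{scale}_{ticks}ticks.svg"
            path.write_text(f"<!-- scale={scale}, ticks={ticks} -->")
    return sorted(p.name for p in out.iterdir())
```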
Remote hands
It's possible the other use cases could be met by a small number of "remote hands" users with core support. In that case we just need to ensure we have properly documented the entry points and protocols for doing so.
For this, I propose we document breakpoints in each language and support detecting them in the job-runner. So, say I suspect an issue with the CSV load in Pandas. I do this:
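Something like the following, as a sketch — the stdlib `csv` module stands in for `pandas.read_csv` here, and the `OPENSAFELY_DEBUG` flag is invented; the point is only that `breakpoint()` sits immediately after the suspect load, so the debug session opens with the data already in scope:

```python
# Sketch: drop a breakpoint right after the suspect CSV load.
import csv
import os

def load_rows(path):
    # Stand-in for pandas.read_csv(path)
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if os.environ.get("OPENSAFELY_DEBUG"):  # hypothetical opt-in switch
        breakpoint()  # emits the (Pdb) prompt the job-runner would watch for
    return rows
```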
The job runner monitors stdout for `(Pdb)`, and when it finds it, changes the job state to `breakpoint` or similar. This can alert the author that the debug session is waiting, and the remote hands person can run `opensafely attach` or something to attach to the running Python docker container. The `breakpoint` UI and notifications could specify which `opensafely` command to run. Perhaps we should support writing instructions for remote hands users, e.g. by allowing a job to be linked to a GitHub issue?
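A minimal sketch of the two pieces described above — none of this is the real job-runner, and the container naming scheme is invented for illustration:

```python
# Hypothetical sketch: (1) watch a job's stdout for the pdb prompt and
# flip the job state; (2) the docker invocation an `opensafely attach`
# subcommand might wrap.

def watch_for_breakpoint(stdout_lines, set_state):
    """Call set_state("breakpoint") when a pdb prompt appears on stdout."""
    for line in stdout_lines:
        if line.lstrip().startswith("(Pdb)"):
            set_state("breakpoint")
            return True
    return False

def attach_command(job_id):
    """The docker command remote hands would run to join the session."""
    return ["docker", "attach", f"os-job-{job_id}"]  # hypothetical name
```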