Open blakeNaccarato opened 10 months ago
We want to keep compatibility with papermill (although I agree that converting tuples into strings is an odd API choice).
I'm open to extending the API so users can customize the translator, what's the use case for this? wouldn't it be easier to convert the tuple into a list before executing the notebook?
We want to keep compatibility with papermill
Yeah, the papermill compatibility guarantee is understandable.
wouldn't it be easier to convert the tuple into a list before executing the notebook?
Yes, that's what I ended up doing in my case.
I'm open to extending the API so users can customize the translator, what's the use case for this?
I think I'm hitting these surprises because I'm interacting with namespaces of parametrized notebooks, a less-common use-case. I chose ploomber-engine
instead of Ploomber proper to keep it lightweight and more adaptable to the patterns I'm using (particularly in testing).
I am injecting parameters and handling namespaces in calling code. One rationale is to keep file-IO logic out of the notebook logic. In tests, I pass different parameters through a notebook and assert certain namespace results.
Allowing a substitute translator would probably be better than special-casing the existing translator. I figured it could be a slippery-slope if you do tuple
today, then someone wants slices
, and so-on. An alternative extension point would allow users to pass an override mapping of types to handler functions, which wouldn't require subclassing Translator
.
But a reasonable choice is also "the translator works the way that it does, we'll make sure the surprises are well-documented". It's really not that big of a deal to be limited to str
, bool
, int
, float
, dict
, and list
. If you like, I'm open to authoring the documentation change if that's the path chosen. I could also manage the feature implementation if that's the choice, once an approach is decided.
As an aside, get_namespace()
doesn't directly accept parametrization like execute_notebook()
does. So this requires importing some ploomber_engine
internals, ploomber_engine._util.parametrize_notebook
. Facilitating that on get_namespace()
would be really cool!
I've provided more implementation details of this use-case below.
Thanks for the detailed answer!
I think we can break down the action items:
PloomberClient
(see code below)get_namespace
, feel free to! Just open a new issue so we can track progress separatelySubclassing PloomberClient
:
class MyCustomClient(PloomberClient):
def translate_parameters(self, parameters):
# custom translation logic
client = MyCustomClient(...)
client.execute(...)
This would imply modifying PloomberClient
so it uses a default translator but subclasses could override this by re-writing a translate_parameters
method. Thoughts?
Thanks for the detailed answer!
Thanks! It was so detailed, that I got self-conscious and folded half of it under <details>
š. I think I'm about to outdo myself with this essay of a response, though. I've numbered relevant bullet points so you can refer to them in shorthand, e.g. (3), or help me decide the necessary ones per acceptance critera for the current work.
Also, it's nearly Thanksgiving, so know our communication will likely pause abruptly. I plan to work on it a bit over the weekend, but have enough to do in dev environment setup/documentation that I won't get into the implementation details this week, I'm sure.
If you also want to work on adding parametrization support for
get_namespace
, feel free to! Just open a new issue so we can track progress separately
(3) I'll open an issue on this sometime soon, but I'll prioritize working on the current matter first. If someone else comes along and wants to work the other PR in the meantime, that's fine, too.
(4) Unless otherwise noted, I'll assume the following work happens in one PR. Let me know if you recommend a different grouping/sequence of work, e.g. if certain tasks should wait for a follow-up.
Document how the parameter translation works so users know which data types are supported (and that any unsupported one is converted into a string)
PloomberClient
API doc will grow a translate_parameters
method. This could be the source of truth documenting default parameter translation behavior. Or should there be a different source of truth that this and the following points back to?execute_notebook
API reference, refer to parameters like "Arbitrary keyword arguments to pass to the notebook parameters." Reword this or add an admonition/warning below the basic description, linking to the source of truth detailing potentially surprising default behaviors and how to override.PloomberClient
and modifying execute_notebook
? Could be considered part of the optional demo recommendation.Changed in version ...
placeholders be added in this work/PR, or is that handled in a different process?PloomberClient
We can adopt a subclass approach to customize
PloomberClient
. [...] This would imply modifyingPloomberClient
so it uses a default translator but subclasses could override this by re-writing atranslate_parameters
method. Thoughts?
translate_parameters
method.
ploomber_engine._translator.translate_parameters()
is already used in a few places, should PloomberClient.translate_parameters()
just call that existing function?ploomber_engine._translator.PythonTranslator
directly?PloomberClient.translate_parameters()
method instead? Worried about leaving dangling logic pointing to prior implementation.execute_notebook()
operate on a custom user PloomberClient
? It seems it's instantiated now via INIT_FUNCTION
, so a client with custom translation won't be able to be used here? Should a client
argument be added to such functions, allowing a custom client in? This will likely be incompatible with PloomberMemoryProfilerClient
unless users actually subclass that one. I guess documentation in (7) is tied to this change.PloomberClient
still work (already covered by tests?), e.g. PloomberLogger
(especially since its own execute
method override has a FIXME
indicating duplicating some logic from PloomberClient
) and PloomberMemoryProfilerClient
. Maybe these are already sufficiently isolated and won't be affected as long as they don't override the translate_parameters
method.PloomberClient
for translate_parameters
handling, are there any breaking uses I should be aware of while implementing this method? I'm particularly interested in how a given client instance gets its hooks, like self.on_notebook_start
and self.on_cell_*
(used here). Can a user subclass inadvertently mess up those hooks? Or am I mixing these up, it seems maybe PloomberNotebookClient
is actually not involved here?PloomberClient
, probably in tests/test_ipython.py.PloomberClient
subclasses detailed in (14).This work will wire up the papermill
-consistent translation to a translate_parameters
method on PloomberClient
, and expose this method for overriding through subclassing. Unexpected interactions are likely wherever existing subclasses of PloomberClient
are in use, and it's not exactly clear to me how to handle those right now.
Maybe this work can be broken into a few PRs. Most important is the proposed feature, directly-related documentation, tests, and exposing this feature through various helper functions like execute_notebook()
. I think that enabling custom translation is worth the effort, though.
@neelasha23: looping you so you're aware of this
(1) If I create this as a draft PR, will that avoid noise for maintainers and assignments? I don't want to create noise until the PR is ready or I explicitly reach out with further questions.
That's fine, opening a draft PR won't create any noise
(2) If I need to mention one of the maintainers with an issue, should I ping you or someone else? Here or in the PR?
you can tag me and @neelasha23, she'll be reviewing your code as well
(11) Separate PR? Should any Changed in version ... placeholders be added in this work/PR, or is that handled in a different process?
Please add an "added in version" in the docstring for the new translate_parameters
method
(12) Write a default translate_parameters method.
we can use ploomber_engine._translator.translate_parameters()
(13) Should existing calls to parameter translation throughout the codebase be changed to use the PloomberClient.translate_parameters() method instead? Worried about leaving dangling logic pointing to prior implementation.
Unsure about this, where in the codebase are we calling the translate parameters? I'm tempted to keep those as is since calling PloomberClient.translate_parameters()
should only happen inside the PloomberClient
class. But let me know where you find other call instances
(14) How can current exposed functionality like execute_notebook() operate on a custom user PloomberClient? It seems it's instantiated now via INIT_FUNCTION, so a client with custom translation won't be able to be used here? Should a client argument be added to such functions, allowing a custom client in? This will likely be incompatible with PloomberMemoryProfilerClient unless users actually subclass that one. I guess documentation in (7) is tied to this change.
Fair point. I think let's leave this point out of the initial work. We can think of ways for users to call execute_notebook
with a custom client. But let's not add too many things to a single PR.
(15) Separate PR? Verify existing subclasses of PloomberClient still work (already covered by tests?), e.g. PloomberLogger (especially since its own execute method override has a FIXME indicating duplicating some logic from PloomberClient) and PloomberMemoryProfilerClient. Maybe these are already sufficiently isolated and won't be affected as long as they don't override the translate_parameters method.
Good catch. If you feel like there are missing unit tests, please add them. This can be the first thing to tackle, as it'll ensure you that your changes aren't breaking existing functionality.
(16) If users are encouraged to subclass PloomberClient for translate_parameters handling, are there any breaking uses I should be aware of while implementing this method? I'm particularly interested in how a given client instance gets its hooks, like self.on_notebook_start and self.oncell* (used here). Can a user subclass inadvertently mess up those hooks? Or am I mixing these up, it seems maybe PloomberNotebookClient is actually not involved here?
Right PloomberNotebookClient
is not involved. PloomberNotebookClient
is a legacy client that derives from nbclient, this was our first implementation, before we implement a notebook executor from scratch. We consider this legacy code and might remove it in the future.
I hope I didn't miss anything important! Feel free to tag me in your draft PR if you have questions
Thanks for the thorough response. I created a draft PR, so I'll continue the discussion over there and ping you both as needed for clarification. First I'll establish a rough scope, set up my environment, and write tests covering existing subclass usage as needed, before getting into the actual implementation.
sounds good!
It looks like time has gotten away from me here, so I figured I'd chime in to say I'm still alive. I haven't gotten around to the implementation, though I do still intend to, and would like to put time into it this month. If any passers-by see this and are eager to work on this themselves, just @ me and I'll try to respond promptly.
Parametrizing a notebook on
parameters={"list_param": [0,1], "tuple_param": (0,1)}
will result inlist_param
value being injected as a list, but thetuple_param
value being injected as the stringified representation of atuple
, e.g."(0, 1)"
(literally with the quotes). I understand thatploomber-engine
strives to match thepapermill
API, so maybe this is intentional to maintain that compatibility, but is there a chance for the parameter translator to faithfully inject tuples?Maybe a more sensible request would be to allow the user to pass a custom
Translator
toPloomberClient
, which could pass the customTranslator
subclass toparametrize_notebook
(below), allowing the user to customize behavior, in the case that it is desired to mimic thepapermill
API surface by default?Contents of
translate
Here we see that
tuple
falls through to the "last resort" translator which stringifies it.https://github.com/ploomber/ploomber-engine/blob/e00bdc2d7676955a9fc00cd514fa9c98a92bf904/src/ploomber_engine/_translator.py#L76-L95
Abbreviated contents of
parametrize_notebook
https://github.com/ploomber/ploomber-engine/blob/e00bdc2d7676955a9fc00cd514fa9c98a92bf904/src/ploomber_engine/_util.py#L72-L73 ... https://github.com/ploomber/ploomber-engine/blob/e00bdc2d7676955a9fc00cd514fa9c98a92bf904/src/ploomber_engine/_util.py#L92-L94