jupyter / jupyter_client

Jupyter protocol client APIs
https://jupyter-client.readthedocs.io
BSD 3-Clause "New" or "Revised" License
374 stars 279 forks source link

deprecate automatically parsing header timestamps #1014

Open minrk opened 4 months ago

minrk commented 4 months ago

I meant to upstream this issue a long time ago, but forgot. It came up again in ipython/ipykernel#1210.

Parsing timestamps in message headers has turned out to be a major overhead in handling messages, especially when they are almost never used in their parsed form. As a result, ipyparallel patches extract_dates to be a no-op (at import time, which wasn't the right choice - I'll move that to instantiation of the appropriate objects). Ironically, ipyparallel probably uses timestamps the most of any jupyter project, but the intermediate de/re-serialization in schedulers had a massive cost in benchmarks. I believe the same is true in the notebook server, which relays many messages without needing to look inside them, but it's still doing tons of datetime parsing (#590).

I think we should just not do this parsing at all, and instead rely on consumers of messages to parse the timestamps they are interested in when they want to use them.

It's a hard transition to make, because it's technically a breaking API change to just stop parsing these, but it's very unlikely to actually break anything aside from tests themselves because they are so rarely used in code.

I don't have a great strategy for how to smoothly deprecate this behavior, so I'd love it if someone had a good idea for how to do it.

ipyparallel's monkeypatch could have ben avoided, too, if extract_dates were an instance method on Session instead of a module function called unconditionally.

One possible option:

The problem with this approach is that it makes both dates and strings technically valid, which means fully compatible code needs to be updated to handle both types, rather than just updating to the new unparsed strings, because consumers of Session-created messages aren't always in the same code that created them. But maybe that's the best we can do.

more related issues: