investigate whether JSON serialization is a bottleneck, and optimize if so

indirectlylit commented 2 years ago

Observed behavior

DRF APIs and (after #6316 is fixed) our task serialization use the built-in JSON serialization libraries. This is safe and works everywhere, but there are also libraries that provide faster C-based implementations.

Expected behavior

Some performance tests could help determine if using a faster de/serialization library would help our users.
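As a starting point, a performance test along these lines might look like the sketch below. It times a full serialize/deserialize round trip with the standard library and, if it happens to be installed, with `ujson`. Note that `ujson` is only an assumption here, since this issue does not name a specific library, and `make_payload` is a hypothetical stand-in for real API or task data.

```python
import json
import timeit


def make_payload(n=1000):
    # Hypothetical payload; a real test should use representative API/task data.
    return [
        {"id": i, "title": "item %d" % i, "tags": ["a", "b"], "score": i / 7.0}
        for i in range(n)
    ]


def time_round_trip(dumps, loads, payload, repeat=5, number=20):
    """Return the best observed time in seconds for one dumps + loads round trip."""

    def run():
        loads(dumps(payload))

    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


payload = make_payload()
results = {"json": time_round_trip(json.dumps, json.loads, payload)}

try:
    # Assumption: ujson is used purely as an example of a C-based library.
    import ujson

    results["ujson"] = time_round_trip(ujson.dumps, ujson.loads, payload)
except ImportError:
    pass  # Only the standard library gets measured on this machine.

for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-6s %.3f ms per round trip" % (name, seconds * 1000))
```

A micro-benchmark like this only shows the relative cost of the libraries. Whether serialization is actually a bottleneck would still need profiling of real requests.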

If we end up using an optimized library, we'll need to make sure it falls back gracefully on unsupported architectures and Python versions.
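One way the graceful fallback could work is an import guard in a small wrapper module, roughly as sketched here. Again, `ujson` is an assumed example rather than a chosen library, and the wrapper names `dumps` and `loads` are hypothetical.

```python
import json

try:
    # Assumption: ujson stands in for "an optimized library"; the issue names none.
    import ujson as _fast_json
except ImportError:
    # Not installed, e.g. no build is available for this architecture or Python version.
    _fast_json = None


def dumps(obj):
    """Serialize obj to a JSON string, preferring the optimized library if present."""
    if _fast_json is not None:
        try:
            return _fast_json.dumps(obj)
        except (TypeError, ValueError, OverflowError):
            # The fast path rejected this value; let the standard library try.
            pass
    return json.dumps(obj)


def loads(data):
    """Parse a JSON string, preferring the optimized library if present."""
    if _fast_json is not None:
        return _fast_json.loads(data)
    return json.loads(data)
```

Callers would import `dumps` and `loads` from the wrapper instead of from `json` directly, so the choice of backend stays in one place. The two backends may differ in details such as whitespace or float formatting, so output should be compared by parsed value rather than by string.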

User-facing consequences

Improved server performance

Context

0.15

EliKlein commented 2 years ago

I believe the ORM is not doing amazingly well with keeping track of Job objects. I'm pretty sure that (in my local code where #6316 is pretty much fixed) they're being serialized/deserialized a lot more than they theoretically would have to be. It seems like this setup is the most manageable and easiest way to handle it as things stand, but I thought I should mention it, as it's relevant to this issue.

rtibbles commented 2 years ago

I think that seems fairly likely - we have a fairly blunt hammer approach to this at the moment. One possibility would be to put more of the Job attributes directly into the ORMJob as distinct columns, rather than trying to serialize all Job attributes into a single column.
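To illustrate the distinct-columns idea, here is a rough sketch using plain `sqlite3` rather than the actual ORM. The table layout and the column names (`state`, `progress`, `extra`) are hypothetical and are not the real ORMJob schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        id TEXT PRIMARY KEY,
        state TEXT NOT NULL,                 -- frequently queried: own column
        progress INTEGER NOT NULL DEFAULT 0, -- frequently updated: own column
        extra TEXT NOT NULL                  -- everything else, serialized as JSON
    )
    """
)


def save_job(job_id, state, progress, **extra):
    # Only the rarely-touched attributes go through JSON serialization.
    conn.execute(
        "INSERT OR REPLACE INTO jobs (id, state, progress, extra) VALUES (?, ?, ?, ?)",
        (job_id, state, progress, json.dumps(extra)),
    )


def update_progress(job_id, progress):
    # Updates a single column without deserializing or reserializing the job.
    conn.execute("UPDATE jobs SET progress = ? WHERE id = ?", (progress, job_id))


save_job("a1", "RUNNING", 10, func="hypothetical.task", args=[1, 2])
save_job("b2", "QUEUED", 0, func="hypothetical.task", args=[3])
update_progress("a1", 50)

# Filtering by state never touches the serialized blob.
running = conn.execute(
    "SELECT id, progress FROM jobs WHERE state = ?", ("RUNNING",)
).fetchall()
print(running)
```

With a layout like this, status checks and progress updates skip JSON entirely, and the serialized column is only read when the full job is needed.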

learningequality / kolibri