Open umax opened 1 year ago
Hi @umax, thanks for reporting this!
Offhand, I don't see any major perf issues with that code; ultimately all the work being done is "parse the response". Unfortunately, that means there probably won't be any quick wins here :( For this specific case, you could try providing a custom subclass for Value
which has from_repr
overridden to do less parsing, which might help a bit, but it's hard to say how much. If you override all the things with custom subclasses, you might get some improvements, but it'll be a bit of whack-a-mole, I fear.
More generally, the gcloud-aio
library was generally built on the idea of converting back-and-forth from code representations to api representations, so this is a pretty fundamental design change. We've discussed before migrating the libraries to support a dual mode, eg. to have a client which handles just the raw api data and then an optional OOP layer on top. I'm thinking that might be the only real way to tackle this sort of thing, as exposing the raw details directly is going to be nearly impossible.
So... yeah, allowing direct API access is really the only path I see here. I'd certainly be open to it, but it's going to be a really big change and is going to need a bunch of design work upfront to see how we'd want to do it across all our libs to maintain a consistent feel. If you (or anyone else!) is interested in helping with this work, we can certainly consider it, but on our side of things we likely don't have OSS maintenance hours to devote to this sort of work in the near future.
Hi, @TheKevJames, thanks for reply!
I've made some benchmarks and found that most slowest part of the library inside from_repr
method of Value
class. I tried to improve it (got only x4-5 times, but it's not enough for me). The problem is that your library converts every raw property to Python type. Usually it's not necessary, because mostly you will analyze only some properties, change it and push in Datastore again. That's why I decided to create my own library (mostly the idea is the same - convert from API representation to Python objects), but with lazy loading entity's properties. It works about ~9-10 times faster and is more strict about mirroring HTTP API structures on Python classes. I tried to patch your library, but it breakes backward compatibility because of lazy loading approach (
Feel free to close this task.
Hey, guys! I use this library in my Google Cloud Run services to work with Google Datastore. Let's say you have a service with 10 requests per second traffic. Your service makes request to Google Datastore to fetch 250 entities. Nothing more. Every object from query result converted to Entity class instance. So, Google Cloud Run monitoring shows that my service make 50% CPU load. If I disable converting query results to Entity, service uses 8% CPU. The problem in this function: https://github.com/talkiq/gcloud-aio/blob/master/datastore/gcloud/aio/datastore/value.py#L31 Any ideas how to fix this performance issue?