Open walljcg opened 1 week ago
Have linked a WIP PR #198 which has a rough POC for the get_patrol_observations flow.
Using the test_asyncer.py script included in the PR to compare the two clients, get_patrol_observations over the MEP-DEV dataset (12 patrols, ~500 observations) goes from ~18s to ~11s.
However, this isn't a great test, as the observations are all within a single patrol.
Hooking it up to MEP (prod) for ~30k observations (the last few days of patrol data), the total time reduces from ~152s to ~64s.
(tests were done on a high latency connection)
That's probably enough of an improvement to take this further. Keen to get eyes on the flow as it is, in case others have ideas I haven't considered.
I noted some slight differences in observation counts for certain subjects (in all cases, the async version fetches more observations). I've investigated this to the point that I'm confident the difference stems from get_objects_multithreaded and not ecoscope code.
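For anyone wanting to reproduce the count comparison, here is a minimal sketch of a per-subject diff. It assumes each observation is a dict with a subject identifier under a `subject_id` key; that key name and the `count_diff` helper are hypothetical and are not taken from the PR or from er_client.

```python
from collections import Counter


def count_diff(threaded_obs, async_obs, key="subject_id"):
    """Return {subject: (threaded_count, async_count)} where the counts differ.

    `key` is an assumed field name; adjust it to match the real payload.
    """
    threaded = Counter(obs[key] for obs in threaded_obs)
    async_counts = Counter(obs[key] for obs in async_obs)
    # A Counter returns 0 for missing keys, so subjects present in only
    # one result set are reported as well.
    return {
        subject: (threaded[subject], async_counts[subject])
        for subject in sorted(threaded.keys() | async_counts.keys())
        if threaded[subject] != async_counts[subject]
    }
```

Passing the two result sets through this should list only the subjects where the clients disagree, which makes it easier to pull the raw records for those subjects and see which ones are missing.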
We have been using the er_client.get_objects_multithreaded() function to download events and observations. If we switch from multithreading to async, we have the potential to greatly speed up the download time.
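As a rough illustration of the idea (not the er_client API and not the code in the PR), the sketch below fetches pages concurrently with asyncio and caps the number of in-flight requests with a semaphore. `fetch_page` is a hypothetical stand-in that simulates network latency with a sleep, and the page size and concurrency limit are arbitrary assumptions.

```python
import asyncio
import time

PAGE_LATENCY = 0.05  # simulated round-trip time per request, in seconds


async def fetch_page(page: int, page_size: int = 100) -> list:
    """Hypothetical stand-in for one paginated API request."""
    await asyncio.sleep(PAGE_LATENCY)  # pretend to wait on the network
    start = page * page_size
    return [{"id": i} for i in range(start, start + page_size)]


async def fetch_all(num_pages: int, max_concurrency: int = 10) -> list:
    """Fetch every page concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(page: int) -> list:
        async with semaphore:
            return await fetch_page(page)

    # gather() returns results in the order the awaitables were passed,
    # so the pages come back in page order.
    pages = await asyncio.gather(*(bounded(p) for p in range(num_pages)))
    return [row for page in pages for row in page]


def run_demo(num_pages: int = 20):
    """Run the concurrent fetch and return (rows, elapsed_seconds)."""
    started = time.perf_counter()
    rows = asyncio.run(fetch_all(num_pages))
    return rows, time.perf_counter() - started
```

With the simulated latency, the waits for the 20 pages overlap rather than add up, which is the same effect we would hope for on a high-latency connection. The semaphore is there so a large download does not open an unbounded number of requests against the server.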