Thank you for opening this, and congratulations on the amount of data you have! :-)
It seems to me as though a limiting factor is Docker...
I have an old Recorder (0.9.6) running in a FreeBSD jail on a Shuttle barebones, and querying over the network (client is my Mac), I get the following values:
```
$ time curl -sSf "http://192.168.1.30:8083/api/0/locations?from=2000-01-01T00:00:00&to=2024-01-01T00:00:00&user=jpm&device=5s&format=json" | jq '.data | length'
160745

real    0m31.285s
user    0m2.707s
sys     0m0.936s

$ bc
>>> scale=2
>>> 160745/31.2
5152.08
```
So over 5000 records per second, which won't win any prizes obviously, but it's better than the numbers you're obtaining.
As an aside: the data our API returns contains a `.count` element, so piping the curl output to `jq .count` is a bit more efficient.
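For example, the query above then becomes:

```sh
# use the count the API has already computed instead of counting .data client-side
curl -sSf "http://192.168.1.30:8083/api/0/locations?from=2000-01-01T00:00:00&to=2024-01-01T00:00:00&user=jpm&device=5s&format=json" | jq '.count'
```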
One other obvious candidate would be that I run this setup on HDD (on an Azure VM), in case there are lots of disk operations involved. Does your hardware have an SSD?
I noticed that the recorder consumes one entire core when executing this API call. This VM is tiny (just a dual-core processor); I tried to stop a lot of other things to make sure the recorder always has a free core, which helped a bit, but not by much (e.g. it reduced the time from 3 minutes to 2 min 40 s).
Do you think it could be serialization / deserialization that takes a big chunk of time? Just processing a single file with plain `grep` seems to be fast (in terms of data access), but I admit I haven't looked at the code yet to see what else the API call is doing, besides opening the relevant files and filtering for entries with type `location`.
```
$ time curl 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' | jq '.count'
real    0m 5.16s
user    0m 0.00s
sys     0m 0.01s
10120

$ time cat /store/rec/xxx/xxx/2023-12.rec | grep '"_type":"location"' | wc -l
real    0m 0.05s
user    0m 0.00s
sys     0m 0.01s
10120
```
I've been running https://github.com/networkException/oxidetracks in parallel with the regular recorder for a while, and it gets a great speedup from keeping locations in memory once they've been loaded from disk. I think this is worth exploring as an opt-in mechanism: the default memory footprint should stay minimal, but if someone is willing to throw more RAM at the problem to make it go away, that seems like a good option.
> processing a single file with plain grep seems to be fast
That would be blindingly fast, but the Recorder does indeed deserialize and serialize in order to filter out e.g. `lwt` messages and others that could be in there.
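Roughly the kind of per-record filtering involved, sketched here with `awk` and `jq` rather than the Recorder's actual C code, and assuming the usual `.rec` layout where the JSON payload is the last tab-separated field of each line:

```sh
# extract the JSON payload and keep only _type == "location" records;
# lwt and other payload types are dropped
awk -F'\t' '{print $NF}' /store/rec/xxx/xxx/2023-12.rec \
  | jq -c 'select(._type == "location")' \
  | wc -l
```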
I have previously thought about replacing the plain text files with some sort of database (I think @linusg and I once briefly considered SQLite3), but it's not really been important to date.
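Purely as a sketch of that idea, and not anything that exists in the Recorder today: a hypothetical table along these lines, with an index on (user, device, tst), would turn a `/locations` range query into a single indexed scan.

```sh
# hypothetical schema, for illustration only
sqlite3 /tmp/otr-sketch.db <<'SQL'
CREATE TABLE IF NOT EXISTS locations (
    user   TEXT    NOT NULL,
    device TEXT    NOT NULL,
    tst    INTEGER NOT NULL,      -- epoch seconds
    lat    REAL,
    lon    REAL,
    json   TEXT                   -- full original payload
);
CREATE INDEX IF NOT EXISTS loc_user_device_tst ON locations (user, device, tst);
SQL
```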
The question is also, now circling back a bit to the Frontend PR you submitted, how often people are going to want to plot "all points ever!" on a map. Here it would be quite a messy business ... ;-)
> if someone is willing to throw more RAM at the problem
True, even though it would somewhat defeat one of the original intentions of the Recorder: lightweight and low in resource consumption -- one of the reasons we didn't originally put PostgreSQL behind it or whatnot. :-)
I do agree that adding some form of optional large in-core cache would be an attractive proposition from the point of view of speed.
Looking at all points ever is indeed not very frequent, but I still wonder what the underlying root cause of the performance I'm seeing is. Even just a single month for one device takes ~5 seconds for me, which feels like too much.
If we assume that parsing is the issue, then I'd expect `jq` to suffer from the same cause, but `jq` is able to produce "similar" output much faster:
```
# time curl 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' | jq . > /tmp/curl.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3716k    0 3716k    0     0   794k      0 --:--:--  0:00:04 --:--:--  831k
real    0m 4.68s
user    0m 0.00s
sys     0m 0.02s

/ # time cat /store/rec/xxx/xxx/2023-12.rec | grep '"_type":"location"' | jq -sR '[split("\n") | .[] | select(length > 0) | split("\t")[-1] | fromjson] | {data: ., count: length}' > /tmp/jq.json
real    0m 0.08s
user    0m 0.00s
sys     0m 0.01s
```
If I then compare the two jsons, the difference is in the following for every record:
- "vel": 0
+ "vel": 0,
+ "ghash": "u3buymm",
+ "isorcv": "2023-12-31T10:00:27Z",
+ "isotst": "2023-12-31T10:00:27Z",
+ "disptst": "2023-12-31 10:00:27",
+ "tzname": "Europe/Copenhagen",
+ "isolocal": "2023-12-31T11:00:27+0100"
So could it be that calculating these fields takes additional time?
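(For what it's worth, the timestamp-derived fields at least look like cheap transformations of `tst`; here is a rough sketch with GNU `date`, using the values from the record above, of what producing `isotst` and `isolocal` amounts to. `ghash` and `tzname` obviously involve more work.)

```sh
# tst corresponding to the record shown above (2023-12-31T10:00:27Z)
tst=1704016827
date -u -d "@$tst" '+%Y-%m-%dT%H:%M:%SZ'                     # isotst   -> 2023-12-31T10:00:27Z
TZ=Europe/Copenhagen date -d "@$tst" '+%Y-%m-%dT%H:%M:%S%z'  # isolocal -> 2023-12-31T11:00:27+0100
```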
Obviously adding those values takes time. In particular we invested quite a bit of effort in optimizing `tzname` (and it is cached in LMDB), but the result with/without became quite acceptable IMO, and that's what it currently is. The other data ought to be almost negligible.
@maximbaz do you see a measurable difference when you query the API with `&format=json&fields=lat,lon` added to the URI? For my 31s query it shaves off roughly 9s.
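Concretely, that would be the earlier query with only the `fields` parameter added, e.g.:

```sh
time curl -sSf 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json&fields=lat,lon' | jq '.count'
```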
But the question is moot, so don't bother: Frontend needs many of the values (not just `lat`, `lon`).
I tried it out of curiosity in any case :) To be honest I don't see a big difference: on one month of data it removes 1 second (from 6 s to 5 s), and on 6 months of data it removed only a few seconds more (from 1m 11.57s to 1m 8.49s).
But isn't it interesting that it still takes so much time, even when it doesn't need to process the `tzname` field, which we suspected to be the reason for the slow response?
I'm wondering what makes the `jq` query I shared above return almost exactly the same data instantly, when executed in the same Docker container, on the same hardware and using the same dataset, given that `jq` also has to do some parsing? :thinking:
Reason number 1: the `jq` authors are much better programmers than I am.
Other reasons: we do a huge number of memory allocations / deallocs for serializing and deserializing JSON.
@maximbaz if you dump your LMDB cache, do you see all / most entries with `tzname` in them?
```
$ ocat -S <store> --dump
9exzrzw {"cc":"MX","addr":"Saucillo 40, El Saucillo, 36094 Guanajuato, GUA, Mexico","tst":1694908649,"locality":"Guanajuato","tzname":"America/Mexico_City"}
```
And if you don't see `tzname` in the database, it's probably because most of your data was collected before `tzname` became a thing (I think/hope we describe this quite well in the README). The point here is that if `tzname` is not in the cached data, we look it up. For 2300 entries, that increases the duration here from 0.15s to 1.7s, which is quite a bit.
You might wish to run `addtzname` on your data (do back up your store first!!!). I did this at the time, which is another explanation for the difference in runtimes we have.
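A quick way to check how many cached entries are missing it, building on the `--dump` output above (this just counts lines that don't contain a `tzname` key, so take it as an approximation):

```sh
# total cache entries vs. entries lacking tzname in the LMDB dump
ocat -S <store> --dump | wc -l
ocat -S <store> --dump | grep -vc '"tzname"'
```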
Slightly related: you might also want to look at `OTR_CLEAN_AGE` in the README.
> Other reasons: we do a huge number of memory allocations / deallocs for serializing and deserializing JSON.
I've been looking at the code more closely, something I've not done for a few years now, and these are more reasons:
- we `stat(2)` quite a bit. In other words: lots of system calls => bad for runtime

> @maximbaz if you dump your LMDB cache, do you see all / most entries with `tzname` in them?
None, in fact! All of my entries look like this:
```
ud9z4hg {}
upcxvrf {}
upczfrc {}
```
Thanks for solving the mystery! I don't really care about reverse geocoding, so I don't run OpenCage or anything like that, neither on the recorder nor on the phones. It turns out I should have been passing the `--norevgeo` flag to the recorder, which, once I did, cut the response time for 2 years of data from 180 seconds down to 16 :grin:
Very sorry if this is considered a user configuration mistake!
I sent https://github.com/owntracks/docker-recorder/pull/82 to make this easier to set up.
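For reference, outside Docker the flag is simply passed on the command line; a minimal sketch, assuming a local MQTT broker and the default topic:

```sh
# skip reverse-geocoding lookups entirely
ot-recorder --norevgeo 'owntracks/#'
```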
> cut the response time for 2 years of data from 180 seconds down to 16
Well, that's quite positive.
Reverse geo is important to me, so it didn't occur to me sooner that you mightn't be using it. Glad we got that sorted, and thanks for the patch, which I've merged.
Hello, following up on https://github.com/owntracks/frontend/pull/124, this is the API request that was issued by the browser:
The entire setup is running in Docker.
I executed the commands below directly in the recorder container (changing the "from" parameter):
Every command was executed once, so we should account for some variation due to other processes running on the host.
The results were:
Looking at the last column, it looks like the performance is more or less linear in the number of records the recorder has to process.
Hope that helps, let me know if you'd like some additional details! And thanks for the awesome product, by the way :rocket: