owntracks / recorder

Store and access data published by OwnTracks apps

Locations API performance #483

Closed: maximbaz closed this issue 3 months ago

maximbaz commented 3 months ago

Hello, following up on https://github.com/owntracks/frontend/pull/124, this is the API request that was issued by the browser:

/api/0/locations?from=xxx&to=xxx&user=xxx&device=xxx&format=json

The entire setup is running in Docker.

I executed the command below directly in the recorder container (changing the "from" parameter):

time curl 'localhost:8083/api/0/locations?from=2022-01-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' | jq '.data | length'

Every command was executed once, so we should account for some variation due to other processes running on the host.
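
For reference, a loop along these lines could be used to reproduce the measurements (just a sketch: the dates are illustrative and xxx stands in for my user/device):

for from in 2023-12-01 2023-11-01 2023-10-01 2023-01-01 2022-01-01; do
  url="localhost:8083/api/0/locations?from=${from}T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json"
  start=$(date +%s)
  count=$(curl -s "$url" | jq '.data | length')
  end=$(date +%s)
  echo "$from -> $count records in $((end - start)) s"
done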

The results were:

time period    records returned    time taken (s)    records per second (records / time)
1 month        10120               4.65              2,176
2 months       23288               12.67             1,838
3 months       38599               21.35             1,808
1 year         176553              118.85            1,486
2 years        300149              180               1,667

Looking at the last column, the throughput stays roughly constant, i.e. the time taken grows more or less linearly with the number of records the recorder has to process.

Hope that helps, let me know if you'd like some additional details! And thanks for the awesome product, by the way :rocket:

jpmens commented 3 months ago

Thank you for opening this, and congratulations on the amount of data you have! :-)

It seems to me as though a limiting factor is Docker....

I have an old Recorder (0.9.6) running in a FreeBSD jail on a Shuttle barebones, and querying over the network (client is my Mac), I get the following values:

$ time curl -sSf "http://192.168.1.30:8083/api/0/locations?from=2000-01-01T00:00:00&to=2024-01-01T00:00:00&user=jpm&device=5s&format=json" | jq '.data | length'
160745

real    0m31.285s
user    0m2.707s
sys     0m0.936s

$ bc
>>> scale=2
>>> 160745/31.2
5152.08

So that's over 5,000 records per second, for which we'll not win any prizes obviously, but it's better than the numbers you're obtaining.

As an aside: the data our API returns contains a .count element, so piping the curl output to jq .count is a bit more efficient.
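
i.e. something like:

$ time curl -sSf "http://192.168.1.30:8083/api/0/locations?from=2000-01-01T00:00:00&to=2024-01-01T00:00:00&user=jpm&device=5s&format=json" | jq '.count'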

maximbaz commented 3 months ago

Another obvious candidate would be that I run this setup on an HDD (on an Azure VM), in case there are lots of disk operations. Does your hardware have an SSD?

I noticed that the recorder consumes an entire core while executing this API call. This VM is tiny (just a dual-core processor), so I stopped a lot of other things to make sure the recorder always has a free core for itself. That helped a bit, but not by much (e.g. it reduced the time from 3 minutes to 2 min 40 s).

Do you think serialization / deserialization could be taking a big chunk of the time? Just processing a single file with plain grep seems to be fast (in terms of data access), but I admit I haven't looked at the code yet to see what else the API call is doing, besides opening the relevant files and filtering for entries of type "location".

$ time curl 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' | jq '.count'
real    0m 5.16s
user    0m 0.00s
sys 0m 0.01s
10120

$ time cat /store/rec/xxx/xxx/2023-12.rec | grep '"_type":"location"' | wc -l
real    0m 0.05s
user    0m 0.00s
sys 0m 0.01s
10120

linusg commented 3 months ago

I've been running https://github.com/networkException/oxidetracks in parallel with the regular recorder for a while, and it gets a great speedup from keeping locations in memory once they've been loaded from disk. I think this is worth exploring as an opt-in mechanism: the default memory footprint should stay minimal, but if someone is willing to throw more RAM at the problem to make it go away, that seems like a good option.

jpmens commented 3 months ago

processing a single file with plain grep seems to be fast

That would be blindingly fast, but the Recorder does indeed deserialize and serialize in order to filter out e.g. lwt messages and others that could be in there.
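
For instance, a rough way to count the non-location entries in one of your monthly files (a sketch, using the store layout from your example):

$ grep -vc '"_type":"location"' /store/rec/xxx/xxx/2023-12.rec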

I have previously thought about replacing the plain text files by some sort of database (I think @linusg and I once briefly considered SQLite3), but it's not been really important to date.
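
For the record, a very rough sketch of what that could look like with the sqlite3 CLI (purely illustrative: the table layout and paths are invented, and it only extracts tst/lat/lon from the JSON):

grep '"_type":"location"' /store/rec/xxx/xxx/2023-12.rec \
  | awk -F'\t' '{print $NF}' \
  | jq -r '[.tst, .lat, .lon] | @csv' > /tmp/locations.csv

sqlite3 /tmp/locations.db <<'EOF'
CREATE TABLE IF NOT EXISTS locations (tst INTEGER, lat REAL, lon REAL);
.mode csv
.import /tmp/locations.csv locations
SELECT COUNT(*) FROM locations
 WHERE tst BETWEEN CAST(strftime('%s','2023-12-01') AS INTEGER)
               AND CAST(strftime('%s','2024-01-01') AS INTEGER);
EOF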

The question is also, now circling back a bit to the Frontend PR you submitted, how often are people going to want to plot "all points ever!" on a map? Here it would be quite a messy business ... ;-)

jpmens commented 3 months ago

if someone is willing to throw more RAM at the problem

True, even though it would somehow defeat one of the original intentions of the Recorder: lightweight and low in resource consumption -- one of the reasons we didn't originally put PostgreSQL behind it or whatnot. :-)

I do agree that adding some form of optional large in-core cache would be an attractive proposition from the point of view of speed.

maximbaz commented 3 months ago

Looking at all points ever is indeed not very frequent, but I still wonder what the underlying root cause of the performance I'm seeing is. Even a single month for one device takes ~5 seconds for me, which feels like too much.

If we assume that parsing is the issue, then I'd expect jq to suffer from the same cause, but jq is able to produce "similar" output much faster:

# time curl 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' | jq . > /tmp/curl.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3716k    0 3716k    0     0   794k      0 --:--:--  0:00:04 --:--:--  831k
real    0m 4.68s
user    0m 0.00s
sys 0m 0.02s

/ # time cat /store/rec/xxx/xxx/2023-12.rec  | grep '"_type":"location"' | jq -sR '[split("\n") | .[] | select(length > 0) | split("\t")[-1] | fromjson] | {data: ., count: length}' > /tmp/jq.json
real    0m 0.08s
user    0m 0.00s
sys 0m 0.01s

If I then compare the two JSON files (a sketch of how follows below the diff), the difference for every record is the following:

-      "vel": 0
+      "vel": 0,
+      "ghash": "u3buymm",
+      "isorcv": "2023-12-31T10:00:27Z",
+      "isotst": "2023-12-31T10:00:27Z",
+      "disptst": "2023-12-31 10:00:27",
+      "tzname": "Europe/Copenhagen",
+      "isolocal": "2023-12-31T11:00:27+0100"

So could it be that calculating these fields takes additional time?

jpmens commented 3 months ago

Obviously adding those values takes time. In particular we invested quite a bit of effort optimizing tzname (and it is cached in LMDB), but the result with/without became quite acceptable IMO, and that's where it currently stands. The other data ought to be almost negligible.

jpmens commented 3 months ago

@maximbaz do you see a measurable difference when you query the API with &format=json&fields=lat,lon added to the URI? For my 31s query it shaves off roughly 9s.
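
Something like this, reusing the dates from your one-month test:

time curl -s 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json&fields=lat,lon' | jq '.count'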

But the question is moot, so don't bother: Frontend needs many of the values (not just lat, lon).

maximbaz commented 3 months ago

I tried it out of curiosity anyway :) To be honest I don't see a big difference: on 1 month of data it removes 1 second (from 6 to 5), and on 6 months of data it removed only a few seconds more (from 1m 11.57s to 1m 8.49s).

But isn't it interesting that it still takes so much time, even when it doesn't need to process the tzname field, which we suspected to be the reason for the slow response?

I'm wondering what makes the jq query I shared above return almost exactly the same data instantly, when executed in the same Docker container, on the same hardware and using the same dataset, given that jq also has to do some parsing? :thinking:

jpmens commented 3 months ago

Reason number 1: the jq authors are much better programmers than I am.

Other reasons: we do a huge number of memory allocations / deallocs for serializing and deserializing JSON.

jpmens commented 3 months ago

@maximbaz if you dump your LMDB cache, do you see all / most entries with tzname in them?

$ ocat -S <store> --dump
9exzrzw {"cc":"MX","addr":"Saucillo 40, El Saucillo, 36094 Guanajuato, GUA, Mexico","tst":1694908649,"locality":"Guanajuato","tzname":"America/Mexico_City"}

And if you don't see tzname in the database, it's probably because most of your data was collected before tzname became a thing (I think/hope we describe this quite well in the README). The point here is that if tzname is not in the cached data, we look it up. For 2300 entries, that increases the duration here from 0.15s to 1.7s, which is quite a bit.
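
A quick way to check is to count the cache entries without it (assuming the --dump output looks like the example above):

$ ocat -S /store --dump | grep -vc '"tzname"'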

You might wish to run addtzname on your data (do back up your store first!!!). I did this at the time, which is another explanation for the difference in our runtimes.

Slightly related: you might also want to look at OTR_CLEAN_AGE in the README.

jpmens commented 3 months ago

Other reasons: we do a huge number of memory allocations / deallocs for serializing and deserializing JSON.

I've been looking at the code more closely, something I've not done for a few years now, and these are more reasons:

  1. Files are constantly opened and closed, and we use stat(2) quite a bit. In other words: lots of system calls => bad for runtime (a quick way to see this is sketched after this list).
  2. As mentioned earlier, we alloc / dealloc memory constantly when serializing / deserializing data.
  3. I've also found a few instances of copying JSON objects where a copy wouldn't actually be necessary -- I should clean these up a bit, but I'm confident it isn't costing seconds of runtime, only fractions of a second. Edit: one down. :)
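
For what it's worth, a quick way to quantify point 1 on a Linux host (a sketch: it assumes strace is installed, ptrace is permitted in the container, and the process is called ot-recorder):

# summarise the recorder's system calls while it serves one request
strace -f -c -p "$(pidof ot-recorder)" &
curl -s 'localhost:8083/api/0/locations?from=2023-12-01T00%3A00%3A00&to=2024-01-01T00%3A00%3A00&user=xxx&device=xxx&format=json' > /dev/null
kill "$!"   # detaching makes strace print the per-syscall counts
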
maximbaz commented 3 months ago

@maximbaz if you dump your LMDB cache, do you see all / most entries with tzname in them?

None, in fact! All of my entries look like this:

ud9z4hg {}
upcxvrf {}
upczfrc {}

Thanks for solving the mystery! I don't really care for reverse geocoding, so I don't run OpenCage or anything like that, neither on the recorder nor on the phones. It turns out I should have been passing the --norevgeo flag to the recorder; once I did, the response time for 2 years of data dropped from 180 seconds down to 16 :grin:
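
(For anyone else who lands here, outside Docker that is roughly the following, a sketch assuming the usual MQTT topic:)

$ ot-recorder --norevgeo 'owntracks/#'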

Very sorry if this is considered a user configuration mistake!

I sent https://github.com/owntracks/docker-recorder/pull/82 to make this easier to setup.

jpmens commented 3 months ago

cut the response time for 2 years of data from 180 seconds down to 16

Well, that's quite positive.

Reverse geo is important to me, so it didn't occur to me sooner that you mightn't be using it. Glad we got that sorted, and thanks for the patch, which I've merged.