Replication performance

garethbowen commented 8 years ago

Replication is slower and consumes more CPU than we'd expect. Investigate why this is.

SCdF commented 8 years ago

(my notes for now)

Perf

How long does it take for one curl statement that we know will roll over 10,000 documents? Single user, local machine, attached to mains.

`return false`

~4.7 for the last 10k (seq 28488 -> 38488)

`return true`

~5.4-5.8 seconds. We can presume this extra second is the time required to generate and move the JSON

Production Erlang

~28 seconds.

Production Python

~38 seconds.

Things we could do:

Optimise the Erlang somehow? It definitely has slow bits, and it's written by an amature, it's just down to whether or not we can fix those things
Remove the need for the filter at all. Some options:
- Be able to generate a list of everyone this document applies to, and:
- Compile it into the doc on save (in Couch), so the filter just checks that one field. Probably makes it take ~6-7 seconds to run. Pros: replication doesn't change, Cons: any document save forces it to be replicated back; could get into infinite loops if we write something locally when we get a doc test reading one value
- Generate a view with this in it and use _all_docs with this view and the appropriate keys to replicate first. Pros: docs don't change; view call should be lightning fast, Cons: we'd have to hack / lie to couch about its replication, which means screwing with a 'private' (in the sense that there is no contract against it) API to set some values that will trick couch into thinking it's replicated up to a point; having to deal with all of the edge cases that replication Just Does.
- Generating a view per user. Pros: crazy fast, doesn't break replication model, Cons: views slow down saving, unless we're careful all these views running will wreck everything by making them do the bare minimum (eg combine it with the first point so it's just a one property check), we'd need to make sure they didn't replicate down to the client so that work would have to be done first, they'd need to be created for each user that gets created.

estellecomment commented 8 years ago

From monitoring a non-filtered replication, CPU load on lg.app server goes from 0-10% to 15-30%, and up to 70% at times.

From eying it, it could be the CPU load peaks when the docs are being requested : replication doesn't use bulk queries, it issues a zillion individual GETs. We could benchmark the difference between bulk GET vs. individual GETs.

estellecomment commented 8 years ago

Another fix if that's it is to throttle the frequency of GETs a client can issue, to avoid having couch serving itself to death

garethbowen commented 8 years ago

I expect we'll see massive performance improvement when (and if) couchdb 2 gets released with the bulk get docs.

estellecomment commented 8 years ago

? No bulk get? There's all_docs with keys=[aaa, bbb, ccc], same thing.

garethbowen commented 8 years ago

Replication doesn't use it.

garethbowen commented 8 years ago

To expand on that comment a little... have you ever noticed when hitting the app as a restricted user in a new browser you get a 400 for the URL: /medic/_bulk_get?revs=true&attachments=true accompanied by the comment "The above 400 is totally normal. PouchDB is just detecting if the remote supports the _bulk_get API.". This is pouchdb doing feature detection for an API available in couchbase and couchdb v2. When this fails it does a whole bunch of single doc requests instead of using the all_docs feature. I think this is because it wants the rev history. So when we get couchdb v2 it'll eliminate all the single doc requests by using the new _bulk_get API instead.

SCdF commented 8 years ago

OK, so I've looked into replication using a view, and the gist is that it doesn't usefully exist.

It works by using a filter called _view which reads the view parameter from your request, runs the current doc through that view and if there are any emits the document returns. So it's just as slow as running a filter.

When you think about how replication works I suppose this makes sense, since it runs through the changes feed change by change, and so that's not really possible to map cleanly to a view. In theory we can write whatever Erlang we like, which means it might be possible to do crazy hax and access internals and make something that uses a pre-computed view. At that point we're essentially monkey-patching CouchDB…

This leaves us with filtered replication being the only way to restrict documents inside the same database, which isn't great because it really doesn't scale. As we add more data both initial replication times and continual sync times are going to increase exponentially (in the sense that each user adds not only a constant amount of extra documents but also N documents per month or whatever).

Bar anyone else's good ideas I think we need to plan to move off filtered replication completely. We can optimise the Erlang maybe and deploy more Couch nodes maybe, but in the end the relationship between users and replication performance just does not scale.

The two obvious ways to move off filtered replication is to have a view for every user, or a DB for every user. The former being less effort to move to, and the latter being more complicated but probably scales better (though perhaps one DB on multiple nodes would have similar scaling properties to N DBs).

estellecomment commented 8 years ago

I think this is because it wants the rev history

Yeah, Pouch doesn't use _all_docs because that will only fetch the latest rev, while for replication it's fetching specific revs.

SCdF commented 8 years ago

The two obvious ways to move off filtered replication is to have a view for every user, or a DB for every user.

I'm an idiot, a view for every user doesn't work, I just said that. So DBs is the only alternative here.

estellecomment commented 8 years ago

Or replace replication by view queries. Basically reimplementing replication. Which is probably not a good idea because of infinite finicky little edge cases.

SCdF commented 8 years ago

So here is the plan: I'm going to look at the feasibility of baking in the list of who can access the document into the document on document save, on the client, and specifically what kind of performance benefit that brings:

[x] locally edit a bunch of documents (~10k) so that they have this information compiled into them. This could be the basis for an eventual / potential upgrade script
[x] measure the performance of filtering over those documents with a new, simpler filter (ie over _changes)
- [x] how long does a simple version of this take?
- [x] how long does a version of this filter that checks if that data exists and if not falls back on the old filter code take?

^-- this, combined with my data above, should give us an idea of how much faster this could be. Based on this we might decide to:

[ ] implement this for all ongoing documents
- [ ] add a document update handler plugin to Pouch to run this code on document save
- [ ] use this same code in Couch in document validation to validate that it's been set correctly (ie stop rogue agents adding all to the list or whatever)
- [ ] update the Erlang filter to try this value first and fall-back on the more complicated logic if it doesn't find it

And perhaps (but probably not):

[ ] migrate existing documents with a medic_api migration script that fills this in for every existing document. This would only be worthwhile if the speedup was intense, since it means effectively everyone has to start again and replicate from the beginning (since all documents get edited by the migration)

Or consider:

[ ] implementing some kind of initial file download to avoid having to run through all of the old changes for new clients. Discussed in #2284.

estellecomment commented 8 years ago

So, not moving off filtered replication?

Plan sgtm! Last checkbox is covered by https://github.com/medic/medic-webapp/issues/2284, both are interesting IMO.

SCdF commented 8 years ago

Yeah, the last checkbox is just to show a sort of direction of thought, not something I'd do in this ticket. I'll link it.

In terms of not moving off filtered replication, I've had some conversations with @garethbowen about this and basically it's not really possible. There are no good solutions to remove filtered replication that don't add huge amounts of mental madness. One thing we've talked about is a sort of midway point where we still use filtered replication, but we massively limit what users have to sift through using either multiple instances / deploys, or multiple databases, per thing, where thing might be a district. Which would mean it's still not perfect, but it limits the amount of scaling the filtering has to deal with (ie now it's however large a district gets, not however large the entire deploy gets).

SCdF commented 8 years ago

OK so performance results are weird. They lead me to the conclusion that the prime factors that affect filter performance are:

Moving from JS to Erlang so that the data can stay native (no parsing into and out of JSON over the wire)
The literal byte-length of the Erlang filter. My guess is that it is eval'd every time the filter runs (smaller = quicker to parse).

Details

(the section of documents tested is a different section, these results compare to each other but not to others in the rest of this ticket. 10,000 documents where all but 27 [cbf working it out] have the compiled permissions. Tested three scenarios: original code, shortcut that can fallback on the original code, and just a shortcut (where those 27 docs return false))

Code	Time	Notes
All JS	~54s	Code changes didn't make a difference
Erlang Original	~29.5s
Erlang with Shortcut and Fallback	~29s	Barely faster at all
Erlang with Shortcut and No Fallback	~10s	Much faster
Extra long Erlang	~78s	With shortcut and fallback, and then a bunch more Erlang pasted afterwards that gets parsed but never runs
Extra short Erlang	~9.5s	With shortcut and no fallback, all comments removed

This lines up with my true and false tests being so much faster: I implemented those by deleting all the code and replacing them with true or false, thus making the code much shorter.

This is both fascinating and frustrating at the same time. It means two things:

Couch may be able to be optimised to not constantly re-parse Erlang filters
At least for now, the only way this change would help is if we migrate all documents to this so we can remove the fallback, which forces everyone to completely re-replicate (ie every document has changed).

SCdF commented 8 years ago

^-- I uploaded my experimental filters and their results if anyone is interested / think I'm crazy.

SCdF commented 8 years ago

Raised https://issues.apache.org/jira/browse/COUCHDB-3021

SCdF commented 8 years ago

I have also pinged the mailing list (I'll link once it appears on their web cache). This is really all we can do for now, once there is some movement on either / both of those I'll update.

In side-news, there appears to be a fork of couchdb that supports view replication: https://github.com/barrel-db/barrel . Not sure we are desperate enough to move to a fork for one feature, but there it is regardless.

SCdF commented 8 years ago

OK but in other news I just saw this in the code: https://github.com/apache/couchdb-couch/blob/master/src/couch_changes.erl#L297-L319

I have no idea how to read Erlang, but it looks like there is an implication that (at least in master) there is a way to use views for replication and have it actually be fast.

SCdF commented 8 years ago

OK so more digging:

Now when the option seq_indexed=true is set in the design doc, the view filter in _changes will use it to retrieve the results. Compared to the current way, using a view index will be faster to retrieve changes. It also gives the possibility to filter changes by key or get changes in a key range. All the view options can be used

Testing underway…

SCdF commented 8 years ago

Yeah I'm not sure about this. I can add the option and nothing breaks, but I can't get the key option to also work. Looking through the code is… challenging.

SCdF commented 8 years ago

I edited my view code so it logs whenever it's called, and adding those options doesn't stop the function from being called each time, which means we are no closer to running "fast" view code.

SCdF commented 8 years ago

Mailing list thread: https://mail-archives.apache.org/mod_mbox/couchdb-user/201605.mbox/browser

garethbowen commented 8 years ago

I'm investigating a filtering workaround... report back soon. If you don't hear from me, send help!

estellecomment commented 8 years ago

About Couchdb's replication, a Couchbase Lite for ios developer says:

From my limited testing, the performance bottleneck in the current algorithm seems to be in fetching the new revisions from the source. I think this is due to the overhead of handling many separate HTTP requests. This was the rationale for adding _bulk_get in Couchbase Mobile.

A limited case of the above-mentioned bulk-get optimization is possible with the standard API: revisions of generation 1 (revision ID starts with "1-") can be fetched in bulk via _all_docs, because by definition they have no revision histories. Unfortunately _all_docs can't include attachment bodies, so if it returns a document whose JSON indicates it has attachments, those will have to be fetched separately. Nonetheless, this optimization can help significantly, and is currently implemented in Couchbase Lite.

from https://github.com/couchbase/couchbase-lite-ios/wiki/Replication-Algorithm Edit : that's for normal replication, not specifically filtered.

SCdF commented 8 years ago

@estellecomment is that actually about the couchdb implementation, or about their own? The way I read that it sounded like performance notes for their own implementation, not an analysis or CouchDB's

estellecomment commented 8 years ago

Sounds like it's about Couchdb's:

These notes were derived from reading the API documentation on the CouchDB wiki and from conversation with engineers who've worked on CouchDB's replicator (Damien Katz and Filipe Manana). But don't take them as gospel.

estellecomment commented 8 years ago

Cloudant have documented Couchdb's replication algorithm here : http://www.replication.io/spec

garethbowen commented 8 years ago

From my limited testing, the performance bottleneck in the current algorithm seems to be in fetching the new revisions from the source.

That's not where our bottleneck is right now. This is definitely a concern - fetching 1000s of docs one by one is not good. But the far bigger bottleneck for us currently is the filter.

garethbowen commented 8 years ago

The current proof of concept is in these two branches:

Needs more testing and profiling with a full db but it seems to work. I have to put back the initial replication feature as well as migration to the remote db. I'll work on this tomorrow.

SCdF commented 8 years ago

Huh @garethbowen interesting twist! So instead of restricting the changes feed by id you pull the entire thing and then filter in node. That's a nice way of making sure you get to see all deletions without having to do two queries!

(Looks good so far :-) )

garethbowen commented 8 years ago

Yeah. It's not the way I intended to go, but we have to iterate through the changes feed for the deleted items anyway, so I think it makes sense to do both at the same time.

garethbowen commented 8 years ago

This could be useful: https://github.com/pouchdb/pouchdb/issues/5263

garethbowen commented 8 years ago

@SCdF Two PRs created for this: https://github.com/medic/medic-webapp/pull/2347 and https://github.com/medic/medic-api/pull/75

Performance seems good locally. I want to merge to develop so I can update a production clone and test with real data, real hardware, real OS, and a remote server. Further changes or a complete revert may be necessary...

SCdF commented 8 years ago

@garethbowen looks good dude, give it a crack!

garethbowen commented 8 years ago

Merged. Next step is to test it in a production like environment. Then clean up filters and other code if it works well.

SCdF commented 8 years ago

Depending on how this test affects system load (ie if everything goes to plan it should massive reduce it) we may also want to look at increasing our costs slightly to two servers:

One that deals with serving CHWs (effectively a replication server)
One that deals with live requests (effectively an admin server)

With master->master replication between each of them. Currently (in PROD) that app is effectively non-functional because too many requests timeout, presumably due to replication load.

We wouldn't even need any code changes to really do this, we should be able to simply have another server, medic-api, url, the works, and have that as the one that admins / non-replicating people log in with. We would then set up live two way (unfiltered) replication between those two servers.

garethbowen commented 8 years ago

Ok, it seems to work. Assigning to @SCdF for a secondary code review. Sorry about the many little changes but it was the easiest way to get it running in a production like environment. You can try it out on lg-testing if you like.

SCdF commented 8 years ago

OK I've read the code and I'm reasonably sure I get the gist of it.

My only major concern is whether or not this affects the request size from CHWs in a meaningful way, and what that does to their bill. If most IDs are UUIDs (32-36 characters, plus commas and what-not) that would make 1000 ids ~30-40kb, which isn't an unreasonable number to expect. I presume compression sucks over a string of UUIDs (randomness and all that).

30kb is a small number, and I've only had one coffee so far today, but:

I presume continuous replication effectively issues a new changes feed request after every document? That would mean 30kb pushed up for every document down or up (since up comes back down…)
I could imagine that bad networks might involve us firing up a bunch of attempted change requests that may or may not be charged to the CHW?

garethbowen commented 8 years ago

Yes that was my biggest concern too. It does batch them together (up to 100) where possible, for example, if you've been offline for a while. But yes in the worst case the list of IDs will be sent for every document synced. This could be tweaked to only sync once an hour or similar which would optimise the batching at the expense of immediacy of data, but I'm comfortable with the way it is for now.

SCdF commented 8 years ago

Yeah, so perhaps instead of continuous replication, we could replace that with:

Try to do a once-off replication
If you 404, wait for fallback(n) seconds and try again (aka how continuous tries to connect now)
Once you get a hit, run through the whole replication, and once it's done wait for N minutes (eg an hour, 6 hours)
Once that time is up, GOTO 10

garethbowen commented 8 years ago

I don't really want phones to be potentially 6 hours out of date. And if N is too small then you may actually increase data usage by making lots of requests with an empty result.

I don't think the current patch will increase the data costs noticeably.

SCdF commented 8 years ago

Oh yeah, @garethbowen I missed this one: https://github.com/medic/medic-api/blob/develop/handlers/changes.js#L62

Does that actually work?

garethbowen commented 8 years ago

Define "work"? I haven't actually tested it... but that block should only be hit if someone manually crafts an invalid request, eg: they've added IDs they're not allowed to see, for which documents exist that aren't deleted.

I imagine it'll blow up horribly because it's not JSON but blowing up horribly is pretty much the plan.

SCdF commented 8 years ago

As in, I presume you're trying to generate a 403? Does just writing "forbidden" actually work? Or should you be doing something more like this: https://github.com/medic/medic-api/blob/develop/handlers/changes.js#L78

garethbowen commented 8 years ago

Ahh yes, well, no this code doesn't send a 403 response. This is because of the streaming nature of continuous feeds. By the time this code executes we have probably already sent headers to the client and therefore cannot set any more, like setting a status code.

So you'll just get a 200 with "Forbidden" and it's not very semantic but then nobody legitimate should be seeing it anyway...

sglangevin commented 8 years ago

It now take about 1 minute to fully load the app for a brand new user, which is the typical use case, and about 3.5 minutes to fully load the app with a fair amount of data, which is a good benchmark for users with migrated data. This is acceptable performance, though I think we should look further into how the amount of data relates to the loading speed as users with lots of data could encounter very slow speeds, particularly when reloading or opening the app after a hard close. More info here: https://github.com/medic/medic-webapp/pull/2398

medic / cht-core