radical-collaboration / QCArchive


Share figures about MongoDB latency over WAN #10

Closed mturilli closed 5 years ago

mturilli commented 5 years ago

Andre did some measurements a few years ago.

andre-merzky commented 5 years ago

I don't think I still have viable data - but I can collect it again.

andre-merzky commented 5 years ago

Note: share scripts

andre-merzky commented 5 years ago

I am sorry, but I no longer have the scripts or the results. I would like to convert this ticket into adding those measurements into RP again - we would benefit from that data anyway. I should be able to do that by next week.

andre-merzky commented 5 years ago

Measurements have been taken (insert, find; different bulk sizes; different latencies) - I am in the process of plotting. Sorry for the delay :/

andre-merzky commented 5 years ago

Attachments: insert_remote, insert_local, find, plot.txt, mongo_insert_144.76.72.175.txt, mongo_find_144.76.72.175.txt, mongo_find_localhost.txt, mongo_insert_localhost.txt

andre-merzky commented 5 years ago

This was measured directly in RP, in the embarrassingly simplistic way in which we use MongoDB. The code for the insert is here; the code for the find is here - in the latter, we only time the first find call, which does not filter for unit IDs.

I was not fully correct about the indexed fields - only the type and uid fields are indexed in our DB. The control field changes multiple times at runtime for individual entities, so re-indexing would be too costly (I think).
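For readers without access to the linked RP code, here is a minimal sketch of that kind of measurement with pymongo; the collection name, document shape, and bulk sizes are illustrative assumptions, not the actual RP benchmark.

```python
# Hypothetical sketch of the insert/find micro-benchmark described above;
# names and bulk sizes are assumptions, not the RP code.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")   # or the remote host
coll   = client["rp_bench"]["tasks"]

for bulk_size in (1, 10, 100, 1000):
    docs = [{"type": "task", "uid": "task.%06d" % i, "control": "agent"}
            for i in range(bulk_size)]

    t0 = time.time()
    coll.insert_many(docs)                 # one bulk insert
    t_insert = time.time() - t0

    t0 = time.time()
    _ = list(coll.find({"type": "task"}))  # find not filtered by uid, as above
    t_find = time.time() - t0

    print("bulk %5d  insert %.3fs  find %.3fs" % (bulk_size, t_insert, t_find))
    coll.delete_many({})                   # reset between bulk sizes
```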

andre-merzky commented 5 years ago

I should add that the latency to the remote DB was about 40 ms for these tests; the bandwidth was about 8 Mbit/s (steady state).
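As a side note, one quick way to sanity-check the client-side round-trip latency to a MongoDB server is a timed ping command; the host name below is a placeholder, and this is not how the numbers above were obtained.

```python
# Rough client-side RTT check against a MongoDB server (host is a placeholder).
import time
from pymongo import MongoClient

client  = MongoClient("mongodb://db.example.org:27017/")
samples = []
for _ in range(20):
    t0 = time.time()
    client.admin.command("ping")           # minimal server round trip
    samples.append((time.time() - t0) * 1000.0)

print("median RTT: %.1f ms" % sorted(samples)[len(samples) // 2])
```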

dgasmith commented 5 years ago

Wow! That is a pretty disturbing graph at first glance. Was this over HTTPS/HTTP/SSH?

We haven't run into anything like this, but we usually drop binary blobs over HTTPS to a server that has local access to the DB. I am kind of surprised by this behavior; I will add it to my TODO list to try a couple of things here.
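A hedged sketch of the pattern dgasmith describes - clients POST blobs over HTTP(S) to a thin service sitting next to the database, so only that service talks to MongoDB, and always over a low-latency link. The framework (Flask) and the endpoint are assumptions, not QCArchive's actual API.

```python
# Sketch: accept binary blobs over HTTP(S) and insert them into a local DB.
from flask import Flask, request
from pymongo import MongoClient
from bson.binary import Binary

app  = Flask(__name__)
coll = MongoClient("mongodb://localhost:27017/")["blobs"]["payloads"]

@app.route("/blobs/<name>", methods=["POST"])
def upload_blob(name):
    # store the raw request body as a binary field; the insert stays local
    coll.insert_one({"name": name, "data": Binary(request.get_data())})
    return {"status": "ok"}, 201

if __name__ == "__main__":
    app.run()  # in practice, terminate HTTPS in front (nginx, etc.)
```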

andre-merzky commented 5 years ago

We sometimes tunnel through ssh, but not in this specific setup. TBH, I never considered looking more closely into MongoDB performance. Back when we decided to use it this way, it seemed consistent with other DBs we had used before, in the sense that performance seemed to be optimized for low-latency links rather than long-latency ones, and I took this as a given. Also, using the DB in this manner was supposed to be a temporary approach - albeit a long-lasting one by now :-P

But tickled by your comments, I tend to agree that this might hint at some peculiarities - I would most likely still blame our code setup for that, though it might indeed be interesting to better understand how this comes about. But we want to replace MongoDB for other reasons, too. Thus, while I agree that this is interesting, please don't spend too much time optimizing it for our context.

I should mention that in the setup where I measured this, other RP components also access the same collection - I don't have a clear understanding of how MongoDB handles that. I am sure, though, that in the vast majority of cases only one component should ever perform updates at any point in time - the others are only searching/reading: entity ownership passes from one component to the next, and unless an error condition etc. occurs, only the component which owns an entity will update that entity's records. We almost always bulk updates in the manner you see, but bulk sizes are often not as static as in this case (we don't want to wait too long for bulks to fill up).

HtH, Andre.
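To illustrate the dynamic bulking Andre describes (flush a bulk either when it is full or when a timeout expires), here is a minimal pymongo sketch; the class name, field names, and thresholds are assumptions, not RP's implementation.

```python
# Sketch of a bulk-update buffer that flushes on size or on timeout.
import time
from pymongo import MongoClient, UpdateOne

class BulkUpdater:

    def __init__(self, coll, max_bulk=1024, max_wait=1.0):
        self._coll     = coll
        self._max_bulk = max_bulk          # flush when this many ops queued
        self._max_wait = max_wait          # ... or after this many seconds
        self._ops      = []
        self._last     = time.time()

    def update(self, uid, state):
        # only the component currently owning `uid` is expected to call this
        self._ops.append(UpdateOne({"uid": uid}, {"$set": {"control": state}}))
        if len(self._ops) >= self._max_bulk or \
                time.time() - self._last > self._max_wait:
            self.flush()

    def flush(self):
        if self._ops:
            self._coll.bulk_write(self._ops, ordered=False)
            self._ops = []
        self._last = time.time()

coll    = MongoClient("mongodb://localhost:27017/")["rp"]["tasks"]
updater = BulkUpdater(coll)
updater.update("task.000001", "agent_executing")
updater.flush()
```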