scitran / core

RESTful API
https://scitran.github.io
MIT License

Uploading files to a single container has an increasing performance impact #987

Closed kofalt closed 6 years ago

kofalt commented 6 years ago

While testing an upload of a large set of files, I noticed that request latency increases as more files are added to a single container.

This is the latency as reported by uWSGI (POST .../files => generated ... bytes in ... msecs):

image

That's how far I got before my test was killed for running longer than 10 minutes.
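A minimal sketch of how per-request latencies like these could be pulled out of a uWSGI log, assuming the default request-summary format quoted above. The function names and the bucket size are hypothetical and only for illustration; they are not part of scitran/core.

```python
import re

# Matches the request-summary part of a uWSGI log line, e.g.
# "POST <path> => generated <n> bytes in <n> msecs".
LINE_RE = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) => generated (?P<bytes>\d+) bytes "
    r"in (?P<msecs>\d+) msecs"
)


def upload_latencies(lines):
    """Return latencies (msecs) of POST requests to a .../files path, in log order."""
    latencies = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None or match.group("method") != "POST":
            continue
        # Ignore any query string when checking the endpoint.
        path = match.group("path").split("?")[0]
        if path.endswith("/files"):
            latencies.append(int(match.group("msecs")))
    return latencies


def bucket_means(values, size):
    """Average consecutive buckets of `size` values to make the trend visible."""
    return [
        sum(values[i:i + size]) / len(values[i:i + size])
        for i in range(0, len(values), size)
    ]
```

Feeding the log through `upload_latencies` and then `bucket_means(latencies, 100)` would give one averaged point per 100 uploads, which is enough to see whether latency keeps climbing with the file count.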

More important, Elasticsearch has been pegged at 60-80% CPU ever since. As of writing this comment, it has been 34 minutes since I stopped adding files and it is still going. That means this sort of usage has the potential to delay indexing of other, new data for quite a while.

I have not run a profiler, but after a discussion with Megan, our guess is that job rule processing is the cause. I don't know whether this would be affected by making files first-class citizens (no ticket seems to explicitly track that, but it's coming). When I removed all job rules, I got better results:

image

I don't think we anticipate more than ~100 files on a container, so this may not be something we have to worry much about, but it's good to know. We might want to consider adding an ops alert when a container reaches a certain number of files.
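As a rough sketch of what such an ops check could look like: this assumes each container is available as a dict with an `_id` and a `files` list, and uses a threshold of 100 taken from the estimate above. The function name, the record layout, and the threshold are all assumptions for illustration, not the project's actual schema or alerting code.

```python
# Assumed limit, based on the "~100 files" estimate; tune as needed.
FILE_COUNT_THRESHOLD = 100


def containers_over_threshold(containers, threshold=FILE_COUNT_THRESHOLD):
    """Return (container_id, file_count) pairs for containers above the threshold.

    `containers` is an iterable of dicts with an "_id" key and an optional
    "files" list (hypothetical layout). Results are sorted largest first.
    """
    flagged = []
    for container in containers:
        count = len(container.get("files", []))
        if count > threshold:
            flagged.append((container["_id"], count))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

The returned list could be handed to whatever alerting channel ops already uses; an empty list means no container is over the limit.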

kofalt commented 6 years ago

Bonus chart for fun: 100 projects of 100 files each:

image