noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0

Performance drops when we have 30K parts #604

Closed tamireran closed 8 years ago

tamireran commented 8 years ago

Allocating a part takes 2.5 seconds in this case, and the overall upload runs at about 10% of the achievable speed.

guymguym commented 8 years ago

0.3.7?

tamireran commented 8 years ago

yep


guymguym commented 8 years ago

In the mapper function find_consecutive_parts() I used a query that is supposed to hit the part index, and it does hit the index, but since the index includes only start and not end, the scan actually continues over all the following ranges, so it can indeed become long.

I considered adding end to the index, but there is a better solution: add start:{$lte: <range-end>} to the query. That limits the scan to start values inside the bounds as well, so the index is used at both ends of the range.

so instead of this -

```js
db.objectparts.find({
    system: ObjectId("5637823ce21bd44f20d3bcc7"),
    obj: ObjectId("563a0259549c59cc6a779c22"),
    start: { $gte: 0 },
    end: { $lte: 10000000 },
    upload_part_number: 0,
    deleted: null
}).explain()
```

the query will be this -

```js
db.objectparts.find({
    system: ObjectId("5637823ce21bd44f20d3bcc7"),
    obj: ObjectId("563a0259549c59cc6a779c22"),
    start: { $gte: 0, $lte: 10000000 },
    end: { $lte: 10000000 },
    upload_part_number: 0,
    deleted: null
}).explain()
```

I tested this on my small DB, and explain() showed that the change reduces the scan to only the relevant parts.

Will add this fix soon to 0.3.7 and 0.4.0.
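The effect of the added start upper bound can be modeled without MongoDB. A minimal sketch (illustrative only, not NooBaa code — part count and sizes are made up): treat the index as an array sorted by start, and count how many index entries a range query touches with and without the bound.

```js
// Model 30,000 parts of 1 MB each, indexed by their start offset.
const PART_SIZE = 1000000;
const parts = Array.from({ length: 30000 }, (_, i) => ({
  start: i * PART_SIZE,
  end: (i + 1) * PART_SIZE,
}));

// Scan the "index" from the first entry with start >= lo. Stop early only
// when an upper bound on start is given (start:{$lte:hi} in the query).
// Returns [matchCount, entriesScanned].
function scanIndex(lo, hi, boundStart) {
  let scanned = 0;
  let matches = 0;
  for (const p of parts) {
    if (p.start < lo) continue;            // index seek skips these
    if (boundStart && p.start > hi) break; // the added bound stops the scan
    scanned++;
    if (p.end <= hi) matches++;            // end is not indexed: filtered per doc
  }
  return [matches, scanned];
}

// Query the first 10 parts, i.e. the range [0, 10000000).
const [m1, s1] = scanIndex(0, 10 * PART_SIZE, false);
const [m2, s2] = scanIndex(0, 10 * PART_SIZE, true);
console.log(m1, s1); // 10 matches, 30000 entries scanned
console.log(m2, s2); // 10 matches, 11 entries scanned
```

In the real query the bound is applied by the index itself, so the win is the same shape: the planner stops at start > <range-end> instead of walking every later part.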

guymguym commented 8 years ago

We have more gotchas like this hiding in calc_multipart_md5() and fix_multipart_parts(), which are also called on every upload and so will slow down significantly with a large number of parts. Trying to fix them too.

tamireran commented 8 years ago

Fixed the main problem.