kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

object storage: fix memory usage issue and hangs #296

Closed: sebhtml closed this 9 years ago

sebhtml commented 9 years ago

The code uses the Python requests package to send HTTP requests. When the files argument of the post function is used, requests loads the entire file content into memory to build the multipart body. We think this is the cause of some of the hangs on our systems.

One possible workaround is to use requests-toolbelt, which provides a class that generates the body of the HTTP POST request. With requests-toolbelt, the multipart-encoded data is passed through the data parameter and streamed instead of being assembled in memory.

With this change, the upload command does not hang. Further, the memory usage is now constant.

Command tested:

$ arast upload --single SRR1061345_1.fastq

The MD5 checksum of the file on the client matches the checksum recorded in the object storage system.

$ md5sum SRR1061345_1.fastq
f13b4842984fccf7324a6b82c34baa1b  SRR1061345_1.fastq
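The same client-side checksum can also be computed from Python in constant memory, in keeping with the goal of this change. This is only a sketch; the helper name `md5_of_file` and the 1 MiB chunk size are illustrative choices, not part of the project.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result can be compared with the output of md5sum or with the checksum stored for the object.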

var document = db.Nodes.find({"file.name": "SRR1061345_1.fastq"}).sort({"_id": 1}).limit(1).next()
document.file.name
SRR1061345_1.fastq
document.file.checksum.md5
f13b4842984fccf7324a6b82c34baa1b

The creation date makes sense too; see below.

Date(document.created_on)
Tue Feb 17 2015 21:17:34 GMT-0500 (EST)
Date()
Tue Feb 17 2015 21:17:38 GMT-0500 (EST)

Finally, I verified that the object in the object storage system has the correct owner.

document.acl.owner 25123f7f-4fe9-4b62-86ab-0ad3308b0dea

db.Users.find({"uuid": "25123f7f-4fe9-4b62-86ab-0ad3308b0dea"}, {"_id": 0, "username": 1});
{ "username" : "boisvert" }

Signed-off-by: Sébastien Boisvert <boisvert@anl.gov>
Link: https://github.com/kbase/assembly/issues/167

levinas commented 9 years ago

Several bigmem VMs are unreachable. I'll run a dev test when the network issue is resolved.