stavro / arc

:paperclip: Flexible file upload and attachment library for Elixir
1.16k stars 210 forks

Large memory consumption #105

Closed AndrewDryga closed 7 years ago

AndrewDryga commented 8 years ago

Right now the BEAM process seems to require about 3x the file size in memory. Can you suggest how to send a file directly to S3 without reading it into memory? We deal with very large files (up to 2 GB).

AndrewDryga commented 8 years ago

I guess the problem is located here: https://github.com/stavro/arc/blob/4216ac80fab7da3d4956c6ff85938571a5a3703b/lib/arc/storage/s3.ex#L76

You shouldn't read the whole file into memory when file streams and binary reads are available :(.

stavro commented 8 years ago

Looks like we can likely use https://hexdocs.pm/ex_aws/1.0.0-beta1/ExAws.S3.html#upload_part/6 to upload in chunks.
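As a rough sketch, chunked uploading with a streaming API could look something like this (this assumes the `ExAws.S3.Upload.stream_file/1` and `ExAws.S3.upload/3` functions that later ex_aws versions expose; the beta branch's API may differ, and the bucket/key names are illustrative):

```elixir
# Stream the file from disk in chunks and hand the stream to a
# multipart upload, so the whole file never sits in memory at once.
"path/to/large_file.bson"
|> ExAws.S3.Upload.stream_file()
|> ExAws.S3.upload("my-bucket", "uploads/large_file.bson")
|> ExAws.request()
```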

stavro commented 8 years ago

Can you try the ex_aws_beta branch and let me know if it works better for you?

It has some quirks with error handling, but a large file upload should work better.

Samorai commented 7 years ago

Hi. We tried the ex_aws_beta branch but got an error:

```
[error] Task #PID<0.1162.0> started from #PID<0.1161.0> terminating
** (MatchError) no match of right hand side value: %{body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<InitiateMultipartUploadResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"><Bucket>os-dev-batch-uploads</Bucket><Key>vivus.pl/2016/09/15/vivus.pl-47c6122ce5a2567d7709c75eb30f33d319330bf6.bson</Key><UploadId>X25x.4KcNAoY0e27807dJcgq_tny3f9Wf1M1Or5t4yuysKEVv7qzu7.szDlRWEuUjUWGURiXdrt0zaWj.yUt7C00HkEMW8OMebQ2s6J8ByURB3kwbYrcyyy95eCUA6kS</UploadId></InitiateMultipartUploadResult>", headers: [{"x-amz-id-2", "N1XDqA+ham+b3vVwSQYNTqCkEMWAY8KrDIy5M6W9ML895wKNAB5A6GP95Fx8DMjmQvOgZQCDYgo="}, {"x-amz-request-id", "29BF433D73AD79C0"}, {"Date", "Thu, 15 Sep 2016 08:40:38 GMT"}, {"Transfer-Encoding", "chunked"}, {"Server", "AmazonS3"}], status_code: 200}
    (ex_aws) lib/ex_aws/s3/upload.ex:41: ExAws.S3.Upload.initialize!/2
    (ex_aws) lib/ex_aws/s3/upload.ex:82: ExAws.Operation.ExAws.S3.Upload.perform/2
    (ex_aws) lib/ex_aws.ex:41: ExAws.request!/2
    (arc) lib/arc/storage/s3.ex:48: Arc.Storage.S3.do_put/3
    (elixir) lib/task/supervised.ex:94: Task.Supervised.do_apply/2
    (elixir) lib/task/supervised.ex:45: Task.Supervised.reply/5
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Function: #Function<0.113666932/0 in Arc.Actions.Store.async_put_version/3>
Args: []
```

Something is wrong with the ex_aws library, yes? Debugging shows that ExAws.S3.Parsers.parse_initiate_multipart_upload/1 does not work correctly.

jfrolich commented 7 years ago

Same error here. Also trying to get large uploads working. Looking into the code.

stavro commented 7 years ago

The latest ExAws wasn't ready yet for file streaming when I tried. If you want to try updating that branch and see if it helps, I would appreciate it!

jfrolich commented 7 years ago

Hey @stavro. As far as I tested this, it worked great with the current version of ex_aws! I am still doing some testing with really large files.

ex_aws, however, silently fails and returns raw XML when the sweet_xml library is not installed (as in the error above). I opened a pull request that raises a clearer error when sweet_xml is not present.

stavro commented 7 years ago

Amazing. Should this library then require sweet_xml?

I thought AWS had a way of requesting errors in JSON. If it's possible to request errors in JSON we should totally push that down to ExAws!

jfrolich commented 7 years ago

Good one. It shouldn't be too involved to change that: since the parser in ex_aws is just a module, it should be easy to create a JSON parser. Anyway, this works for now :)

It's probably best to include sweet_xml in this library for use with ex_aws 1.0.
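For anyone hitting the raw-XML error in the meantime, adding sweet_xml to your own application's deps works around it (the version requirements shown are illustrative):

```elixir
# mix.exs — sweet_xml lets ex_aws parse S3's XML responses
defp deps do
  [
    {:ex_aws, "~> 1.0.0-beta1"},
    {:sweet_xml, "~> 0.6"}
  ]
end
```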

jfrolich commented 7 years ago

To confirm: no trouble uploading 500+ MB files on a small Heroku dyno.

stavro commented 7 years ago

Amazing! Thanks for looking into it. I'll update the ex_aws_beta branch and start adding documentation about it next week. If you run into any other quirks please let me know. As soon as ExAws is out of beta we'll merge it into here. 🎉

jfrolich commented 7 years ago

Cool. I'm running it in production because we need to support large files. It's OK because we have tests, and it seems to work fine 💃 Will report any issues that come up.

AndrewDryga commented 7 years ago

We will use the ex_aws_beta branch in production, so it would be awesome if you could notify us here when you release the next version of the hex package, so we can stop using the GitHub repo dependency :).

jfrolich commented 7 years ago

Same here, it might be good to publish a beta release?

stavro commented 7 years ago

Will do on Monday or Tuesday. Thanks for helping test everyone!


stavro commented 7 years ago

Released arc as v0.6.0-rc1 to track ExAws.

Please try it out and report any feedback. Thanks!