juliomalegria / django-chunked-upload

Upload large files to Django in multiple chunks, with the ability to resume if the upload is interrupted.
MIT No Attribution · 214 stars · 71 forks

Added temporary storage via settings flag #39

Open GitRon opened 6 years ago

GitRon commented 6 years ago

If you use AWS S3, the default storage does not work: the chunks cannot be created and updated.

I added the possibility to activate cross-platform temporary storage. I cannot use CHUNKED_UPLOAD_STORAGE_CLASS because I need to pass the location parameter to the storage, which that setting does not support.

I decided not to change the way CHUNKED_UPLOAD_STORAGE_CLASS works for better compatibility.
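The flag-based approach described above could look roughly like this in `settings.py`. This is a sketch only: `CHUNKED_UPLOAD_USE_TEMP_STORAGE` and `CHUNKED_UPLOAD_STORAGE_CLASS` are the setting names used in this thread, but the exact behavior in the PR may differ.

```python
# settings.py -- sketch, assuming the PR's settings flag

# Park the chunks in a cross-platform temporary directory on local
# disk instead of the default storage (which may be S3 and therefore
# cannot accept partial writes).
CHUNKED_UPLOAD_USE_TEMP_STORAGE = True

# The existing CHUNKED_UPLOAD_STORAGE_CLASS setting takes a class, so
# a constructor argument like `location` cannot be passed through it.
# The flag instead lets the app build the storage itself, roughly:
#     FileSystemStorage(location=tempfile.gettempdir())
```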

GitRon commented 6 years ago

@juliomalegria Please have a look at my PR and consider merging it. In combination with django-storages 1.6.6 and boto3 it works with Amazon S3.

GitRon commented 6 years ago

@juliomalegria Any chance of getting this into the master branch and releasing a new version?

GitRon commented 6 years ago

No update so far?

GitRon commented 4 years ago

@jerinpetergeorge I fixed the merge conflicts. What's your opinion on this topic?

jerinpetergeorge commented 4 years ago

I have tried the new changes on my local machine. Unfortunately, the file is not uploaded to S3.

Steps to reproduce

Part-1

  1. Download the demo project and supporting files from Here
  2. Extract the zip
  3. Create a virtual environment and install requirements.txt (our package is not included in requirements.txt since I've added the relevant modules directly to the demo)
  4. Go to django-chunked-upload-demo/settings.py and fill in the following keys:
    1. AWS_ACCESS_KEY_ID
    2. AWS_SECRET_ACCESS_KEY
    3. AWS_STORAGE_BUCKET_NAME
  5. Apply migrations
  6. Create a superuser
  7. Run the Django development server

Part-2 (checks whether the default S3 media/file upload works)

  1. Go to the Django admin
  2. Create a new FileUploadTest instance by uploading a file
  3. Check the S3 bucket after instance creation [I was able to successfully upload the file to the S3 bucket]

Part-3 (checks the CHUNKED_UPLOAD_USE_TEMP_STORAGE setting)

  1. Go to the root URL
  2. Upload any file
  3. Check the S3 bucket after upload completion [the upload completed without any errors, but the file was not uploaded to the S3 bucket]

Did I miss something? @GitRon

jerinpetergeorge commented 4 years ago

@GitRon I think we should initiate the S3 upload using the upload_file() method when we receive the last chunk from the client. But if we try to upload the whole file to S3 at that moment, NGINX would probably close the connection to the browser client.

There is also another method, upload_part(); it seems we could upload the chunks to the S3 bucket in real time.
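For context, `upload_part()` is part of the multipart-upload call sequence on boto3's S3 client (`create_multipart_upload` → `upload_part` → `complete_multipart_upload`). A minimal sketch of that sequence follows; the `FakeS3` stub client is purely illustrative so the example runs without AWS credentials. Note that real S3 requires every part except the last to be at least 5 MB.

```python
def upload_chunks_multipart(s3, bucket, key, chunks):
    """Stream chunks to S3 via the multipart-upload API.

    `s3` is expected to look like a boto3 S3 client. With real S3,
    every part except the last must be at least 5 MB.
    """
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    for number, chunk in enumerate(chunks, start=1):
        part = s3.upload_part(
            Bucket=bucket, Key=key,
            UploadId=upload["UploadId"],
            PartNumber=number, Body=chunk,
        )
        parts.append({"PartNumber": number, "ETag": part["ETag"]})
    return s3.complete_multipart_upload(
        Bucket=bucket, Key=key,
        UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )


class FakeS3:
    """In-memory stand-in for a boto3 S3 client (demo only)."""

    def __init__(self):
        self.parts = {}
        self.objects = {}

    def create_multipart_upload(self, Bucket, Key):
        return {"UploadId": "demo-upload-id"}

    def upload_part(self, Bucket, Key, UploadId, PartNumber, Body):
        self.parts[PartNumber] = Body
        return {"ETag": f'"etag-{PartNumber}"'}

    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        # Assemble the final object from the uploaded parts, in order.
        data = b"".join(self.parts[p["PartNumber"]]
                        for p in MultipartUpload["Parts"])
        self.objects[(Bucket, Key)] = data
        return {"Key": Key}
```

With a real client this would be `boto3.client("s3")` in place of `FakeS3()`; the call sequence is the same.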

GitRon commented 4 years ago

@jerinpetergeorge I'm sorry, but I can't follow you. We are using boto3 as a connector together with its file storage class, so you don't have to worry about the upload at all. The only thing you need to take care of is that you cannot write the chunks to the target location.

jerinpetergeorge commented 4 years ago

Oops :persevere:
Have you tried to reproduce the behavior that I mentioned here?

Anyway, let me ask a few questions regarding this; that may help you understand what I am trying to say:

  1. What are the possible values of CHUNKED_UPLOAD_USE_TEMP_STORAGE?
  2. If CHUNKED_UPLOAD_USE_TEMP_STORAGE only supports boolean values (i.e. True and False),
    1. how/where should I specify my S3 target location?
    2. after the chunks have been written to the temporary location on the server (the machine the server is currently running on), how does the file get uploaded to S3? Should I do it manually, or will Django handle it internally?

OR,

Could you provide a minimal example that demonstrates the S3 upload? @GitRon

GitRon commented 4 years ago

Ok, I think we are not talking about the same thing here 😅

My changes are not about ENABLING cloud storage uploads. They are just a prerequisite.

If you are writing to file storage like a hard drive, it does not matter where you park the chunks. On an object storage it does, because you cannot append to an object storage: it only accepts full files as objects.

I gave it a second thought, and maybe we should always write the chunks to the temp storage, without the settings flag. I think that is the main reason why there is a temp dir, right? To contain temporary data.

What do you think?

jerinpetergeorge commented 4 years ago

Storing the chunks in a temporary location and moving them to the target location after completion is a good idea. I think we should implement this first, before the S3 thing. Apart from that, if someone uses S3 as their default storage system, they would expect the file to be uploaded to the S3 bucket, so we should implement full-fledged S3 upload capability here.
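The flow agreed on above (append chunks on local disk, then hand the complete file to the configured storage) can be sketched in plain Python. The function names here are illustrative, not from the package; `save_to_storage` stands in for whatever storage backend (local disk, or S3 via django-storages) the project has configured:

```python
import os
import tempfile


def append_chunk(upload_id, chunk, tmp_dir=None):
    """Append one chunk to a local temp file and return its path.

    Chunks go to a plain file on local disk because, unlike an object
    store, a filesystem supports partial appends.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    path = os.path.join(tmp_dir, f"{upload_id}.part")
    with open(path, "ab") as f:
        f.write(chunk)
    return path


def finalize(path, save_to_storage):
    """Hand the completed file to the target storage in one piece.

    `save_to_storage` is a callable standing in for the configured
    storage backend; it receives the full file contents and returns
    the stored name. The temp file is removed afterwards.
    """
    with open(path, "rb") as f:
        name = save_to_storage(f.read())
    os.remove(path)
    return name
```

The key point is that the object store only ever sees one complete write, while all partial writes happen on the local filesystem.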

GitRon commented 4 years ago

Perfect, I'll change my PR then.

GitRon commented 4 years ago

@jerinpetergeorge Ok, the behavior is now as follows:

If you define no storage, it uses my temporary storage, but you can override that with your settings variable if you like.

I like this solution, good that we talked about it 👍

jerinpetergeorge commented 4 years ago

Great! But this is only half of the solution. We should move the completed file to CHUNKED_UPLOAD_PATH when a user uses the default settings, shouldn't we? @GitRon

GitRon commented 4 years ago

@jerinpetergeorge Sounds good. But doesn't it already work like this? I just collect the chunks in temp storage.

GitRon commented 3 years ago

@jerinpetergeorge Somebody asked for my fix on this issue. What's your status?

I pointed out - after thinking about the things that happened - that we might want to avoid renaming variables etc., to keep it stable and compatible again. What do you think? 😃

jerinpetergeorge commented 3 years ago

Sorry for the utter delay from my side. Thanks for your effort @GitRon, and please give me a couple of days to recollect the project context ( :astonished: ) and review the PR, as I have completely slipped away from this.

jerinpetergeorge commented 3 years ago

@GitRon Thanks for the PR. But it is hard for me to review, since I have slipped away from the context, as I already said. Also, I don't think I can get to it in the near future.

So, I would recommend asking @juliomalegria to make you a collaborator of this repo, or forking this project and maintaining it yourself.

GitRon commented 3 years ago

@jerinpetergeorge @juliomalegria Sure, if you give me permission, I'll fix the bug people were complaining about. Would you be able to deploy a new version then?

jerinpetergeorge commented 3 years ago

I won't be able to do that :disappointed: You can contact @juliomalegria via his personal mail (juliomalegria@gmail.com, found on his personal site). If he doesn't respond within a week, you'd better fork the project.

FYI: There is a place called Jazzband; I think we could move this project there, which would help it get more exposure. There are some guidelines to make it work (the Jazzband guidelines), please have a look.

@GitRon

GitRon commented 3 years ago

@jerinpetergeorge I know, you were just in CC. But good idea with the personal email address! Thx!

And good idea for jazzband as well!

GitRon commented 3 years ago

@jerinpetergeorge I wrote him. We'll see what happens.

cr0mbly commented 3 years ago

Any luck with this? I echo the use case; uploading chunks directly to S3 would be preferable.

GitRon commented 3 years ago

@cr0mbly I am not sure this is possible or desirable. S3 is not a file system but an object storage, so you'd expect the full file as an object, not a bunch of chunks.

Furthermore: what happens if something goes sideways during the upload? Then you have to clean up leftover chunks.

IMHO the local temp approach is the cleaner solution. 😃

cr0mbly commented 3 years ago

> @cr0mbly I am not sure this is possible or desirable. S3 is not a file system but an object storage, so you'd expect the full file as an object, not a bunch of chunks.

Yep, thanks for this. I just put some work in on my own to integrate S3 as the default storage and ran up against the fact that we can't append to an S3 object. This is fine for small files, as you can just keep rewriting the file from the backend, but with larger files, loading the full object into memory and re-uploading it becomes less and less performant.

Just for anyone else looking at this with an autoscaling setup: we left it as temp storage and used EFS to maintain persistence of the filesystem during chunk upload.