juliomalegria / django-chunked-upload

Upload large files to Django in multiple chunks, with the ability to resume if the upload is interrupted.
MIT No Attribution
218 stars, 73 forks

Add new feature: support for multiple checksum algorithms #11

Open · voidrank opened this issue 9 years ago

voidrank commented 9 years ago

Hi, I'd like to add a new feature. If we want our service to be more efficient, we could use CRC. On the other hand, someone who cares a lot about checksum collisions, for example because they want to rename uploaded files after their checksum, might choose sha384, which is said to be more secure (http://security.stackexchange.com/questions/33108/why-does-some-popular-software-still-use-md5). So I suggest that django-chunked-upload should support more checksum algorithms.
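To illustrate the trade-off above, here is a minimal sketch, not part of django-chunked-upload, that computes a CRC32 and a SHA-384 digest incrementally over a sequence of upload chunks using only Python's standard zlib and hashlib modules. The function name checksums_for_chunks is hypothetical. CRC32 is cheap and good at detecting accidental corruption, but with only 32 bits and no cryptographic design it offers no protection against deliberately crafted collisions; SHA-384 is a SHA-2 cryptographic hash and costs more CPU per byte.

```python
import hashlib
import zlib
from typing import Dict, Iterable


def checksums_for_chunks(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Hypothetical helper: compute CRC32 and SHA-384 over an upload's chunks.

    Both values are updated chunk by chunk, so the whole file never has to be
    held in memory at once.
    """
    crc = 0  # running CRC32 value; 0 is the standard starting point
    sha384 = hashlib.sha384()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # feed the previous CRC back in
        sha384.update(chunk)
    return {
        # Mask to 32 bits (a no-op on Python 3) and render as 8 hex digits
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
        "sha384": sha384.hexdigest(),
    }


if __name__ == "__main__":
    result = checksums_for_chunks([b"hello ", b"world"])
    # Chunked results equal hashing the whole payload in one go
    assert result["crc32"] == format(zlib.crc32(b"hello world"), "08x")
    assert result["sha384"] == hashlib.sha384(b"hello world").hexdigest()
    print(result)
```

The CRC32 value is 8 hex digits long and the SHA-384 value 96, which is part of why only the latter is a sensible basis for naming files by content when collisions matter.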

juliomalegria commented 9 years ago

Hi @voidrank, that sounds like a good idea! Feel free to implement it and open a pull request. If you're too busy, you could describe here how you'd implement it, and I'll try to do it in my free time (I have an idea already, but I'd love to hear yours).

Thanks for collaborating!

voidrank commented 9 years ago

Hi, I have a question about variable names. I'd like to rename all the md5 variables to checksum, including the one in the HTTP POST API. But that would cause compatibility problems (for example, the demo would no longer work properly), and I don't know how to solve them. Or is there a better solution?

juliomalegria commented 9 years ago

What I had in mind is that the client would post "md5", or "crc", or "sha256", etc., and the server would look for any of those names in an extendable list of supported checksums, raising an error if none of them is found (unless do_checksum_check is False). The server would then calculate that checksum itself and compare the two values.

Something like:

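```python
# models.py (excerpt of the upload model)
import hashlib
from django.db import models

class ChunkedUpload(models.Model):
    # ... existing fields ...

    def _checksum(self, algorithm):
        # Hypothetical helper: hash the stored file chunk by chunk
        hasher = hashlib.new(algorithm)
        for chunk in self.file.chunks():
            hasher.update(chunk)
        return hasher.hexdigest()

    @property
    def md5(self):
        return self._checksum('md5')

    @property
    def sha256(self):
        return self._checksum('sha256')

# views.py (ChunkedUploadBaseView is defined in the same module)
from .constants import http_status  # assumed: the package's status constants
from .exceptions import ChunkedUploadError  # assumed: the package's error class

class ChunkedUploadCompleteView(ChunkedUploadBaseView):
    supported_checksums = ['md5', 'sha256']  # extendable; names match model properties
    do_checksum_check = True

    def checksum_check(self, checksum_alg, chunked_upload, checksum):
        if getattr(chunked_upload, checksum_alg) != checksum:
            raise ChunkedUploadError(status=http_status.HTTP_400_BAD_REQUEST,
                                     detail='%s checksum does not match' % checksum_alg)

    def _post(self, request, *args, **kwargs):
        # ... look up chunked_upload as before ...
        if self.do_checksum_check:
            for checksum_alg in self.supported_checksums:
                if checksum_alg in request.POST:
                    self.checksum_check(checksum_alg, chunked_upload,
                                        request.POST[checksum_alg])
                    break
            else:  # no supported checksum found in the request
                raise ChunkedUploadError(status=http_status.HTTP_400_BAD_REQUEST,
                                         detail='No supported checksum found')
        # ...
```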
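The lookup loop in _post can be tried out without Django. Below is a minimal, framework-free sketch of the same idea under a few assumptions: a plain dict stands in for request.POST, raw bytes stand in for the stored file, and a registry maps each POST field name to a function returning a hex digest. SUPPORTED_CHECKSUMS, ChecksumError and verify_checksum are hypothetical names, not part of django-chunked-upload. Because hashlib has no CRC, the "crc" entry uses zlib.crc32.

```python
import hashlib
import zlib
from typing import Callable, Dict, Mapping, Optional


def _crc32_hex(data: bytes) -> str:
    # CRC32 as 8 lowercase hex digits (hashlib has no CRC, so zlib is used)
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


# Hypothetical, extendable registry: POST field name -> function(bytes) -> hex digest
SUPPORTED_CHECKSUMS: Dict[str, Callable[[bytes], str]] = {
    "md5": lambda data: hashlib.md5(data).hexdigest(),
    "crc": _crc32_hex,
    "sha256": lambda data: hashlib.sha256(data).hexdigest(),
}


class ChecksumError(Exception):
    """Raised when no supported checksum is sent or the checksum doesn't match."""


def verify_checksum(post: Mapping[str, str], data: bytes,
                    do_checksum_check: bool = True) -> Optional[str]:
    """Check the first supported checksum found in `post` against `data`.

    Returns the name of the algorithm used, or None if checking is disabled.
    """
    if not do_checksum_check:
        return None
    for alg, compute in SUPPORTED_CHECKSUMS.items():
        if alg in post:
            # Accept upper- or lowercase hex from the client
            if compute(data) != post[alg].lower():
                raise ChecksumError(f"{alg} checksum does not match")
            return alg
    raise ChecksumError("no supported checksum found")
```

Since dicts keep insertion order (Python 3.7+), the registry's order decides which checksum is used when a client sends several, just as the order of supported_checksums does in the view.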

I don't know if you get the idea...

voidrank commented 9 years ago

That's a pretty good idea! I'd be very happy to implement it. I'll start working on it after next Thursday (I have a midterm that day QAQ). Could you wait for me?

juliomalegria commented 9 years ago

Yeah, sure! Thanks for collaborating!

voidrank commented 9 years ago

I have submitted a pull request and also updated the demo. Could you merge it? (I added some tests for the Django command and the admin, which may be a bit dirty.)

juliomalegria commented 9 years ago

Hey @voidrank, I'm so sorry I didn't reply to this. I've been super busy lately. I took a quick look today and it looks very promising, although there are some small things to fix. I'll try to make time to review it thoroughly and will leave my comments on the pull request. Thanks again for collaborating! :)

hackdna commented 7 years ago

Also, it would be nice to have an option to perform data integrity verification using just the file size. While this is not as robust as a checksum, it is a lot faster (see also #27).
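To make that trade-off concrete, here is a minimal sketch in plain Python; size_ok and md5_ok are hypothetical helpers, not library functions. The size check only compares a byte count, while the MD5 check has to hash every byte. The size check catches a truncated upload but misses corruption that leaves the length unchanged.

```python
import hashlib


def size_ok(data: bytes, expected_size: int) -> bool:
    # Cheap: compares a byte count only; len() on bytes doesn't scan the data
    return len(data) == expected_size


def md5_ok(data: bytes, expected_md5: str) -> bool:
    # Expensive: every byte has to be hashed
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


original = b"hello world"
truncated = b"hello worl"   # one byte missing
flipped = b"hellO world"    # same length, one byte changed

expected_size = len(original)
expected_md5 = hashlib.md5(original).hexdigest()

# Truncation is caught by both checks...
assert not size_ok(truncated, expected_size)
assert not md5_ok(truncated, expected_md5)
# ...but a same-length corruption slips past the size check
assert size_ok(flipped, expected_size)
assert not md5_ok(flipped, expected_md5)
```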