tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License
3.08k stars 479 forks source link

Implementation of the Expiration extension #435

Open Acconut opened 4 years ago

Acconut commented 4 years ago

tusd does currently not support the Expiration extension and people have to rely on custom scripts or provider-specific approaches to clean up old uploads. This issue is intended to serve as a discussion place for a possible implementation of this extension. In the end, tusd should be able to do two things:

Sending the expiration time to the client should not be a though nut to crack. However, when looking at the actual clean up work, it could be hairy to find a solution for the various storage options that tusd supports:

One might ask, we do we need two binaries (tusd for accepting uploads and tusd-cleaner for clean up old uploads)? Why can't we integrate the cleaning into tusd directly? Well, if you have multiple tusd servers running in parallel for redundancy and load balancing, you don't want every one of these tusd instances to scan through your upload directory at the same time. It would be better to let this be handled by one server with a cron job.

On the implementation side we would need following:

These are just some ideas from me. I am open to hearing other thoughts and approaches!

DravitLochan commented 3 years ago

@Acconut Is this still open? I can pitch in for the design, discussion and implementation.

Acconut commented 3 years ago

@DravitLochan Yes, this task is still open. Any help is appreciated!

DravitLochan commented 3 years ago

@Acconut Shouldn't the overall file expiry and expiry of partial uploads be two different features? IMO TUS being a protocol (or a pseudo protocol), the overall file expiry should be out of the scope of TUS. That's something the applications should handle using a cron. Thoughts?

alter123 commented 3 years ago

@DravitLochan I think your point of keeping different expiry for completed and partial uploads can be finely grained to configuration which allows user to define

One another opinion I would like to add is to clean uploads based on max available system disk (e.g. If the disk consumed is >90% then it should automatically delete the older uploads/start rejecting the new uploads).

DravitLochan commented 3 years ago

One another opinion I would like to add is to clean uploads based on max available system disk (e.g. If the disk consumed is >90% then it should automatically delete the older uploads/start rejecting the new uploads).

Well, I am not sure of this as a feature for resumable-uploads.

Also as a backend dev, to start deleting the older uploads on disk hitting 90% seems like an unacceptable thing to me. I might start to lose some critical files and hence my end user will start to churn.

If you want to solve the above problem, I'd say the best way would be to have alerts in place for disk hitting 80, 85, 90 and 95%. In-fact if you run out of space, let the application handle it on it's end "due to high traffic, we are not allowing any new file uploads. We will be back up in no time".

alter123 commented 3 years ago

@DravitLochan Your point makes sense, and rejecting the uploads sounds like correct option, but not sure if we need implementation for this at the moment, @Acconut can clarify.

Implementation wise I think storing expiry in FileInfo struct makes sense, since every FileSystem can have different abstraction for the same.

Also I'm not sure how we can parse common flags for config etc across the binaries, since most of them will be same in tusd & tusd-cleaner

DravitLochan commented 3 years ago

@Acconut what do you think?

Acconut commented 3 years ago

TUS being a protocol (or a pseudo protocol), the overall file expiry should be out of the scope of TUS.

Yes, tus should not be concerned about the expiry of files after the upload is completed, but tusd can definitely be concerned about this. There are multiple areas where tusd offers more features than the tus protocol specifies, just because those are handy and fit well into tusd. So, I see not problem when tusd offers expiration of uploads and files in one convenient package. Does that make sense? :)

One another opinion I would like to add is to clean uploads based on max available system disk

Interesting idea, but I think that is outside of tusd's scope and is very application specific. This does not seem like a good fit for tusd, as @DravitLochan already mentioned.

rejecting the uploads sounds like correct option, but not sure if we need implementation for this at the moment,

You can already check the free space of the disk in the pre-create hook and reject uploads if the disk is too full. There is not additional implementation needed inside tusd.

Also I'm not sure how we can parse common flags for config etc across the binaries, since most of them will be same in tusd & tusd-cleaner

That should not be a big problem. The parsing and/or validation of flags can be extracted into a shared module, which is imported by both programs (if that makes sense in the case).

Gambitier commented 1 month ago

@Acconut

I am currently using the CreatedUploads event to push tasks to a queue for cleaning up files. This way, I don't have to worry about which instance of tus created the file. So far, this approach is working well for us. Do you have any feedback or comments on this approach?

for event := range handler.CreatedUploads {
        log.Printf("Upload %s created\n", event.Upload.ID)
        uploadedFilePath := hook.Upload.Storage["Path"]

        tusFiles := []string{
            uploadedFilePath,
            fmt.Sprintf("%v.info", uploadedFilePath),
        }

               // Add file cleanup task for tusFiles to message queue to be executed at the time of expiry
}
Acconut commented 1 month ago

That should be fine as long as the delay is large enough to make sure uploads are either finished or abandoned when the clean up occurs.