tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License
2.93k stars 465 forks source link

Many post-receive hooks after post-finish hook #1095

Open registiy opened 3 months ago

registiy commented 3 months ago

Question I'm trying to create progress meter for file uploader and using TUS http hooks for that matter. On my local machine's port 8000 there is a python app, where I receive hooks and update database accordingly (tusd, database and app is on my localmachine).

In the app's log I see that "post-finish" hook is always followed by many "post-receive" hooks. And they are going almost instantly:

2024/03/12 00:23:39.732375 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.776973 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.787741 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.826289 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.843333 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.869038 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.885667 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.905278 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.914368 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.944836 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.955878 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:39.969410 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:39.980200 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:40.008805 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:40.025901 level=DEBUG event=HookInvocationFinish type=post-receive id=9
2024/03/12 00:23:40.049101 level=DEBUG event=HookInvocationStart type=post-receive id=9
2024/03/12 00:23:40.065275 level=DEBUG event=HookInvocationFinish type=post-receive id=9

Is there any way to stop sending "post-receive" after "post-finish" has been sent? It's not kind of big problem now, but in case of many simultaneous uploads I would have to use cache or change my upload algorithms, because every http request to my backend makes request to database, and that is not good.

From my current knowledge it is easier to turn off sending, than changing the way my backend works.

If I'm wrong, correct me please.

Thank you!

Setup details

Acconut commented 3 months ago

In the app's log I see that "post-finish" hook is always followed by many "post-receive" hooks. And they are going almost instantly:

That's odd and should not be happening in the first place, but I am not able to reproduce this locally. Are you able to reproduce this consistently? Could you also share the entire logs for the upload?

registiy commented 3 months ago

local_tusd.log local_fastapi.log k8s_tusd.log

So, local_tusd and local_fastapi these are logs when I load file locally. k8s_tusd is from production deployment, you can see two post-receive hooks after post-finish.

Acconut commented 3 months ago

Thank you, I will look into this in two weeks.

Acconut commented 3 weeks ago

I feel like you have two different questions here:

Why are there so many post-receive events?

In local_tusd.log we see many post-receive events that are executed at nearly the same time:

2024/03/26 00:46:28.016805 level=DEBUG event=HookInvocationFinish type=post-receive id=14
2024/03/26 00:46:28.033906 level=DEBUG event=HookInvocationStart type=post-receive id=14
2024/03/26 00:46:28.059759 level=DEBUG event=HookInvocationFinish type=post-receive id=14
2024/03/26 00:46:28.086797 level=DEBUG event=HookInvocationStart type=post-receive id=14
2024/03/26 00:46:28.116238 level=DEBUG event=HookInvocationFinish type=post-receive id=14
2024/03/26 00:46:28.139646 level=DEBUG event=HookInvocationStart type=post-receive id=14
2024/03/26 00:46:28.172106 level=DEBUG event=HookInvocationFinish type=post-receive id=14
2024/03/26 00:46:28.185443 level=DEBUG event=HookInvocationStart type=post-receive id=14
2024/03/26 00:46:28.214986 level=DEBUG event=HookInvocationFinish type=post-receive id=14
2024/03/26 00:46:28.227101 level=DEBUG event=HookInvocationStart type=post-receive id=14
2024/03/26 00:46:28.250913 level=DEBUG event=HookInvocationFinish type=post-receive id=14

The reason is that you are sending PATCH requests at a high rate, one about every 50ms, which is likely possible because you are testing on a local network with low latency and high throughput. Anyways, a post-receive event is emitted for every PATCH request at the specified interval (1s in your case) and whenever the request ends, so that for every PATCH request we have at least one post-receive event. However, this also means that when you send a PATCH request every 50ms, you will also get the same amount of post-receive events.

I recommend you to reconsider whether you need to send many small PATCH requests versus one big PATCH requests. That is usually the more efficient way to go and would also reduce the number of post-receive events.

Why are there post-receive events after post-finish?

The last post-receive event is emitted whenever we finish processing a PATCH request, which is also often after we have emitted post-finish. This would explain why we have post-receive after post-finish. I am not sure if there is much we can do about this. At the day of they day, they are asynchronous events, so we cannot guarantee their ordering.

I hope this helps!

Acconut commented 3 weeks ago

Given the information about the PATCH request rate I can also reproduce this locally by uploading to a server running on the same machine with a PATCH request body size of 20MB. There we can see that there is a 1s delay between when the first PATCH request finishes and the first post-receive event is emitted. Some delay is expected, but it would nice if it were lower.

Acconut commented 3 weeks ago

I also noticed that we send HTTP hook request with a concurrency of 1, which is the default for the HTTP client we are using:

https://github.com/sethgrid/pester/blob/32a1beba19bdd8dd2925f3122c3c2f9bd92c5698/pester.go#L139

I wonder if we should increase this. However, it doesn't seem to affect the behavior we are seeing here in this issue.