zazz-ops opened 5 months ago
This appears related to #4350
@zazz-ops Can you try disabling transactions in MongoDB?

`transactionOptions: false`

https://payloadcms.com/docs/database/mongodb#options
You are connected to a replica set, correct?
@BrianJM Disabled transactions and still getting the same error:
Interestingly enough, the file actually does upload successfully to my S3 bucket:
However, I'm still getting the mongo error and there is no doc for the uploaded file in the DB.
@BrianJM And yes, connecting to a replica set. I'm just using the free-tier MongoDB Atlas M0 deployment.
@zazz-ops I started getting transaction errors after moving from a local instance to an Atlas (M0) replica set as well.
It's interesting that disabling transactions still produces a transaction error; in my case, it does resolve the issue but I'm not doing large uploads.
~~Do you see a pattern with the timing of transaction errors? I realize this may depend on network conditions, but does it occur after 60 seconds (as an example) from the start of the upload?~~ I see you noted approximately 3 minutes.
@BrianJM Yes, the error typically fires after ~3 minutes. But the interesting thing is that it always fires (immediately?) following the completion of the file upload. The video file I'm using for testing (~180 MB) takes ~3 minutes to upload on average.
So the timing of the error seems to be directly related to the duration of the upload.
@BrianJM I just did some more testing with a 21 MB file and an 80 MB file.
The 21 MB file uploaded fine and produced no errors. Payload performed as expected.
The 80 MB file produced the same errors as the original 180 MB file. The file itself landed fine in my S3 bucket, but the MongoDB error was thrown, and there's no correlating doc in my `media` collection.
How do we figure out what the `partSize` is that Payload is using for its S3 uploads? Seems to me that it might be something like 50 MB, and so the 21 MB file is fine, but the 80 MB file is not because it needs to move on to the next part of the upload, but that (for some unknown reason) ends up throwing this MongoDB error.
> How do we figure out what the `partSize` is that Payload is using for its S3 uploads? Seems to me that it might be something like 50 MB, and so the 21 MB file is fine, but the 80 MB file is not because it needs to move on to the next part of the upload, but that (for some unknown reason) ends up throwing this MongoDB error.
@zazz-ops You beat me to it. That's part of the problem, I think. The `partSize` is 50 MB and the `queueSize` is 4. Each part must finish within 2 minutes, or the upload will time out.
What's your upload speed? Are you hosting locally or in a data center? You will need at least 15 Mbps upload to finish a 200 MB upload in 2 minutes.
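As a rough check on that math (a hypothetical helper, not Payload code): moving `totalMB` megabytes within a deadline needs `totalMB * 8 / seconds` megabits per second.

```typescript
// Hypothetical helper: sustained upload speed (Mbps) needed to move
// `totalMB` megabytes within `seconds`. 1 MB = 8 megabits.
function requiredMbps(totalMB: number, seconds: number): number {
  return (totalMB * 8) / seconds;
}

// Whole file: 200 MB within a 2-minute window
console.log(requiredMbps(200, 120).toFixed(1)); // "13.3" — so ~15 Mbps with overhead

// A single 50 MB part within its own 2-minute deadline
console.log(requiredMbps(50, 120).toFixed(1)); // "3.3" per concurrent part
```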
The `partSize` definitely needs to be reduced with a `queueSize` of 4.
Could it be the `transactionLifetimeLimitSeconds`?
The transaction will initialize at the start of the request, but it won't commit until after the file completes. I think this is the underlying issue and something we need to consider.
The upload code is written this way because if you were trying to upload a file and it failed, we wouldn't want it to have a collection document. It sounds like we need to rethink this approach and introduce an intermediate db commit with a delete cleanup for failed uploads.
I thought it may be related to this, depending on the upload speed.
I am testing large uploads with the recent fixes on main, and trying to replicate in 2.11.1. I think you're right about `transactionLifetimeLimitSeconds`; I'll test that as well.
@DanRibbens I can reproduce an error with large uploads, with and without transactions.
I think you're right about re-thinking the approach regarding transactions - that will also be necessary - but there is a deeper issue here with S3 timeouts.
With `transactionOptions: false`, the following errors occur (one per upload):
`Error: You are not allowed to perform this action.`
or
`[22:28:05] ERROR (payload): Error: No files were uploaded.`
I'm not sure why zazz-ops received a different result, but transaction errors should not occur if disabled.
With transactions enabled, the error occurs as reported by @zazz-ops.
Here is how to reproduce the issue, with or without transactions:
1. In the Network tab of dev tools, create a 5 Mbps throttling profile.
2. Upload a file > 200 MB.
3. Wait ~5 minutes and review the request in the Network tab.
Note: 6 Mbps and 200 MB will not cause an error; the upload will succeed.
I believe the resolution is to allow the S3 `timeout` to be configured.
Allowing the multipart upload's `partSize` and `queueSize` to be configured (per file) may also be useful. The maximum memory consumption is `queueSize * partSize`, so it may be better to have smaller parts spread across more queue slots in some environments (e.g., `4 * 50 MB = 200 MB` vs `10 * 5 MB = 50 MB`).
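To make that trade-off concrete, here is a small hypothetical helper (not plugin code; `planMultipart` is an invented name) that checks a candidate `partSize`/`queueSize` against S3's multipart limits (5 MiB minimum part size, 10,000 parts maximum) and reports the worst-case buffer:

```typescript
const MiB = 1024 * 1024;

interface MultipartPlan {
  parts: number;       // number of parts the file will be split into
  maxBufferMiB: number; // worst-case bytes held in memory: queueSize * partSize
}

function planMultipart(fileBytes: number, partSizeBytes: number, queueSize: number): MultipartPlan {
  if (partSizeBytes < 5 * MiB) throw new Error("S3 requires parts of at least 5 MiB");
  const parts = Math.ceil(fileBytes / partSizeBytes);
  if (parts > 10_000) throw new Error("S3 allows at most 10,000 parts");
  return { parts, maxBufferMiB: (queueSize * partSizeBytes) / MiB };
}

// Defaults discussed above: 50 MB parts, 4 queue slots
console.log(planMultipart(180 * MiB, 50 * MiB, 4)); // { parts: 4, maxBufferMiB: 200 }

// Smaller parts spread across more slots: same concurrency, less memory
console.log(planMultipart(180 * MiB, 5 * MiB, 10)); // { parts: 36, maxBufferMiB: 50 }
```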
Changing the `partSize` to 5 MB (from 50 MB) does not resolve the issue; I still receive an error (`[00:37:40] ERROR (payload): Error: No files were uploaded.`).
I did not get as far as testing extending or removing the `timeout`.
Update: `fetch` appears to time out after 5 minutes (waiting for response headers). This explains the longer timeouts without transactions.
@BrianJM My upload speed is slow, typically about 5 Mbps. So that's likely contributing to the failure of the upload, but it shouldn't matter. The whole point of multipart uploading is resiliency at any upload speed. So Payload absolutely needs to tackle this. It's not optional for a CMS — even one that's so much more than just a CMS.
@DanRibbens I suspected that Payload was starting a DB op at the init of the upload and then waiting for the upload to complete. And it seems you've already figured out that this is not a tenable strategy for an uploader. A 5 GB file could take hours (or even days) to upload depending on the user's upload speed, so keeping a DB op running for that long is ... you already know.
So ... fun fact: I've built the uploader I need 5 times already for previous projects. So I'll likely reach for the latest bespoke version I wrote and see if I can make Payload work with it by using a `media` collection that does not have uploads enabled.
Here's a quick sketch of how I've implemented a highly-resilient "large" file uploader before:

1. The client (UI, CLI, node.js, etc...) sends a `CreateMedia` request to the API with the file details, but not the actual file data.
2. The API creates a `media` document in the db with a `status: 'uploading'` property. This is also the step where the API may use AWS STS (or equivalent) to generate temporary creds and pass them back to the client as an "upload token", thus scoping the user's write privs to that bucket down to only the file they're uploading.
3. The client then uses `Upload` from the `@aws-sdk/lib-storage` package.
4. `Upload` uploads the file to S3 and provides progress events. Progress percentage is reported to the user.
5. When `Upload` is finished uploading the file, or if it fails to upload the file, it's the client that receives those event notifications and then fires against the API to "finalize" the upload by updating the doc in the `media` collection with a `status: 'complete'` or `status: 'failed'` update.
6. A periodic cleanup deletes `media` docs that have a `status: 'failed'` property.

This is obviously a high-level sketch, but it's the gist of what I've implemented in the past for the bespoke CMS or DAM systems I've built.
Payload handles CRUD, ACL, Hooks, Globals, and Plugins better than what I've built, so for this new project I'm working on I'm currently evaluating whether I should modify the codebase I've already written to suit the project, or if I should reach for Payload and leverage all of its amazing capabilities.
I was hoping Payload's DAM functionality would Just Work™ as I need it to ... not the case it seems.
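The flow sketched above boils down to a status lifecycle that the API tracks. Below is a minimal in-memory illustration of that lifecycle; all names (`MediaDoc`, `createMedia`, `finalizeMedia`, `cleanupFailed`) are hypothetical, and the actual S3 byte transfer via lib-storage's `Upload` is omitted:

```typescript
import { randomUUID } from "node:crypto";

type MediaStatus = "uploading" | "complete" | "failed";

interface MediaDoc {
  id: string;
  filename: string;
  status: MediaStatus;
}

const db = new Map<string, MediaDoc>(); // stand-in for the `media` collection

// Steps 1-2: the client sends file details (not bytes); the API records the
// doc as 'uploading' (this is also where an STS "upload token" would be minted).
function createMedia(filename: string): MediaDoc {
  const doc: MediaDoc = { id: randomUUID(), filename, status: "uploading" };
  db.set(doc.id, doc);
  return doc;
}

// Steps 3-4 happen client-side with `Upload` from @aws-sdk/lib-storage
// and are not modeled here.

// Step 5: the client reports the outcome and the API finalizes the doc.
function finalizeMedia(id: string, ok: boolean): MediaDoc {
  const doc = db.get(id);
  if (!doc) throw new Error("unknown media id");
  doc.status = ok ? "complete" : "failed";
  return doc;
}

// Step 6: a periodic job deletes docs whose upload never completed.
function cleanupFailed(): number {
  let removed = 0;
  for (const [id, doc] of db) {
    if (doc.status === "failed") {
      db.delete(id);
      removed++;
    }
  }
  return removed;
}
```

The key property of this design is that no server request or DB transaction stays open for the duration of the transfer; the server only sees two short calls plus a cleanup pass.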
@zazz-ops I don't think this can be resolved without using WebSockets.
Network request timeouts (waiting for server response headers) are defined by the browser. Chrome is 300 seconds. Firefox is 90 seconds.
With the current implementation, the browser imposes an upper limit.
@BrianJM I'm not entirely sure what you mean by "this can be resolved", but the uploader I described in my previous comment very much works. It's in production on 5 systems I've built and has been for years. And it handles "large" file uploads (typically videos) every day.
You're mentioning WebSockets, so I'm assuming that you mean a socket connection between the UI and the Payload API. But that's not how the uploader I've described works. It bypasses Payload entirely and uploads directly to S3 from the browser, or CLI, or any runtime.
This requires that Payload create a client that leverages the `@aws-sdk/lib-storage` `Upload` library. And a client would need to be created for many contexts: browser (JS), CLI (JS, TS, Go, Python, Rust, etc...), runtime (JS, TS, Go, Python, Rust, etc...).
In this case the Payload API gives up all functionality/responsibility for uploads. It's the client that coordinates the steps. My Payload `media` collection is just any old data collection. All upload and resizing or transcoding/transformatting functionality is lost, and alternatives (like AWS Elemental MediaConvert and Blitline) need to be leveraged.
As I mentioned in my last comment, the uploader I sketched is very much a sketch. When you factor in digital asset access/security, transcoding/transformatting, CDN caching/security, licence-based rights access, and more ... it gets pretty hairy pretty fast.
At this point, from Payload's perspective, it's very safe for you to assume you're nowhere near where you need to be with uploads and DAM. You actually have a long way to go to catch up to WordPress.
I'm going to work on implementing my uploader as a separate entity in my system that simply fires against the Payload API and keep evaluating Payload only because the core functionality is so elegant.
But I can tell you with certainty right now, the current Upload and DAM functionality of Payload is essentially useless for anything other than a brochure site. Maybe even not that.
> @BrianJM I'm not entirely sure what you mean by "this can be resolved", but the uploader I described in my previous comment very much works. It's in production on 5 systems I've built and has been for years. And it handles "large" file uploads (typically videos) every day.
@zazz-ops Which uploader is this and what type of connection is established?
> You're mentioning WebSockets, so I'm assuming that you mean a socket connection between the UI and the Payload API.
That's one way, but I assumed direct uploads from the browser to S3 uses a socket connection as well. I believe this is generally how large uploads work.
> But that's not how the uploader I've described works. It bypasses Payload entirely and uploads directly to S3 from the browser, or CLI, or any runtime. This requires that Payload create a client that leverages the `@aws-sdk/lib-storage` `Upload` library. And a client would need to be created for many contexts: browser (JS), CLI (JS, TS, Go, Python, Rust, etc...), runtime (JS, TS, Go, Python, Rust, etc...).
I understand the concepts.
What I do not know is the method the browser uses to upload (with lib-storage). Do you? Is it a socket connection?
Can you share a repo that implements direct uploads from the browser using the library?
> At this point, from Payload's perspective, it's very safe for you to assume you're nowhere near where you need to be with uploads and DAM. You actually have a long way to go to catch up to WordPress.
WordPress media uploads are also bound to server timeouts. Have you been using the WordPress media library to upload and host 5 GB files?
> But I can tell you with certainty right now, the current Upload and DAM functionality of Payload is essentially useless for anything other than a brochure site. Maybe even not that.
That's an interesting opinion. You are welcome to contribute.
@BrianJM
> @zazz-ops Which uploader is this and what type of connection is established?

https://github.com/aws/aws-sdk-js-v3/tree/main/lib/lib-storage
> That's one way, but I assumed direct uploads from the browser to S3 uses a socket connection as well. I believe this is generally how large uploads work.
I'm not really aware of any uploaders that do so via a socket connection. The AWS `lib-storage` `Upload` class linked above uses an XHR PUT request so far as I can tell.
> What I do not know is the method the browser uses to upload (with lib-storage). Do you? Is it a socket connection?
See above.
> Can you share a repo that implements direct uploads from the browser using the library?
I can't share any of the ones I've built as they are proprietary, but the `Upload` class of the `lib-storage` package is an example itself. The uploaders I've built all have `lib-storage` as a hard dependency. It (obviously) handles the actual transfer of bytes to S3, and then everything else I wrote is the "meta logic" that goes with an upload, like filename cleansing, status tracking, transcoding/transformatting, etc.
> WordPress media uploads are also bound to server timeouts. Have you been using the WordPress media library to upload and host 5 GB files?
No, the largest files I've had to push up to WordPress capped out at about 500 MB, but it's not outside reason to think that WP could handle 5 GB with some tweaking of the PHP and Nginx or Apache config. But generally WP offers DAM functionality that is far superior to where Payload is at right now. The point being: even WP is doing a better job at this than Payload; perhaps it should be an area of special focus given that DAM is a critical part of a CMS.
@zazz-ops I believe the AWS SDK uses a socket connection for multipart uploads.
I agree that it is logical for S3 uploads to bypass Payload / Node. I planned to develop a component or plugin to do this (to save egress fees), but this limitation may result in core logic changes (so I won't need to do that).
The WP Media library is subject to the same limitation you're seeing today in Payload. The maximum upload is bound by the server configuration and the browser timeout.
Do you have open discussions or issues regarding the areas you feel are lacking in DAM?
Hey @zazz-ops , it sounds like you have a pretty good idea of the problem you're facing. We'd be open to a PR that would handle these scenarios if you feel up to it. Let me know.
The alternative is to create a custom adapter that has your desired functionality and pass that in. Depending on your needs, this might be a viable option.
I second the suggestion of uploading directly to S3 and have built file uploads in a similar way before as well.
When initiating a file upload, the API would call S3 to create a presigned URL for the upload and return that to the client, who would then do a multipart request directly to it. Neither a request to Payload nor a DB transaction should need to stay open during that time; rather, as @zazz-ops pointed out, a separate call at the end should mark the upload as completed.
Besides handling of large file sizes and avoiding a roundtrip to the API, another main factor would be to keep the Payload part of this stateless for serverless or otherwise distributed environments.
I think this pattern of requesting a file upload and returning a URL to upload to should be the default implementation for file uploads, and not specific to S3, even for basic storage options where the same host that receives the upload request also handles the actual upload, so we have said flexibility baked into the Admin UI and API.
Just to make sure we're on the same page (if I'm not missing something): the client/browser-facing part shouldn't require an AWS-specific or other storage lib to handle the upload, and usually would not need to know about the implementation details of the storage provider, although there might be other storage backends where that could matter.
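For what it's worth, the "presigned URL" the API hands back can even be produced with nothing but Node's stdlib. In practice one would use `getSignedUrl` from `@aws-sdk/s3-request-presigner`, but this sketch (hypothetical `presignPut`, keys with no characters needing escaping, single-PUT uploads, which S3 caps at 5 GB; true multipart needs one signed URL per part) shows what ends up in the browser's hands:

```typescript
import { createHmac, createHash } from "node:crypto";

// Minimal SigV4 *query* presign for an S3 PUT, stdlib only (illustrative sketch).
function presignPut(opts: {
  bucket: string; key: string; region: string;
  accessKeyId: string; secretAccessKey: string;
  expiresSeconds: number; now?: Date;
}): string {
  const now = opts.now ?? new Date();
  // e.g. "20240102T030405Z" and "20240102"
  const amzDate = now.toISOString().replace(/[-:]/g, "").slice(0, 15) + "Z";
  const date = amzDate.slice(0, 8);
  const host = `${opts.bucket}.s3.${opts.region}.amazonaws.com`;
  const scope = `${date}/${opts.region}/s3/aws4_request`;

  // Query parameters, already in alphabetical key order.
  const enc = (s: string) => encodeURIComponent(s)
    .replace(/[!'()*]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
  const query = [
    ["X-Amz-Algorithm", "AWS4-HMAC-SHA256"],
    ["X-Amz-Credential", `${opts.accessKeyId}/${scope}`],
    ["X-Amz-Date", amzDate],
    ["X-Amz-Expires", String(opts.expiresSeconds)],
    ["X-Amz-SignedHeaders", "host"],
  ].map(([k, v]) => `${enc(k)}=${enc(v)}`).join("&");

  const canonicalRequest = [
    "PUT",
    `/${opts.key}`,    // assumes a key with no characters needing escaping
    query,
    `host:${host}\n`,  // canonical headers (each line ends with \n)
    "host",            // signed headers
    "UNSIGNED-PAYLOAD",
  ].join("\n");

  const hash = (s: string) => createHash("sha256").update(s).digest("hex");
  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, hash(canonicalRequest)].join("\n");

  // Derive the signing key: HMAC chain over date, region, service, terminator.
  const hmac = (key: Buffer | string, data: string) =>
    createHmac("sha256", key).update(data).digest();
  const kSigning = hmac(hmac(hmac(hmac("AWS4" + opts.secretAccessKey, date),
    opts.region), "s3"), "aws4_request");
  const signature = createHmac("sha256", kSigning).update(stringToSign).digest("hex");

  return `https://${host}/${opts.key}?${query}&X-Amz-Signature=${signature}`;
}
```

The browser then simply `fetch`es that URL with `method: "PUT"` and the file body; no Payload server or DB resource is held open while the bytes move.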
Creating a presigned upload URL could be built on top of our current uploads implementation under a new feature flag, or perhaps as an alternate adapter in the plugin. I see this as a valid request. It would be of great help if this was built by the community.
If the upload request could be redirected from Payload to the presigned address, then a feature flag wouldn't be necessary; it would just happen automatically from the browser to the S3 URL. I would have to do more digging on this pattern to know the details of what's possible here.
> It would be of great help if this was built by the community.
Happy to help with this, just need to find the time to get more familiar with the Payload plugin structure but that's on my ToDo anyways :)
> If the upload request could be redirected from Payload to the presigned address, then a feature flag wouldn't be necessary; it would just happen automatically from the browser to the S3 URL.
Could you elaborate on that? I'm not sure I understand under which circumstances a feature flag would be required and when it wouldn't. But yes, the implementation I intend to build would have the API call S3 to create a presigned URL and return that to the browser, which would upload the files directly to it. Depending on whether the existing behaviour should also be kept intact, a feature flag might still make sense. However, I guess that will all be clearer once we have a PR ready, and a decision can be made at that point.
A big +1 for the ability to use AWS S3 presigned upload URLs, bypassing the API and server for the actual upload. We have a fast connection to our server, and a reverse proxy behind Nginx. Large uploads are working for us at the moment (with a serverless Atlas MongoDB). But... if we wanted to deploy to Vercel, we would hit their hard upload size limit of about 4 MB (IIRC). In either case, a direct client upload via a signed URL would be more efficient (with API/DB calls only responsible for updating the record/document).
We'd like to deploy to Vercel, but keep our CDN on AWS - and so AWS S3 pre-signed URLs are pretty high on our list of 'would be VERY nice to have'.
I get a similar issue when uploading videos of around 70 MB to the videos collection using the cloud storage plugin in combination with Azure Blob Storage. Videos of less than 15 MB are working fine.
```
[09:39:21] ERROR (payload): There was an error while uploading files corresponding to the collection videos with filename Roma_2024-(1080p).mp4:
[09:39:21] ERROR (payload): Cannot read properties of undefined (reading 'timeout')
err: {
  "type": "TypeError",
  "message": "Cannot read properties of undefined (reading 'timeout')",
  "stack":
      TypeError: Cannot read properties of undefined (reading 'timeout')
          at Object.handleUpload (/node_modules/@payloadcms/plugin-cloud-storage/src/adapters/azure/handleUpload.ts:38:36)
          at map (/node_modules/@payloadcms/plugin-cloud-storage/src/hooks/beforeChange.ts:48:25)
          at Array.map (<anonymous>)
          at hook (/node_modules/@payloadcms/plugin-cloud-storage/src/hooks/beforeChange.ts:47:32)
          at /node_modules/payload/src/collections/operations/create.ts:185:16
          at create (/node_modules/payload/src/collections/operations/create.ts:181:5)
          at createHandler (/node_modules/payload/src/collections/requestHandlers/create.ts:26:17)
}
```
Link to reproduction
No response
Describe the Bug
I've successfully created, configured, and tested a `media` collection that has no problem uploading images, resizing them, and forwarding them to my S3 bucket using the `@payloadcms/plugin-cloud-storage` package. There's no issue with how the collection functions, unless I try to upload a video with a filesize of ~180 MB.

The `nova` function is just a thin sugar-wrapper on my `fetch` request.

The request hangs for about 3 minutes and then I see the following in the Payload console logs:

And the very helpful error message in the response from the request:

This is obviously some sort of MongoDB timeout, but I'm confused as to why a MongoDB timeout would have any impact on a file upload at all. Is there a MongoDB transaction that starts at the onset of the file upload, and then doesn't finish until the file upload is done?

Nonetheless, I tried to increase the `upload.limits.filesize` value in my `payload.config.ts` file, but the issue persists. I'm looking to use Payload to handle uploads of up to 5 GB. Is this possible with Payload?
To Reproduce
Create a media collection and attempt to upload a large file via the REST API interface.
Payload Version
2.8.1
Adapters and Plugins
No response