
Parse.Object.saveAll can result in duplicate objects #859

Open · simonaberry opened this issue 2 years ago

simonaberry commented 2 years ago

Issue Description

This issue has been specifically created to be referenced in this PR.

Parse.Object.saveAll uses the /batch endpoint

If a large batch of large objects is saved via saveAll and the network is very slow, the initial request can time out and is then automatically retried. However, some of the objects in the first batch may already have been saved successfully, so the retry creates duplicates of those objects.
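
As a rough illustration, here is a minimal sketch of the scenario; the `Item` class name and the count of 20 objects are made up for the example:

```js
// Minimal sketch of the failure mode; the "Item" class and the
// count of 20 objects are illustrative, not from the report.
const Parse = require('parse/node');

async function saveBatch() {
  const objects = [];
  for (let i = 0; i < 20; i++) {
    const item = new Parse.Object('Item');
    item.set('index', i);
    objects.push(item);
  }
  // saveAll sends all objects in one request to the /batch endpoint.
  // On a slow network the request can time out and be retried as a
  // whole, re-saving objects that were already persisted server-side,
  // which produces the duplicates described above.
  return Parse.Object.saveAll(objects);
}
```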

Steps to reproduce

Watch this Loom

Actual Outcome

Duplicate objects are created.

Expected Outcome

No duplicate objects are created.

Environment

JS SDK used in web client, over spotty networks


mtrezza commented 2 years ago

Could you please fill out the template to describe the issue and indicate whether you can reproduce it (see the checkbox at the top)? If the issue does not exist in the latest version, we may close this issue and related PRs.

simonaberry commented 2 years ago

Here is a video explaining the issue https://www.loom.com/share/9744297bacfd41f3bd18e60527f0eacc

mtrezza commented 2 years ago

Thanks for updating the issue.

> If the cause of the failure was a network delay, then the saveAll will automatically retry, with all 20 objects again.

This sounds like a bug, because duplicate objects would hardly be an intended outcome. So maybe we don't need a new Parse.Object.saveAllSettled method but could instead fix the existing Parse.Object.saveAll method to retry only the failed objects?

I've classified this as a bug with severity:medium, as the workaround is to not use Parse.Object.saveAll but a custom batch saving mechanism.

simonaberry commented 2 years ago

@mtrezza I guess it is debatable whether it is a bug or not... but the issue stems from the fact that the saveAll call is a batch call, so if there is an error, the entire batch gets resent. The PR we have proposed uses individual object.save() calls as a workaround.
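
For reference, a sketch of what such a workaround could look like; the `saveAllSettled` helper name is hypothetical, echoing the method proposed above:

```js
// Hypothetical helper: save objects one by one and report per-object
// outcomes, so a retry can target only the failures.
async function saveAllSettled(objects) {
  const results = await Promise.allSettled(objects.map((obj) => obj.save()));
  const saved = [];
  const failed = [];
  results.forEach((result, i) => {
    (result.status === 'fulfilled' ? saved : failed).push(objects[i]);
  });
  return { saved, failed };
}

// Usage: retry only the objects that failed, not the whole batch.
async function saveWithRetry(objects) {
  const { failed } = await saveAllSettled(objects);
  if (failed.length > 0) {
    await saveAllSettled(failed);
  }
}
```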

mtrezza commented 2 years ago

> I guess it is debatable whether it is a bug or not

Can you think of a use case in which one desires this behavior leading to duplicate objects? My assumption would rather be the opposite, like you pointed out in your issue.

simonaberry commented 2 years ago

ok - point taken. It's a bug.

Just surprised no one else has reported it in the last ~10 years!

mtrezza commented 2 years ago

Well, if we had an "archeological bug excavator" badge, you would be the first to get it 🙂 Do you have any suggestion for how that bug could be fixed without introducing a breaking change?

simonaberry commented 2 years ago

I have had a very quick look at the batch saving, and I think it would be possible to change that logic from using the 'batch' endpoint to using a series of individual saves (which would give the client the benefit of knowing which individual items were saved successfully and which were not).

But I guess the first step is agreement (from you who have to see the big picture on the overall direction of changes) that this is in principle the correct approach... I imagine the reason for using /batch calls in the first place was to reduce the overhead associated with bulk transactions... saving objects individually would undo that benefit.

simonaberry commented 2 years ago

You could argue that the same risk exists for individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server. This is true. However, the 'damage' is reduced to just one (or two) duplicates, not 20 (the default batch size?).

mtrezza commented 2 years ago

> You could argue that the same risk exists for individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server.

That is why the idempotency feature was introduced.
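
For context, a sketch of enabling idempotency on both sides, assuming the parse-server `idempotencyOptions` setting and the JS SDK `IDEMPOTENCY` flag; the values shown are illustrative, not a recommendation:

```js
// Server side (parse-server config): remember request IDs so a
// retried request is recognized and not executed twice.
const { ParseServer } = require('parse-server');

const server = new ParseServer({
  // ...app ID, master key, database URI, etc.
  idempotencyOptions: {
    paths: ['.*'], // regexes of endpoints to deduplicate (illustrative)
    ttl: 300,      // seconds to remember a request ID (illustrative)
  },
});

// Client side (Parse JS SDK): opt in so each request carries a unique
// X-Parse-Request-Id header that the server can deduplicate on.
const Parse = require('parse/node');
Parse.CoreManager.set('IDEMPOTENCY', true);
```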

However, there is a difference between what you originally described in your issue and your last comment:

Some thoughts:

My suggestion is that we just add a caveat to the docs for saveAll and the REST batch endpoint: saving N objects can create up to N-1 duplicates under bad network conditions if idempotency is not activated.

mtrezza commented 2 years ago

@simonaberry Kindly let me know if you agree with the suggestion above and whether we can regard this issue as merely a "docs" issue.

simonaberry commented 2 years ago

@mtrezza yes - I think your summary of the issue above is good and logical.

We have already resolved the problem ourselves by building our own error handling logic - but at the price of multiple requests.

So yes - happy that we just improve the documentation to make everyone aware of the potential downsides of saveAll