
Parse.Object.saveAll can result in duplicate objects #859

Open · simonaberry opened this issue 2 years ago

simonaberry commented 2 years ago

Issue Description

This issue has been specifically created to be referenced in this PR.

Parse.Object.saveAll uses the /batch endpoint

If a large batch of large objects is saved via saveAll and the network is very slow, the initial request can time out and is then automatically retried. However, some of the objects in the first batch may already have been saved successfully, so the retry creates duplicates of those objects.
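
As a rough illustration, here is a minimal sketch of the scenario; the `Item` class name and the count of 20 objects are made up for the example:

```js
// Minimal sketch of the failure mode; the "Item" class and the
// count of 20 objects are illustrative, not from the report.
const Parse = require('parse/node');

async function saveBatch() {
  const objects = [];
  for (let i = 0; i < 20; i++) {
    const item = new Parse.Object('Item');
    item.set('index', i);
    objects.push(item);
  }
  // saveAll sends all objects in one request to the /batch endpoint.
  // On a slow network the request can time out and be retried as a
  // whole, re-saving objects that were already persisted server-side,
  // which produces the duplicates described above.
  return Parse.Object.saveAll(objects);
}
```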

Steps to reproduce

Watch this Loom

Actual Outcome

Duplicate objects are created.

Expected Outcome

No duplicate objects are created.

Environment

JS SDK used in web client, over spotty networks


mtrezza commented 2 years ago

Could you please fill out the template to describe the issue and indicate whether you can reproduce it (see the checkbox at the top)? If the issue does not exist in the latest version, we may close this issue and related PRs.

simonaberry commented 2 years ago

Here is a video explaining the issue https://www.loom.com/share/9744297bacfd41f3bd18e60527f0eacc

mtrezza commented 2 years ago

Thanks for updating the issue.

> If the cause of the failure was a network delay, then the saveAll will automatically retry, with all 20 objects again.

This sounds like a bug, because duplicate objects would hardly be an intended outcome. So maybe we don't need a new Parse.Object.saveAllSettled method but could instead fix the existing Parse.Object.saveAll method to retry only the failed objects?

I've classified this as a bug with severity:medium, as the workaround is to not use Parse.Object.saveAll but a custom batch saving mechanism.

simonaberry commented 2 years ago

@mtrezza I guess it is debatable whether it is a bug or not... but the issue stems from the fact that the saveAll call is a batch call, so if there is an error, the entire batch gets resent. The PR we have proposed uses individual object.save() calls as a workaround.
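
For reference, a sketch of what such a workaround could look like; the `saveAllSettled` helper name is hypothetical, echoing the method proposed above:

```js
// Hypothetical helper: save objects one by one and report per-object
// outcomes, so a retry can target only the failures.
async function saveAllSettled(objects) {
  const results = await Promise.allSettled(objects.map((obj) => obj.save()));
  const saved = [];
  const failed = [];
  results.forEach((result, i) => {
    (result.status === 'fulfilled' ? saved : failed).push(objects[i]);
  });
  return { saved, failed };
}

// Usage: retry only the objects that failed, not the whole batch.
async function saveWithRetry(objects) {
  const { failed } = await saveAllSettled(objects);
  if (failed.length > 0) {
    await saveAllSettled(failed);
  }
}
```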

mtrezza commented 2 years ago

> I guess it is debatable whether it is a bug or not

Can you think of a use case in which one desires this behavior leading to duplicate objects? My assumption would rather be the opposite, like you pointed out in your issue.

simonaberry commented 2 years ago

ok - point taken. It's a bug.

Just surprised no one else has reported it in the last ~10 years!

mtrezza commented 2 years ago

Well, if we had an "archeological bug excavator" badge, you would be the first to get it 🙂 Do you have any suggestion for how that bug could be fixed without introducing a breaking change?

simonaberry commented 2 years ago

I have had a very quick look at the batch saving, and I think it would be possible to change that logic from using the 'batch' endpoint to using a series of individual saves (which would give the client the benefit of knowing which individual items were saved successfully and which were not).

But I guess the first step is agreement (from you who have to see the big picture on the overall direction of changes) that this is in principle the correct approach... I imagine the reason for using /batch calls in the first place was to reduce the overhead associated with bulk transactions... saving objects individually would undo that benefit.

simonaberry commented 2 years ago

You could argue that the same risk exists for individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server. This is true. However, the 'damage' is reduced to just one (or two) duplicates, not 20 (the default batch size?).

mtrezza commented 2 years ago

> You could argue that the same risk exists for individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server.

That is why the idempotency feature was introduced.
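
For context, a sketch of enabling idempotency on both sides, assuming the parse-server `idempotencyOptions` setting and the JS SDK `IDEMPOTENCY` flag; the values shown are illustrative, not a recommendation:

```js
// Server side (parse-server config): remember request IDs so a
// retried request is recognized and not executed twice.
const { ParseServer } = require('parse-server');

const server = new ParseServer({
  // ...app ID, master key, database URI, etc.
  idempotencyOptions: {
    paths: ['.*'], // regexes of endpoints to deduplicate (illustrative)
    ttl: 300,      // seconds to remember a request ID (illustrative)
  },
});

// Client side (Parse JS SDK): opt in so each request carries a unique
// X-Parse-Request-Id header that the server can deduplicate on.
const Parse = require('parse/node');
Parse.CoreManager.set('IDEMPOTENCY', true);
```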

However, there is a difference between what you originally described in your issue and your last comment:

Some thoughts:

My suggestion is that we just add a caveat to the docs for saveAll and the REST batch endpoint: saving N objects can create up to N-1 duplicates under bad network conditions if idempotency is not activated.

mtrezza commented 2 years ago

@simonaberry Kindly let me know if you agree with the suggestion above and whether we can regard this issue as merely a "docs" issue.

simonaberry commented 2 years ago

@mtrezza yes - I think your summary of the issue above is good and logical.

We have already resolved the problem ourselves by building our own error handling logic - but at the price of multiple requests.

So yes - happy that we just improve the documentation to make everyone aware of the potential downsides of saveAll