rhythmagency / formulate

An advanced form builder for Umbraco.
MIT License
92 stars 50 forks source link

Submission false positives recorded during load testing? #154

Closed TFAstudio closed 4 years ago

TFAstudio commented 4 years ago

We recently used Jmeter to test the limits of a form on some website hosting infrastructure. The page response times were excellent and errors were minuscule. Hosting seems terrific for the traffic volumes we anticipate.

However, we're currently investigating why some submissions did not land in the database. Jmeter recorded Success:True for all attempted form submissions but our resulting csv download revealed missing records. Tests were conducted from 500 submissions in 5 minutes to 8000 in the same time frame.

Is it likely the environment Jmeter was run in, or the test configuration scripts, were somehow able to generate false positives? Perhaps the error reporting was not catching failures as it should?

Or could the validation or submit-to-success round trip be misfiring under load? We've not seen this kind of thing before and would welcome any theories.

Nicholas-Westby commented 4 years ago

That's a good question. I'll throw in a few thoughts and perhaps you can look into which ones seem more likely.

One thing I remember is that Umbraco doesn't guarantee the database stuff will work from a background thread. It could be that, under high load, that lack of guarantee is especially relevant. IIRC, Formulate submits to the database from a background thread.

It could also be that there are errors happening (e.g., with database locks or something) that are happening on that background thread; maybe they aren't being logged properly, so something to look into.

Note that Formulate will return a success=true result pretty much all the time. This happens nearly immediately, as most form submission handlers do the majority of their work on a background thread.

TFAstudio commented 4 years ago

Thanks for your feedback.

After a week of testing every other thing we thought might be contributing to this issue, we're back to square one. In one test today we lost 3 of the first 10 records in a test of 750 submissions spread over 5 minutes (including record #1) and then many more after those. In similar tests on Friday we lost none. So server congestion may be playing a part. But in this situation the submission should fail on the front-end - not present a success message.

Naturally we're disappointed Formulate is not using a system to ensure the record is written to the database before displaying a success message. Our trust in the platform has taken a real hit.

Right now we're trying to figure out if we can write our own save workflow before the success message (in the Submitting of FormSubmitEvent).

Any advice you might have that could assist us (and prevent us from losing submissions) would be greatly appreciated.

Cheers.

Nicholas-Westby commented 4 years ago

Would you be willing to submit a pull request?

If the issue is what I imagine it to be, I imagine what you could do is change the "Store Data" handler so that it makes a web request from the background thread, then in the new request would do the database work (which would no longer be a background thread, but would also not slow down the form submission request).

You may also want to try and verify first that this is what's happening. Perhaps the background thread is throwing an error that isn't being logged, so adding a bit of logging first may help with testing.

Here is the current handler code for reference: https://github.com/rhythmagency/formulate/blob/master/src/formulate.app/Forms/Handlers/StoreData/StoreDataHandler.cs#L225-L244

Here is where the background thread is created (looks like it does have a bit of logging, FYI): https://github.com/rhythmagency/formulate/blob/master/src/formulate.api/Submissions.cs#L249-L265

TFAstudio commented 4 years ago

Well, it looks like what is happening is duplicate GUIDs are being inserted resulting in the following

ClientConnectionId:15250555-32ad-4aff-a127-882fe1d7e133
 2019-10-22 14:03:17,106 [P7796/D7/T34] ERROR formulate.api.Submissions+Submissions_Instance - An error occurred while executing one of the Formulate form handlers.
System.Data.SqlClient.SqlException (0x80131904): Cannot insert duplicate key row in object 'dbo.FormulateSubmission' with unique index 'IX_FormulateSubmission_GeneratedId'. The duplicate key value is (75056181-d37a-4978-a7d3-392630c4826d).
The statement has been terminated.
...
   at Umbraco.Core.Persistence.Database.Insert(Object poco)
   at formulate.app.Forms.Handlers.StoreData.StoreDataHandler.HandleForm(FormSubmissionContext context, Object configuration)
   at formulate.app.Forms.FormHandler`1.HandleForm(FormSubmissionContext context)
   at formulate.api.Submissions.<>c__DisplayClass10_0.<SubmitForm>b__1()

I have left off the parts that are not relevant of stack trace for the moment.

I thought what was happening was a race condition. Keep in mind, we are hitting this form pretty hard, so if two submissions race to the insert and the Umbraco.Core.Persistence.Database object provides the same GUID for two inserts, one will fail.

I thought putting a thread lock around the database insert statement in the StoreDataHandler.HandleForm() method, but that does not seem to stop the issue. I then used db.BeginTransaction() and db.CompleteTransaction() around the insert statement, again, to no effect.

Then I realized that Submissions is a static class that gets called from SubmissionsController. This led me to investigate the notion that the Guid.GetGuid() method generating the Submission.GeneratedId field might be overwriting an existing object reference before the database had a chance to submit the information the first time. So it is not a racing condition to the database, but rather the thread allowing the object to get changed before it is used and submitting the changed object more than once to the database.

I notice that the Submissions static class holds an IFormPersistence for the Forms, and that the FormHandlers are also cached. When the Submissions.SubmitForm() method gets called, it pulls out the same objects and calls handler.PrepareHandleForm(submissionContext) on them. This could be where the conflict is occurring.

To test this, I took off the thread generation and it submitted to the database without any errors like the one above.

To mitigate this issue, while still allowing to spawn a new thread for FormHandlers, I thought perhaps there should be a new instance of the enabled forms when Submissions.SubmitForm() gets called. This will ensure that each time a submissionContext gets passed it is a unique FormHandler object to then PrepareHandleForm() then HandleForm() while still allowing any errors with PrepareHandleForm() to pass back a Success = false.

This does not quite get rid of the issue of the spawned thread passing back any errors, but at least it deals with the Form getting hammered by our stress tests. To deal with the asynchronous issues, AJAX would have to deal with a promise and three states rather than success or failure:

I am not quite familiar with your javascript side of things to handle any of this at the moment, without a bit of guidance where to look.

I am interested in your thoughts about creating instances for active FormHandlers.

Thanks

TFAstudio commented 4 years ago

I noticed in your Change Releases:

Since version 2.4.5, these changes have been made:

Store Data Handler Improvement Like the email handler, the store data handler now prepares the data on the main request thread (but stores it on a background thread). This allows for more parts of you code to access the current HTTP request.

This seems to be exactly the issue. If I put the handler.PrepareHandleForm(submissionContext) in the same foreach loop with the handler.HandleForm(submissionContext) inside the thread or out, we do not get the duplicate GUID issue.

I am looking into using async and Tasks to let the Controller return more quickly. However, I still feel like having an async call to the web method and giving people the opportunity to have a pending, then success/failure states when it completes would be ideal.

Nicholas-Westby commented 4 years ago

Somebody is looking into the error relating to the GUID. The plan is to handle that as part of this ticket. Thanks for the pointers; that should make the fix happen much faster.

Regarding changing how Formulate works, please create a new GitHub issue for that so we can track the issues separately. My thought on that one is that we'd want to have two API calls (the first to send the data, the second to complete once the background tasks have completed). Then we can emit a new event ("formulate: submission verified" or something like that) on top of the existing event(s). That way, you can choose which event you want to respond to.

Nicholas-Westby commented 4 years ago

@TFAstudio FYI, we've validated that your analysis is correct and we have a fix we've spoken about. Just need to find some time to implement it; that should happen this week, I'm guessing. The fix is relatively simple, so it shouldn't take long.

TFAstudio commented 4 years ago

Excellent. It would be really good to get the duplicate GUID fix by the end of the week, we have another scheduled load testing run at the beginning of next week.

I will open a new issue regarding the threading aspects soon.

Thanks again

Nicholas-Westby commented 4 years ago

FYI, I have a fix for Umbraco 7 (not yet released): https://github.com/rhythmagency/formulate/commit/273b149ff6d3fc5848c1f53783f59340c267ee77

I you are able to test that fix before I release it, that would be super.

I'll see if I can work on a similar fix for Umbraco 8 (supposing it's an issue with both versions).

Nicholas-Westby commented 4 years ago

I did a quick review of the Umbraco 8 code and it seems likely that it will not be impacted by this issue, so no fix should be required.

I'm thinking that because the FormHandlerTypeCollectionBuilder has a transient lifetime, which I interpret to mean that the form handler instances will be created fresh on each web request: https://github.com/rhythmagency/formulate/blob/3004d968b4cc67f9663f6567c0babe9a08a7d41d/src/formulate.app/CollectionBuilders/FormHandlerTypeCollectionBuilder.cs#L11

And if each request has their own instance, then there should be no issue of data being shared across threads.

Nicholas-Westby commented 4 years ago

Actually, that's just for the instantiation of the IFormHandlerType instances. I'm not sure about the IFormHandler instances. Caching may still cause issues. Will need to review Umbraco 8 a bit more.

Nicholas-Westby commented 4 years ago

Created an Umbraco 8 fix too: https://github.com/rhythmagency/formulate/commit/9e9405e893ccd87321d2d409536e71fddcfcf806

Nicholas-Westby commented 4 years ago

FYI, Formulate 2.4.8 has the fix included: https://github.com/rhythmagency/formulate/releases/tag/v2.4.8

Would be great if @TFAstudio could test it to be sure it's fixed.

TFAstudio commented 4 years ago

Great news! We'll test ASAP (probably tomorrow, as our lead dev is out sick today).

TFAstudio commented 4 years ago

I built off the tagged 2.4.8 version for Umbraco 7 and deployed to our testing server. Our load testing has shown that there are no more duplicate IDs, so that fix seems to have worked! We have not tested the Umbraco 8 version.

Thanks for your quick response.

Nicholas-Westby commented 4 years ago

Great, thanks for the detailed report and follow up. FYI, I'm going to hold off on releasing the Umbraco 8 version for a bit since there seems to be less urgency behind it (though the fix is already in the repo). Closing this ticket now.

TFAstudio commented 4 years ago

We've done serious load testing of the solution today and the results are very promising. We did have an issue with one test where we recorded some "success" states on the front end, when records were not created in the database. The logs showed the SMTP server was refusing some requests (too busy).

So here's our questions

Nicholas-Westby commented 4 years ago

IIRC, the order is important and a failure in one handler will prevent the subsequent handlers from running (which is why I tend to make the "Store Data" handler the first one).