openfoodfoundation / openfoodnetwork

Connect suppliers, distributors and consumers to trade local produce.
https://www.openfoodnetwork.org
GNU Affero General Public License v3.0
1.11k stars 719 forks source link

Order Cycle not saving on UK production #2480

Closed Matt-Yorkley closed 6 years ago

Matt-Yorkley commented 6 years ago

We have an order cycle for an enterprise in the UK that fails to save, with the following message in the save bar:

screenshot from 2018-07-17 15-14-11

Description

[Update 22/7 by Lynne] This bug is affecting OCs that exist in a Schedule.

When the user replicates an OC that exists in a schedule, the new OC does not behave correctly - some fields (eg closing date) fail to save correctly with the above error. To get around this the user can remove the original OC from any schedule, then replicate the OC. Now the new OC will save correctly. [/Update]

The error message is triggered by the angular order cycles controller and seems to have been added fairly recently in refactoring that was related to Subscriptions.

The logs on the server show the data is being posted to an endpoint, but shows no errors beyond that, so there's no debugging info to go on... :scream:

We've just confirmed that this hub is the only one that has subscriptions enabled in their enterprise settings.

Expected Behavior

Copied Order Cycles can be saved without error.

Actual Behavior

Copied OC does not save.

Steps to Reproduce

[Update]

  1. Find an OC that exists in a schedule.
  2. Replicate the OC.
  3. Make changes to the replicated OC (in particular 'Closing Date')
  4. Save the changes to the replicated OC
  5. Error in your face [/Update]

Context

Saving an order cycle.

Severity

A workaround has been found and SO has not yet been released - so low. This is blocking SO.

Your Environment

Possible Fix

Before cloning an OC take the original OC out of any schedule. Then clone. Then the new OC works as normal.

Matt-Yorkley commented 6 years ago

Update: we've now confirmed turning off subscriptions fixes the bug... and seems like it was only affecting one enterprise.

lin-d-hop commented 6 years ago

Looks like StroudCo still can't update their order cycle. @OliverUK has reported this issue continues. @Matt-Yorkley not sure why it worked for you?

oeoeaio commented 6 years ago

I'm struggling to replicate this locally. @Matt-Yorkley could you provide more detailed steps to reproduce? What exactly are you trying to update that causes this error to occur?

screen shot 2018-07-19 at 11 49 14 am

kirstenalarsen commented 6 years ago

from my playing last week / earlier this week it seemed fine in aus prod . . but I wasn’t sure if maybe we’re running different versions

On 19 Jul 2018, at 11:54 AM, Rob H notifications@github.com wrote:

I'm struggling to replicate this locally. @Matt-Yorkley https://github.com/Matt-Yorkley could you provide more detailed steps to reproduce? What exactly are you trying to update that causes this error to occur?

https://user-images.githubusercontent.com/2067859/42916827-6ca130de-8b4a-11e8-90b7-09ee0041fecd.png — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/openfoodfoundation/openfoodnetwork/issues/2480#issuecomment-406129248, or mute the thread https://github.com/notifications/unsubscribe-auth/ACxryNFY_UekFAzgeKHVI_V56pcS-MoPks5uH-bDgaJpZM4VS8V_.

OliverUK commented 6 years ago

@Matt-Yorkley @lin-d-hop @oeoeaio please see here: http://recordit.co/UtiphbBRem

lin-d-hop commented 6 years ago

@oeoeaio I've investigated this and added a lot more information as to the cause. Note there is a workaround (described in the bug desc now) but this is a blocker for SO release. Hopefully not a hairy one!

lin-d-hop commented 6 years ago

Ping @oeoeaio We've had this reported now by people that have never turned on SO now. The only people that have reported are people that have very large order cycles. So I wonder if this is related to additional resource required by SO simply being turned on, that manifests in an async issue between angular and rails.

From Sara @ Tamar: I can clone and create and change and update an OC for the future but not the one which is currently open. I have just closed the current OC, successfully made some changes and then re-opened it. This can be used as a work around for now!

oeoeaio commented 6 years ago

Thanks @lin-d-hop @Matt-Yorkley. I am still unable to reproduce this locally or on Aus production. Locally, I have been trying various combinations of things that have been demonstrated/suggested above using UK production data (StroudCo and Tamar) for about two hours...

If anyone thinks they can reproduce this consistently, it would be great to use chrome's dev tools to look at the response to the json update request. If we can get the status of the response, and any data that is returned that will give us a bit more of a clue as to what sort of error we are dealing with.

As @Matt-Yorkley pointed out, we're getting an error that says "Failed to create order cycle" (it should read 'update' instead of 'create'), which means that we are hitting this line. The only way we can hit that line (which is only meant to be a backstop in the case of an unknown error from the server) is if there is no errors property in the response data. Therefore, I suspect that there is some kind of error being encountered by this call to order_cycle_form.save that is not a ActiveRecord::RecordInvalid, meaning that this rescue block is not picking it up. This will probably result in a 500 error, but if you're not seeing anything in your logs then maybe I'm wrong...

lin-d-hop commented 6 years ago

@oeoeaio Would it be possible to try copying UK prod? Stroudco are reporting consistent errors. I have relabelled s2.

lin-d-hop commented 6 years ago

@oeoeaio JSON response is 500 error as you expect. This happened when I changed the Close Time of OC from 7:30am to 7:31am.

oeoeaio commented 6 years ago

@lin-d-hop I have been working with a copy of UK prod on my local machine. Is that what you mean? If you inspect the preview of the response, does it give you any errors, or any data at all? The 500 error is a start but it's not much to go on...

lin-d-hop commented 6 years ago

I suspect there is a race condition or memory issue and hence you aren't finding it locally. It is only happening on our largest OCs. No data in response.

lin-d-hop commented 6 years ago

screenshot from 2018-07-25 02-01-11

lin-d-hop commented 6 years ago

Maybe I need to up a response timeout somewhere...?

oeoeaio commented 6 years ago

What does the Preview tab say?

lin-d-hop commented 6 years ago

500 error message screenshot from 2018-07-25 07-59-17

oeoeaio commented 6 years ago

Gah, wow, we really have nothing to go on. I'll keep thinking...

Matt-Yorkley commented 6 years ago

If we update that rescue block to respond to more than just ActiveRecord::RecordInvalid, might that give us more to go on?

Matt-Yorkley commented 6 years ago

It's saving now, what's changed?

lin-d-hop commented 6 years ago

Nothing. Resource load?

Matt-Yorkley commented 6 years ago

I'm not sure about that though, there were OC's with more products than the troublesome one that were saving correctly.

OliverUK commented 6 years ago

I too can now save my OC even when it's already open.

daniellemoorhead commented 6 years ago

@lin-d-hop @Matt-Yorkley @OliverUK @oeoeaio what's the status with this? Is it still a problem to be resolved, even though it seems to have started working without anyone doing anything?

What's the plan stans?

OliverUK commented 6 years ago

@daniellemoorhead This week I had the error again intermittently. I was able to clone and save an OC with opening time that hadn't yet come. But a few minutes later I was unable to save a change (OC still not open). I set both start and end date into the past and tried again and then it saved. Luckily it seems no-one was shopping at the time. ;)

lin-d-hop commented 6 years ago

I wonder if updated Timeouts will have fixed this? Hasn't been reported for a while so closing.

daniellemoorhead commented 6 years ago

@lin-d-hop see @OliverUK's update above. Seems it's still happening this week...unless updated Timeouts happened post intermittent errors?

Matt-Yorkley commented 6 years ago

Timeouts were increased last night, so we can monitor this over the next week and see how it goes. We also added Bugsnag coverage yesterday, which should help with diagnosis.

daniellemoorhead commented 6 years ago

Reopening this and assigned to you @Matt-Yorkley, given your update - hopefully a solution (or at least culprit) shows itself with bugsnag...

daniellemoorhead commented 6 years ago

Hi @Matt-Yorkley, now that you've had a week of Bugsnag and the timeouts PR was merged have you found the culprit and/or seen an improvement?

Matt-Yorkley commented 6 years ago

We're testing some additional server timeout settings as of today, I'll update ofn-install with some new configurable settings if it fixes the issue..

daniellemoorhead commented 6 years ago

@Matt-Yorkley got an update on this?

daniellemoorhead commented 6 years ago

@Matt-Yorkley @lin-d-hop been almost 2 weeks since this bug has had an update.

Bueller? Bueller? Bueller?

OliverUK commented 6 years ago

@daniellemoorhead I understand it was caused by server timeout setting and is now resolved but will leave @lin-d-hop and @Matt-Yorkley to confirm.

daniellemoorhead commented 6 years ago

Thanks @OliverUK.

I'll close this for now, feel free to reopen if it isn't actually fixed.