filipefurtad0 opened 2 years ago
This is also coming up on this branch - not yet in master: https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3489852498/jobs/5840503897
I wonder if this somehow relates to GH Actions?
EDIT: or perhaps https://github.com/openfoodfoundation/openfoodnetwork/pull/9986?
Each time a different spec.
Yes, #9986 seems relevant, with this change re. timeouts https://github.com/rubycdp/cuprite/pull/215
Hmm, so it doesn't seem to be related to the new Knapsack setup. Thanks for digging up the recent changes, Sigmund; the Cuprite bump does look related. I guess the next step then is to try downgrading Cuprite to see if that resolves the issue. If so, we could submit a bug report to Cuprite, which hopefully can be fixed. If not, we can try increasing the timeout.
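For reference, if we do end up increasing the timeout, here's a minimal sketch of what that could look like when registering the Cuprite driver. The values are arbitrary, and our actual setup in spec/system/support/cuprite_setup.rb may pass other options:

```ruby
# Hypothetical tweak, not our actual cuprite_setup.rb: raise the Ferrum
# timeouts when registering the Cuprite driver for Capybara.
require "capybara/cuprite"

Capybara.register_driver(:cuprite) do |app|
  Capybara::Cuprite::Driver.new(
    app,
    timeout: 30,         # seconds Ferrum waits for a browser response
    process_timeout: 30  # seconds to wait for the browser process to start
  )
end
```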
As I understand from this discussion, what the Cuprite bump introduces is extra output to provide more information when the timeout error occurs. So downgrading it should only remove that output, and probably not fix the error.
The good news is that it is not introduced by Knapsack; the bad news is that it sometimes occurs even after lowering the number of nodes, like here: https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3647209594/jobs/6159168431
I've contacted Knapsack support for advice and will continue to investigate. I've also opened an issue on the Ferrum repo.
I've reproduced a similar error locally:
1)
As an admin
I want to set a supplier and distributor(s) for a product
as anonymous user is redirected to login page when attempting to access product listing
Failure/Error: expect { visit spree.admin_products_path }.not_to raise_error
expected no Exception, got #<Ferrum::TimeoutError: Ferrum::TimeoutError> with backtrace:
# ./spec/system/admin/products_spec.rb:25:in `block (4 levels) in <main>'
# ./spec/system/admin/products_spec.rb:25:in `block (3 levels) in <main>'
# ./spec/system/support/cuprite_setup.rb:41:in `block (2 levels) in <main>'
# -e:1:in `<main>'
This happened while running three terminal windows in parallel, each repeating the same example using the ./script/rspec-slow-repeat script.
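For anyone wanting to reproduce, the idea of the script is roughly the following; this is only a sketch, and the actual script in the repo may differ:

```ruby
#!/usr/bin/env ruby
# Rough sketch of the idea behind ./script/rspec-slow-repeat (assumption:
# the real script may differ). Re-run one example until it fails, and run
# this in several terminals at once to add load.
spec = ARGV.fetch(0, "spec/system/admin/products_spec.rb:25")

1.upto(50) do |run|
  puts "=== run #{run} ==="
  break unless system("bundle exec rspec #{spec}")
end
```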
One idea could be to split the system tests into two runner machines on GitHub Actions, one for /admin and the other for /consumer tests. I'll make a PR and see if it still occurs.
Also, maybe relevant: we use Ubuntu 20.04, but the macOS runners seem to be more powerful (+1 CPU core, +7 GB RAM), as indicated here:
Hardware specification for Windows and Linux virtual machines:
- 2-core CPU (x86_64)
- 7 GB of RAM
- 14 GB of SSD space

Hardware specification for macOS virtual machines:
- 3-core CPU (x86_64)
- 14 GB of RAM
- 14 GB of SSD space
I wonder if migrating the build to macOS would improve this?
Do I understand this table https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits correctly, in that we would only get 5 concurrent jobs if migrating to macOS?
Hmm, it seems to be that way indeed. In that case, I guess we're better off with the 60 concurrent jobs on Ubuntu :+1:
Let's keep an eye on this after merging #10127 and close if it doesn't reoccur.
I'm afraid this happened (a lot) again: https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3702598137
Thanks for reporting @sigmundpetersen - let's move it back to In Dev, in that case :+1:
Just to be sure, when you say "a lot" @sigmundpetersen, do you mean it happened 4 times on the same build run, like the example you've pointed out - is that correct?
Exactly
Haven't seen it much else lately on master builds though. So maybe it's just a one-off? Maybe the GitHub Actions servers/nodes were very busy during that specific build?
We could just let the issue sit for a while and monitor the frequency. What do you think?
There's also the Ferrum::DeadBrowserError happening once in a while:
6) Product Import when dealing with uploaded files handles cases where files contain malformed data
Got 0 failures and 3 other errors:
6.1) Failure/Error: let!(:enterprise) { create(:supplier_enterprise, owner: user, name: "User Enterprise") }
ActiveRecord::RecordInvalid:
Validation failed: Name has already been taken. If this is your enterprise and you would like to claim ownership, or if you would like to trade with this enterprise please contact the current manager of this profile at sharee.heidenreich@flatley.co.uk.
# <internal:kernel>:90:in `tap'
# ./spec/system/admin/product_import_spec.rb:14:in `block (2 levels) in <main>'
# ./spec/system/support/cuprite_setup.rb:41:in `block (2 levels) in <top (required)>'
6.2) Failure/Error: return super unless Capybara.last_used_session
Ferrum::DeadBrowserError:
Browser is dead or given window is closed
# <internal:kernel>:90:in `tap'
# ./spec/system/support/cuprite_helpers.rb:25:in `take_screenshot'
# ./spec/system/support/cuprite_setup.rb:41:in `block (2 levels) in <top (required)>'
6.3) Failure/Error: example.run
Ferrum::DeadBrowserError:
Browser is dead or given window is closed
# ./spec/system/support/cuprite_setup.rb:41:in `block (2 levels) in <top (required)>'
https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3687395479/jobs/6240930191
Should we file an issue on it?
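Side note on the trace above: 6.2 shows the failure-screenshot hook itself raising Ferrum::DeadBrowserError, which buries the original ActiveRecord::RecordInvalid failure. If we wanted less noisy output, here's a sketch of guarding the helper, assuming take_screenshot in cuprite_helpers.rb wraps Capybara's save_screenshot (the real helper may look different):

```ruby
# Hypothetical guard, not the current cuprite_helpers.rb: skip the
# screenshot when the browser has already died, so the original failure
# stays visible instead of a secondary DeadBrowserError.
def take_screenshot
  path = Capybara.current_session.save_screenshot
  puts "Screenshot saved to #{path}"
rescue Ferrum::DeadBrowserError
  warn "Browser is already dead, skipping the failure screenshot"
end
```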
> Haven't seen it much else lately on master builds though. So maybe it's just a one-off? Maybe the GitHub Actions servers/nodes were very busy during that specific build?
Could be. I have not seen it happening much either.
> We could just let the issue sit for a while and monitor the frequency. What do you think?
Agree, let's do that :+1: I'll move to tech debt prioritized instead.
> Ferrum::DeadBrowserError - Should we file an issue on it?
This has been reported at least on these two occasions here, here, also related here - and the consensus seems to be around a RAM issue, which I guess is external to us.
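If the RAM theory holds, one mitigation that's commonly suggested for Chrome dying under memory pressure in CI containers is to stop it from using /dev/shm. A sketch of passing that through Cuprite; these are standard Chromium switches, not something from our current config:

```ruby
# Hypothetical option, not our current setup: Chromium switches often
# recommended when the browser process dies from memory pressure in CI.
require "capybara/cuprite"

Capybara.register_driver(:cuprite) do |app|
  Capybara::Cuprite::Driver.new(
    app,
    browser_options: {
      "disable-dev-shm-usage" => nil, # use /tmp instead of the small /dev/shm
      "no-sandbox" => nil             # often required in containerised CI runners
    }
  )
end
```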
Although this is not introduced by Knapsack, I've reached out to them and received advice on what could eventually improve the situation - maybe this is good to keep in mind:
I noticed this one yesterday and again today: https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3901037095/jobs/6662406732
What we should change and why (this is tech debt)
Context
https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/3487271563/jobs/5834704950
Impact and timeline