huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections

http://www.uwazi.io

MIT License


GitHub workflows with "Run yarn blank-state" step exit intermittently with code 1 #6693

Closed mfacar closed 4 months ago

mfacar commented 7 months ago

After #6416, different workflows started to fail with code 1 in the step "Run yarn blank-state".

Output error:

Creating index uwazi_e2e...
 - Base properties mapping
2024-04-23T00:20:19.508Z [uwazi_e2e] Uncaught Reindex error.
request to http://localhost:32768/uwazi_e2e failed, reason: socket hang up
{
 "message": "request to http://localhost:32768/uwazi_e2e failed, reason: socket hang up",
 "type": "system",
 "errno": "ECONNRESET",
 "code": "ECONNRESET"
}
Will exit with (1)

Cypress e2e jobs for reference: first occurrence, recent occurrence.

RafaPolit commented 6 months ago

We have reviewed the mentioned PR and are not 100% sure that it is responsible for the flakiness. Could the Node version update, which happened at the same time, be the actual cause? This is an ES issue, and the PR that added the --force flag does not appear to affect the ES flow.

mfacar commented 5 months ago

#6901 doesn't fix the root problem, but it prevents blank-state failures from stopping the workflow by adding a second attempt to complete the action.
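For readers who want to picture the second-attempt idea, here is a minimal TypeScript sketch. It is not the code from the PR: `runWithRetry` and `flakyBlankState` are hypothetical names, and the flaky step is simulated so the example runs without Elasticsearch.

```typescript
// Hypothetical helper (not from the PR): run `action` up to `maxAttempts`
// times and rethrow the last error if every attempt fails.
function runWithRetry<T>(action: (attempt: number) => T, maxAttempts = 2): T {
  if (maxAttempts < 1) {
    throw new RangeError('maxAttempts must be at least 1');
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return action(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}

// Simulated stand-in for `yarn blank-state --force`: fails on the first
// attempt (as with the ECONNRESET above) and succeeds on the second.
function flakyBlankState(attempt: number): string {
  if (attempt === 1) {
    throw new Error('socket hang up');
  }
  return `blank-state completed on attempt ${attempt}`;
}

console.log(runWithRetry(flakyBlankState, 2));
```

A retry like this only hides the intermittent failure. If the underlying request keeps timing out, the second attempt can fail the same way.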

txau commented 5 months ago

@mfacar @Zasa-san @konzz Further inspection reveals that the e2e flakiness is actually coming from that same step, used for loading the fixtures:

CypressError: cy.exec('yarn blank-state --force') failed because the command exited with a non-zero code.

That is, the beforeAll hook that loads the fixtures:

1) "before all" hook for "should have no detectable accessibility violations on load"

So the whole problem comes from the same source. On the bright side, our e2e tests seem to be robust.

txau commented 5 months ago

@mfacar I kept testing things, and it seems that this happens because Elasticsearch is slow in the CI environment. The "socket hang up" error indicates that either the client or the server closed the connection, probably the client, due to a timeout while waiting for a server response.

How to increase the timeout seems to be quite a topic when using isomorphic-fetch (which depends on node-fetch). Also, this library hasn't been updated in 4 years, and other options such as cross-fetch, node-fetch or even undici seem to be taking over that space (I also reported an issue related to node-fetch performance here: https://github.com/huridocs/uwazi/issues/4589).

Upgrading these libraries may fix the problem, since they handle timeouts differently internally. I think it may be worth it.
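To make the client-side timeout hypothesis concrete, here is a hedged TypeScript sketch of an explicit, configurable timeout built on the standard AbortController. It is not how Uwazi or isomorphic-fetch handle timeouts: `requestWithTimeout`, `FetchLike` and `slowServer` are hypothetical names, and the slow Elasticsearch is simulated so no network is needed.

```typescript
// Minimal fetch-like signature (an assumption made for this sketch) so the
// helper can be exercised with a simulated server instead of a real one.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<unknown>;

// Hypothetical helper: abort the request if it has not settled within
// `timeoutMs`, so the wait is controlled by the caller.
async function requestWithTimeout(
  fetchImpl: FetchLike,
  url: string,
  timeoutMs: number
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear the timer so it cannot fire later
  }
}

// Stand-in for a slow Elasticsearch: it never answers and only rejects
// once the client aborts the request.
const slowServer: FetchLike = (_url, { signal }) =>
  new Promise((_resolve, reject) => {
    signal.addEventListener('abort', () =>
      reject(new Error('request aborted by client timeout'))
    );
  });

requestWithTimeout(slowServer, 'http://localhost:32768/uwazi_e2e', 50).catch(
  (error: Error) => console.log(error.message)
);
```

On a runtime with a global `fetch`, passing `(url, init) => fetch(url, init)` as `fetchImpl` should apply the same pattern to real requests. Raising `timeoutMs` would then give a slow CI Elasticsearch more time to respond before the client gives up.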

txau commented 5 months ago

After a quick discussion with @mfacar, we think Axios should be considered as a potential replacement for node-fetch.