Open ununhexium opened 5 months ago
This PR also triggered the issue: https://github.com/sovity/edc-extensions/pull/970
Faulty branch tracked as reference/edc-ce-issue-870 in EDC CE.
100% reproducible error at de.sovity.edc.ext.wrapper.api.ui.pages.catalog.CatalogApiTest#testDistributionKey on reference/edc-ce-issue-870-repro.
Similar error, but probably not related, when failing to process a message sent over the EDC protocol:
2024-07-10 11:08:46 5.15.0 WARNING An exception mapping did not successfully produce and processed a response. Logging the exception propagated to the default exception mapper. java.lang.IllegalStateException: ServiceLocatorImpl(__HK2_Generated_5,5,236544568) has been shut down
I have the impression that adding tests triggers more of these timeouts, and that adding one more @DisabledOnGithub on the failing test then hides the issue.
Another, unrelated problem: reference/gh-lombok-issue-missing-builder fails with
> Task :utils:test-utils:javadoc
/home/runner/work/edc-ce/edc-ce/utils/test-utils/src/main/java/de/sovity/edc/extension/e2e/extension/CeE2eTestExtensionConfigFactory.java:22: error: cannot find symbol
public static E2eTestExtensionConfig.E2eTestExtensionConfigBuilder defaultBuilder() {
^
symbol: class E2eTestExtensionConfigBuilder
location: class E2eTestExtensionConfig
/home/runner/work/edc-ce/edc-ce/utils/test-utils/src/main/java/de/sovity/edc/extension/e2e/extension/CeE2eTestExtensionConfigFactory.java:26: error: cannot find symbol
public static E2eTestExtensionConfig.E2eTestExtensionConfigBuilder withModule(String module) {
^
symbol: class E2eTestExtensionConfigBuilder
location: class E2eTestExtensionConfig
2 errors
on GitHub, but runs fine in IntelliJ and locally.
Problem summary
This is a summary of the attempts to make PR#842 work.
No solution was found, no root cause was found, and the problem may happen again.
This is here to document what has been attempted, so that the investigation can resume the next time this problem happens.
The problematic PR
https://github.com/sovity/edc-extensions/pull/842/
This PR has the following symptoms:
The investigation revealed that this PR fails when a specific combination of 3 factors is met. That is, when it:
client.uiApi().getCatalogPageDataOffers(TestUtils.PROTOCOL_ENDPOINT)
Minimum code to reproduce the error
After a few unfruitful attempts directly in the original pull request, this second PR was created to isolate and reproduce the problem.
https://github.com/sovity/edc-extensions/pull/864
Not all the tests were run on that PR; the ones whose results are not mentioned were run in the original PR (which I later force-pushed to clean up the history). Almost all the runs there failed. The ones that didn't fail are the ones that didn't have the problematic combination.
Here is a list of all fixing attempts:
Remove any of the 3 elements above and it's fine:
Note: the faulty code "worked" once.
385f538: the one time that it worked (after changing localhost -> 127.0.0.1).
2e7bcf8: then an empty commit to double-check, but this second run showed the original error again.
This problem is therefore very reliably reproducible, but only in a very specific set of circumstances.
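Since the only passing run came right after switching localhost to 127.0.0.1, it may be worth checking how the runner resolves localhost. The following is a minimal diagnostic sketch, not code from the PR. The class name LocalhostResolution is hypothetical, and the idea that name resolution plays a role is an untested assumption. It only prints what the JDK resolver returns, so that the output on the GitHub runner can be compared with a local machine.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of the project.
public class LocalhostResolution {

    // Returns every address the JDK resolver reports for the given host name,
    // in textual form (IPv4 and/or IPv6).
    static List<String> resolve(String host) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(address.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        // "localhost" may map to several addresses, depending on the host configuration.
        System.out.println("localhost -> " + resolve("localhost"));
        // A literal IP address is parsed, not looked up, so it yields exactly one entry.
        System.out.println("127.0.0.1 -> " + resolve("127.0.0.1"));
    }
}
```

If the runner reports an IPv6 loopback address for localhost while the server only listens on IPv4, that would be a lead to follow. The sketch itself does not prove or rule out anything.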
On a different repo
https://github.com/sovity/edc-ce-copy/pull/3/
This is the same code as the original repo.
I added 2 commits for code style and info, just to make the gradle build part finish.
It works fine. It can fail due to port allocation, but that's a different issue (it happens at startup instead of during the request) and it has a very clear error message.
Minimum reproducible code on the copy repo
https://github.com/sovity/edc-ce-copy/pull/2
The build runs fine in that case, without the 127.0.0.1 change. It fails later at deployment, but that's a credentials issue.
https://github.com/sovity/edc-ce-copy/actions/runs/8452890706/job/23154317197?pr=2#step:8:27687
Answers to the commit questions:
yes
running, when calling getCatalogPageDataOffers
no
no
yes, getCatalogPageDataOffers
no
Seems so. A test that runs with just this method crashes. A test that runs without it but everything else doesn't crash.
Doesn't complain about the invalid URL
No
No Throwable (including Java system Errors) is thrown.
I really don't remember the result for this. It doesn't really matter.
Only happened when calling getCatalogPageDataOffers.
This was to isolate the problem as much as possible.
Empty commits to try to make the PR fail in the copy repo. No problem, the code worked 3 times out of 3 attempts.
The build in the original repo fails 95%+ of the time (just 1 "miracle", when localhost was changed to 127.0.0.1).
No
Yes
Yes
The server binds to the correct port and IP (checked with ss); this doesn't help with the issue.
The allocated system resources are plentiful. free -m also shows enough memory available.
No
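The ss check can also be repeated from inside the JVM, right before the failing call. This is a minimal sketch, assuming that a plain TCP connect is enough to tell whether the port accepts connections. PortProbe and its method are hypothetical names, and the host and port are passed as arguments rather than taken from the project configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of the project.
public class PortProbe {

    // Returns true if a TCP connection to host:port can be opened within the timeout.
    static boolean accepts(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java PortProbe.java <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " accepts connections: " + accepts(host, port, 2000));
    }
}
```

Running it once with localhost and once with 127.0.0.1 against the same port would show whether both names reach the server from the runner.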
Other questions:
No, setting a 30s timeout on the HTTP client doesn't help.
Doesn't help. A port allocation error would happen before the call can be made. Also, the EDC shows that it got a port, and ss shows that it's correctly allocated, even without dynamic port allocation.
Remote debugging, but in the public repo. Is there a way to do it safely?
Is there any stickiness in the running node?
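For reference, the timeout experiment mentioned above can be reproduced in isolation, outside the generated API client. The sketch below is an assumption-based stand-in: it uses the JDK's java.net.http.HttpClient rather than whichever HTTP client the project's API client wraps. TimeoutProbe is a hypothetical name, and the URL is passed as an argument.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical diagnostic helper, not part of the project.
public class TimeoutProbe {

    // Sends a GET request with the same limit for connecting and for the whole
    // request, and returns the HTTP status code.
    static int statusOf(URI uri, Duration timeout) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout)
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java TimeoutProbe.java <url>");
            return;
        }
        // 30 seconds mirrors the timeout tried during the investigation.
        System.out.println("status: " + statusOf(URI.create(args[0]), Duration.ofSeconds(30)));
    }
}
```

If this probe succeeds against the running connector while the test's own client fails, the difference is more likely in the client setup than in the network.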
More ideas to try
As the issue seems network-related, double-check the network calls: use curl and tcpdump, and check the calls.
Make the relevant code parts of HK/Glassfish log more info.
Very similar problems, where the root cause was only narrowed down to Jersey, and Jersey got replaced:
https://github.com/openhab/openhab-distro/issues/587#issuecomment-350581593
https://github.com/openhab/openhab-distro/issues/587
Notes
This is a problem that happens in HK2:
https://javaee.github.io/hk2/
HK2 is the dependency injection framework used by Jersey.