kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

CI GitHub action fails randomly (maybe also UI problem caused by HTML elements which are dynamically added and removed) #6138

Open stweil opened 3 months ago

stweil commented 3 months ago

Describe the bug The CI GitHub action does not pass reliably but fails sometimes with one or both of these errors:

Error:  Errors: 
Error:    CalendarST.createProcessFromCalendar » StaleElementReference stale element ref...
Error:    ImportingST.checkHierarchyImport:180 » StaleElementReference stale element ref...
[INFO] 
Error:  Tests run: 79, Failures: 0, Errors: 2, Skipped: 2

As long as the reason for the random failures is unknown, we must assume that such failures can also occur in production environments.

To Reproduce Steps to reproduce the behavior:

  1. Run GitHub CI action several times until it fails.

Expected behavior The CI action should not fail randomly.

Release Git master, observed there for at least several weeks, including in the oldest CI runs that are still online (= less than 90 days old).

stweil commented 3 months ago

Here is a recent example: https://github.com/kitodo/kitodo-production/actions/runs/10005589215/job/27656601360#step:14:27222. See also the list of failing CI on master branch.

stweil commented 3 months ago

Issue #5378 describes a similar bug which was solved by ignoring the issue in the tests. This could be done here, too, but might not be the correct solution if the problem also exists in production. According to the fix #5380, the problem "is triggered when HTML elements are dynamically added and removed from the DOM (which happens often in Primefaces) such that the current reference to a specific HTML element is not valid any more". Will this happen only in tests?
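
To illustrate the failure mode quoted above, here is a minimal sketch of the usual test-side mitigation: repeat the lookup and the interaction together when the element reference has gone stale. It is not taken from the Kitodo test suite. `StaleRetry`, `retryOnStale` and `StaleElementException` are hypothetical names; the exception class is only a stand-in for Selenium's `StaleElementReferenceException` so that the example compiles without Selenium on the classpath.

```java
import java.util.function.Supplier;

/** Sketch: repeat an action when a (stand-in) stale element exception occurs. */
final class StaleRetry {

    /** Hypothetical stand-in for Selenium's StaleElementReferenceException. */
    static class StaleElementException extends RuntimeException {
        StaleElementException(String message) {
            super(message);
        }
    }

    private StaleRetry() {
    }

    /**
     * Runs the action and repeats it when it throws StaleElementException,
     * up to maxAttempts attempts in total. The action is expected to look up
     * the element again on every call instead of reusing an old reference.
     */
    static <T> T retryOnStale(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StaleElementException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (StaleElementException e) {
                // The element was replaced in the DOM; remember the error and try again.
                last = e;
            }
        }
        // Every attempt failed: surface the last stale element error.
        throw last;
    }
}
```

In a real Selenium test the supplier would call `driver.findElement(...)` and then interact with the result, so that each attempt works on a fresh reference. A retry like this only hides the symptom in tests; it does not answer whether users can hit the same condition in production.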

stweil commented 2 months ago

Another error which also seems to occur randomly is this one:

[INFO] Running org.kitodo.selenium.CalendarST
[INFO ] 2024-07-29 13:00:02.840 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:02.840 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:02.868 [main] MetsService - Reading 2/meta.xml
[INFO ] 2024-07-29 13:00:02.893 [main] ProcessService - No metadata file for indexing: 3/meta.xml
[INFO ] 2024-07-29 13:00:02.894 [main] ProcessService - No metadata file for indexing: 3/meta.xml
[INFO ] 2024-07-29 13:00:02.992 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:02.992 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.026 [main] MetsService - Reading 2/meta.xml
[INFO ] 2024-07-29 13:00:03.063 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.064 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.089 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.090 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.244 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.244 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.273 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.273 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.303 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.304 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.334 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.334 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.364 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.365 [main] ProcessService - No metadata file for indexing: 1/meta.xml
[INFO ] 2024-07-29 13:00:03.413 [main] MetsService - Reading 2/meta.xml
[INFO ] 2024-07-29 13:00:03.460 [main] MetsService - Reading 2/meta.xml
[INFO ] 2024-07-29 13:00:03.502 [main] MetsService - Reading 2/meta.xml
Starting ChromeDriver 127.0.6533.72 (9755e24ca85aa18ffa16c743f660a3d914902775-refs/branch-heads/6533@{#1760}) on port 9813
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
Jul 29, 2024 1:00:04 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
[INFO ] 2024-07-29 13:00:04.233 [main] ProcessService - No metadata file for indexing: 4/meta.xml
[INFO ] 2024-07-29 13:00:04.233 [main] ProcessService - No metadata file for indexing: 4/meta.xml
[INFO ] 2024-07-29 13:00:04.260 [main] MetsService - Reading 4/meta.xml
[INFO] [talledLocalContainer] Jul 29, 2024 1:00:06 PM javax.faces.validator.BeanValidator validate
Warning:  [talledLocalContainer] WARNING: cannot validate component with empty value: j_id__md_1
Error:  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 16.142 s <<< FAILURE! - in org.kitodo.selenium.CalendarST
Error:  createProcessFromCalendar  Time elapsed: 13.637 s  <<< FAILURE!
java.lang.AssertionError: Number of issues in the calendar does not match expected:<4> but was:<3>
    at org.kitodo.selenium.CalendarST.createProcessFromCalendar(CalendarST.java:78)
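
The log does not show why only 3 of the 4 expected issues were counted. One possible explanation, which is an assumption and not confirmed here, is that the assertion ran before the page had finished updating. If so, polling for the expected value instead of reading it once would make the test less timing-dependent. The sketch below uses only the JDK; `PollingWait` and `waitForValue` are hypothetical names. Selenium's `WebDriverWait` offers the same kind of polling for real tests.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

/** Sketch: poll a value until it matches the expectation or a timeout expires. */
final class PollingWait {

    private PollingWait() {
    }

    /**
     * Calls the probe repeatedly until it returns the expected value or the
     * timeout has elapsed. Returns the last observed value, so the caller can
     * assert on it and gets a meaningful message when the wait timed out.
     */
    static <T> T waitForValue(Supplier<T> probe, T expected, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T actual = probe.get();
        while (!Objects.equals(actual, expected) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            actual = probe.get();
        }
        return actual;
    }
}
```

A test would pass a probe that counts the issue elements currently shown in the calendar and then assert that the returned value equals 4.
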
stweil commented 1 month ago

@solth, I'd appreciate it if you could add a milestone to this issue, because it's really annoying when more than 50% of the CI tests fail because of this.

Does anyone have any idea how to fix this?

solth commented 3 weeks ago

> Issue #5378 describes a similar bug which was solved by ignoring the issue in the tests. This could be done here, too, but might not be the correct solution if the problem also exists in production. According to the fix #5380, the problem "is triggered when HTML elements are dynamically added and removed from the DOM (which happens often in Primefaces) such that the current reference to a specific HTML element is not valid any more". Will this happen only in tests?

There are multiple reasons for the StaleElementReference errors that we have encountered with the Selenium tests in Kitodo.Production over time. The DOM being adjusted dynamically by JSF or PrimeFaces is just one of them (albeit the most common one, I think). Another reason can be that the browser is unable to reach a certain page because of an exception or error that occurred in a previous test or during navigation on an earlier page. Sometimes the elements in a list, for example the options in a pulldown menu, are unordered even though the test expects them in a certain order, so the test only succeeds when the list happens to be in the expected order.

In all these cases the browser ends up on the wrong page (sometimes the error page with a stack trace), where certain expected elements were never present to begin with, which causes the StaleElementReference (in contrast to identical components being replaced by JSF). Unfortunately, in most cases it is very difficult to find out which scenario causes the failing tests, because the logs and stack trace often show only the result of a previous error, not the original cause.
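
For the unordered pulldown options mentioned above, a test can be made independent of the order by comparing sorted copies of the expected and actual labels. The following is a minimal sketch, not code from the Kitodo test suite; `OptionOrder` and `sameOptionsIgnoringOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: compare option labels without depending on their order. */
final class OptionOrder {

    private OptionOrder() {
    }

    /**
     * Returns true when both lists contain the same labels with the same
     * multiplicity, regardless of order. The input lists are not modified.
     */
    static boolean sameOptionsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> sortedExpected = new ArrayList<>(expected);
        List<String> sortedActual = new ArrayList<>(actual);
        Collections.sort(sortedExpected);
        Collections.sort(sortedActual);
        return sortedExpected.equals(sortedActual);
    }
}
```

Sorting copies keeps duplicates significant, which a comparison based on sets would not. Where a test has to select an option, selecting it by its label instead of by its position avoids the same ordering dependency.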

Since these failing tests are currently causing more trouble than they are worth, I would probably set them to @disabled until enough resources are available to analyse the cause of the problem in depth. As explained above, I think most of the reasons for the failing tests will only occur in tests, not in production, since they are related to certain "expectations" in the tests that are not met. I therefore think disabling these tests for now is acceptable as an exception this time.