opensearch-project / security

🔐 Secure your cluster with TLS, numerous authentication backends, data masking, audit logging as well as role-based access control on indices, documents, and fields
https://opensearch.org/docs/latest/security-plugin/index/
Apache License 2.0
180 stars 264 forks source link

SecurityBackwardsCompatibilityIT is flakey #4445

Closed nibix closed 2 weeks ago

nibix commented 2 weeks ago

The implementation of SecurityBackwardsCompatibilityIT.testDataIngestionAndSearchBackwardsCompatibility seems to be flakey.

See https://github.com/opensearch-project/security/actions/runs/9466606199/job/26084103426?pr=4432 for an example failure. An extract from the logs:

2024-06-11T15:15:47.0121673Z   2> org.opensearch.client.ResponseException: method [POST], host [https://127.0.0.1:43047], URI [_bulk?refresh=wait_for], status line [HTTP/1.1 400 Bad Request]
2024-06-11T15:15:47.0123964Z     {"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
2024-06-11T15:15:47.0125831Z         at __randomizedtesting.SeedInfo.seed([886B03124E611721:FAA93CAA4C940925]:0)
2024-06-11T15:15:47.0127142Z         at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:385)
2024-06-11T15:15:47.0128471Z         at app//org.opensearch.client.RestClient.performRequest(RestClient.java:355)
2024-06-11T15:15:47.0130007Z         at app//org.opensearch.client.RestClient.performRequest(RestClient.java:330)
2024-06-11T15:15:47.0131459Z         at app//org.opensearch.security.bwc.helper.RestHelper.makeRequest(RestHelper.java:63)
2024-06-11T15:15:47.0133208Z   1> [2024-06-11T07:15:45,889][INFO ][o.o.s.b.SecurityBackwardsCompatibilityIT] [testNodeStats] before test
2024-06-11T15:15:47.0135201Z   1> [2024-06-11T07:15:45,890][INFO ][o.o.s.b.h.RestHelper     ] [testNodeStats] Making request GET _nodes/stats, with headers null
2024-06-11T15:15:47.0137336Z   1> [2024-06-11T07:15:46,060][INFO ][o.o.s.b.h.RestHelper     ] [testNodeStats] Recieved response HTTP/1.1 200 OK
2024-06-11T15:15:47.0139080Z         at app//org.opensearch.security.bwc.helper.RestHelper.requestAgainstAllNodes(RestHelper.java:83)
2024-06-11T15:15:47.0140941Z         at app//org.opensearch.security.bwc.helper.RestHelper.requestAgainstAllNodes(RestHelper.java:70)
2024-06-11T15:15:47.0142898Z         at app//org.opensearch.security.bwc.SecurityBackwardsCompatibilityIT.ingestData(SecurityBackwardsCompatibilityIT.java:242)
2024-06-11T15:15:47.0171106Z         at app//org.opensearch.security.bwc.SecurityBackwardsCompatibilityIT.testDataIngestionAndSearchBackwardsCompatibility(SecurityBackwardsCompatibilityIT.java:196)

We see here that the test code tried to do a bulk index operation. However, OpenSearch rejected that request with a 400 Bad Request error with the detail reason "request body is required".

The code which creates the bulk index operations can be found at lines 227ff in SecurityBackwardsCompatibilityIT:

    private void ingestData(String index) throws IOException {
        StringBuilder bulkRequestBody = new StringBuilder();
        ObjectMapper objectMapper = new ObjectMapper();
        int numberOfRequests = Randomness.get().nextInt(10);
        while (numberOfRequests-- > 0) {
            for (int i = 0; i < Randomness.get().nextInt(100); i++) {
                Map<String, Map<String, String>> indexRequest = new HashMap<>();
                indexRequest.put("index", new HashMap<>() {
                    {
                        put("_index", index);
                    }
                });
                bulkRequestBody.append(objectMapper.writeValueAsString(indexRequest) + "\n");
                bulkRequestBody.append(objectMapper.writeValueAsString(Song.randomSong().asJson()) + "\n");
            }
            List<Response> responses = RestHelper.requestAgainstAllNodes(
                testUserRestClient,
                "POST",
                "_bulk?refresh=wait_for",
                RestHelper.toHttpEntity(bulkRequestBody.toString())
            );
            responses.forEach(r -> assertEquals(200, r.getStatusLine().getStatusCode()));
        }
    }

The issue is caused by the Randomness.get().nextInt(100) statement inside the for loop in line 232. This call returns numbers in the range from 0..99. If 0 is returned, the bulk body will be empty. Thus, we have a 1% chance for a test failure. A secondary issue is that the random number is regenerated for each loop iteration. Thus, the average number of documents in the bulk request body will not be 100 / 2, but much lower.