What is the bug?
I ran an A/B test comparing OpenSearch index data ingestion from Databricks using elasticsearch-spark-30_2.12-8.6.0.jar versus opensearch-spark-30_2.12-1.0.1.jar. With the OpenSearch Spark connector, ingestion took 2-3 times longer than with the Elasticsearch Spark connector.
How can one reproduce the bug?
Test 1: Create 10 separate OpenSearch indices (same schema) with parent/child records. Run the insert or update operations into all 10 indices in parallel from Databricks, first using the Elasticsearch Spark connector, and record the timings. Then repeat with the OpenSearch Spark connector and record the timings.
Test 2: Create one OpenSearch index. Run the insert/update operations from Databricks using the Elasticsearch Spark connector and note the timings. Then repeat with the OpenSearch Spark connector and note the timings.
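For anyone trying to reproduce this, the two tests can be sketched roughly as the timing harness below. This is a minimal sketch, not the exact notebook used for the numbers in this report. The helper names (`time_write`, `run_parallel`, `slowdown`, `make_spark_writer`) are made up for illustration. The option keys (`es.*` / `opensearch.*`), the `"upsert"` operation, and the `id` mapping column are assumptions about a typical setup and should be checked against each connector's documentation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def time_write(write_fn: Callable[[str], None], index: str) -> float:
    """Return the wall-clock seconds taken by write_fn(index)."""
    start = time.perf_counter()
    write_fn(index)
    return time.perf_counter() - start


def run_parallel(write_fn: Callable[[str], None], indices: List[str]) -> Dict[str, float]:
    """Test 1: write to every index concurrently, one thread per index.

    Passing a single-element list gives the Test 2 (one index) case.
    """
    with ThreadPoolExecutor(max_workers=len(indices)) as pool:
        timings = list(pool.map(lambda ix: time_write(write_fn, ix), indices))
    return dict(zip(indices, timings))


def slowdown(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Ratio of total candidate time to total baseline time."""
    return sum(candidate.values()) / sum(baseline.values())


def make_spark_writer(df, source: str, prefix: str, nodes: str) -> Callable[[str], None]:
    """Build a write_fn for a Spark DataFrame.

    Assumed, not taken from the report: the option keys below, the
    "upsert" operation, and an "id" column used as the document id.
    """
    def write(index: str) -> None:
        (df.write.format(source)
           .option(f"{prefix}.nodes", nodes)
           .option(f"{prefix}.write.operation", "upsert")
           .option(f"{prefix}.mapping.id", "id")
           .mode("append")
           .save(index))
    return write
```

On a cluster with the Elasticsearch jar attached, the writer would be built with something like `make_spark_writer(df, "org.elasticsearch.spark.sql", "es", host)`. With the OpenSearch jar attached, it would be `make_spark_writer(df, "org.opensearch.spark.sql", "opensearch", host)`. Running `run_parallel` once per connector and passing both results to `slowdown` gives the ratio being reported here.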
What is the expected behavior?
The insert/update timings with the OpenSearch Spark connector should match, or be close to, those of the Elasticsearch Spark connector.
What is your host/environment?
OpenSearch 2.11, Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
Both jars below are hosted in S3 buckets:
elasticsearch-spark-30_2.12-8.6.0.jar
opensearch-spark-30_2.12-1.0.1.jar
Do you have any screenshots?
Yes, see the attached file: Test Timings and configs.docx