HoffmannTom closed this issue 2 months ago.
Seems to be a hardware issue. Sorry.
If somebody has a similar issue: ours was caused by the dm-crypt layer. Its read/write workqueues caused the blocking and had to be disabled: https://unix.stackexchange.com/questions/724104/disable-read-write-workqueue-for-ubuntu-full-disk-encryption
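For anyone verifying the same fix: on kernels that support it (Linux 5.9 and later), disabled workqueues show up as the optional parameters `no_read_workqueue` and `no_write_workqueue` in the device-mapper table, for example in the output of `dmsetup table <name>`. The sketch below is a hypothetical helper (`crypt_workqueue_flags` is not part of any tool) that parses one such line. It assumes the standard crypt target layout and does not call `dmsetup` itself.

```python
def crypt_workqueue_flags(table_line: str) -> dict[str, bool]:
    """Report whether a dm-crypt table line has the workqueues disabled.

    Assumed layout (dm-crypt target):
    <start> <len> crypt <cipher> <key> <iv_offset> <device> <offset> [<#opt> <opt>...]
    Plain `dmsetup table` (all devices) prefixes each line with "<name>:".
    """
    tokens = table_line.split()
    if tokens and tokens[0].endswith(":"):
        tokens = tokens[1:]  # drop the optional "<name>:" prefix
    if len(tokens) < 8 or tokens[2] != "crypt":
        raise ValueError("not a dm-crypt table line")
    opts: list[str] = []
    if len(tokens) > 8:
        n_opts = int(tokens[8])  # number of optional parameters that follow
        opts = tokens[9:9 + n_opts]
    return {
        "no_read_workqueue": "no_read_workqueue" in opts,
        "no_write_workqueue": "no_write_workqueue" in opts,
    }
```

A line reporting `False` for either flag means that workqueue is still in use for the mapped device.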
Describe the bug
We have an import job that indexes around 200 000 documents. A Java client sends them through the bulk API. After 3 to 4 minutes (around 150 000 to 170 000 documents), the OpenSearch server freezes for 20 to 30 seconds and then resumes normal operation.
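For illustration only, the kind of load described here can be sketched as a generator of request bodies for the `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one document line per document, with a trailing newline at the end of the body. The index name, the running counter used as `_id`, and the batch size of 1000 are assumptions (the reporter's Java client settings are not known), and the hypothetical `bulk_bodies` helper only builds bodies without sending them.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], index: str, batch_size: int = 1000) -> Iterator[str]:
    """Yield NDJSON _bulk bodies containing up to `batch_size` documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    lines: list[str] = []
    count = 0
    for i, doc in enumerate(docs):
        # Action line: index the document under an assumed sequential _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(i)}}))
        lines.append(json.dumps(doc))  # source line
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # _bulk requires a final newline
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"  # remaining partial batch
```

With 200 000 documents and the assumed batch size, this produces 200 requests.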
My observations so far:
I checked the /proc/<pid>/fd entries; their count stays almost constant at around 1300. The syslog doesn't show any errors. The node error log only shows the log entries mentioned above (connection reset, timeout warnings).
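A minimal sketch of the file-descriptor check, assuming Linux and read access to the target process's /proc entry (typically running as the OpenSearch user or as root). The helper names `count_open_fds` and `sample_fd_counts` are hypothetical; sampling at a fixed interval is one way to see whether the count moves during a freeze.

```python
import os
import time


def count_open_fds(pid: int) -> int:
    """Return the number of entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid: int, samples: int, interval_s: float) -> list[int]:
    """Take `samples` readings of the open-fd count, `interval_s` seconds apart."""
    counts: list[int] = []
    for i in range(samples):
        if i > 0:
            time.sleep(interval_s)  # wait between readings, not before the first
        counts.append(count_open_fds(pid))
    return counts
```

For example, `sample_fd_counts(<opensearch-pid>, samples=60, interval_s=1.0)` covers one minute, long enough to span a 20 to 30 second freeze.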
Upgrading to 2.15 didn't solve the issue; it did not occur with version 2.13. OS: Ubuntu 22 LTS. JVM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9.
Any hints about how to narrow down the issue are welcome.
Related component
Indexing
To Reproduce
Currently no sample project for reproducing
Expected behavior
No freezes during (bulk) import.
Additional Details
Plugins: opensearch-alerting, opensearch-anomaly-detection, opensearch-asynchronous-search, opensearch-cross-cluster-replication, opensearch-custom-codecs, opensearch-flow-framework, opensearch-geospatial, opensearch-index-management, opensearch-job-scheduler, opensearch-knn, opensearch-ml, opensearch-neural-search, opensearch-notifications, opensearch-notifications-core, opensearch-observability, opensearch-performance-analyzer, opensearch-reports-scheduler, opensearch-security, opensearch-security-analytics, opensearch-skills, opensearch-sql
Host/Environment: see the OS and JVM versions listed above.
Additional context
Attachments: os-stack-1.txt, os-stack-2.txt, os-stack-3.txt