quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

`quickwit tool local-ingest` broken #5494

Open fulmicoton opened 2 weeks ago

fulmicoton commented 2 weeks ago

As reported by "opendata"... We need to test `quickwit tool local-ingest` or deprecate it.

❯ Ingesting documents locally...

---------------------------------------------------
 Connectivity checklist
 ✔ metastore storage
 ✔ metastore
 ✔ index storage
 ✔ _ingest-cli-source

2024-10-16T22:51:55.538Z ERROR quickwit_actors::spawn_builder: actor-failure cause=early eof exit_status=Failure(early eof)
2024-10-16T22:51:55.538Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-autumn-tuMQ exit_status=Failure(early eof)
2024-10-16T22:51:56.537Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="SourceActor-autumn-tuMQ"
I've been seeing a lot of problems with `local-ingest` since 8.x. Every second or third import also produces this:

Indexed 13,572,489 documents in 1m 4s.
*** ERROR tantivy::directory::directory: Failed to remove the lock file. FileDoesNotExist(".tantivy-writer.lock")
*** ERROR quickwit_indexing::actors::merge_scheduler_service: merge scheduler service is dead
*** ERROR quickwit_actors::spawn_builder: actor-failure cause=An IO error occurred: 'No such file or directory (os error 2)' exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
*** ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=MergeExecutor-dawn-dsdr exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
guilload commented 2 weeks ago

I vote for deprecating in 0.9 and removing two releases later.

PSeitz commented 1 week ago

The command is useful for profiling indexing performance.
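For context, this is the kind of invocation used for such profiling runs. The index id and input file below are placeholders, and the flags should be verified against the CLI docs for your Quickwit version:

```shell
# Ingest a local NDJSON file directly into an index, bypassing the ingest API.
# `my-index` and `docs.ndjson` are placeholder values.
quickwit tool local-ingest --index my-index --input-path docs.ndjson
```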

trinity-1686a commented 1 week ago

It seems like this happens when a merge is still running as the ingestion finishes. This sounds a lot like an issue we had with Lambda deployments, which raises the question: what should we do?