quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

`quickwit tool local-ingest` broken #5494

Open fulmicoton opened 2 weeks ago

fulmicoton commented 2 weeks ago

As reported by "opendata"... We need to test `quickwit tool local-ingest` or deprecate it.

❯ Ingesting documents locally...

---------------------------------------------------
 Connectivity checklist
 ✔ metastore storage
 ✔ metastore
 ✔ index storage
 ✔ _ingest-cli-source

2024-10-16T22:51:55.538Z ERROR quickwit_actors::spawn_builder: actor-failure cause=early eof exit_status=Failure(early eof)
2024-10-16T22:51:55.538Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-autumn-tuMQ exit_status=Failure(early eof)
2024-10-16T22:51:56.537Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="SourceActor-autumn-tuMQ"
I've been seeing a lot of problems with `local-ingest` since 8.x. Every second or third import also produces this:

Indexed 13,572,489 documents in 1m 4s.
*** ERROR tantivy::directory::directory: Failed to remove the lock file. FileDoesNotExist(".tantivy-writer.lock")
*** ERROR quickwit_indexing::actors::merge_scheduler_service: merge scheduler service is dead
*** ERROR quickwit_actors::spawn_builder: actor-failure cause=An IO error occurred: 'No such file or directory (os error 2)' exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
*** ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=MergeExecutor-dawn-dsdr exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
guilload commented 2 weeks ago

I vote for deprecating in 0.9 and removing two releases later.

PSeitz commented 1 week ago

The command is useful for profiling indexing performance.
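For context, this is the kind of invocation used for such profiling runs. The index id and input file below are placeholders, and the flags should be verified against the CLI docs for your Quickwit version:

```shell
# Ingest a local NDJSON file directly into an index, bypassing the ingest API.
# `my-index` and `docs.ndjson` are placeholder values.
quickwit tool local-ingest --index my-index --input-path docs.ndjson
```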

trinity-1686a commented 1 week ago

It seems like this happens when a merge is still running as the ingestion finishes. This sounds a lot like an issue we had with Lambda deployments, which raises the question: what should we do?