quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

Lambda does not exit on errors, keeps running until timeout #5009

Open OperationalFallacy opened 1 month ago

OperationalFallacy commented 1 month ago

Describe the bug

I haven't tested this thoroughly, but I saw the Lambda run until the max timeout even though it couldn't process anything. One example was sending gz files, which I realized are not supported.

Steps to reproduce the behavior:

  1. Deploy the indexer Lambda
  2. Send a compressed file, test.json.gz

Expected behavior

It should exit immediately instead of running until the timeout.

Configuration:

export class RustFunction extends Function {
  constructor(scope: Construct, id: string, props?: Partial<FunctionProps>) {
    const lambdaAssetPath = path.join(
      __dirname,
      "../../quickwit/quickwit",
      "quickwit-lambda/deploy",
      id
    );
    console.log("lambdaAssetPath", lambdaAssetPath);
    super(scope, id+'_f', {
      ...props,
      code: Code.fromAsset(lambdaAssetPath),
      handler: id, // use id to specify either "indexer" or "searcher"
      runtime: Runtime.PROVIDED_AL2,
      architecture: Architecture.ARM_64,
      logRetention: RetentionDays.ONE_DAY,
      tracing: Tracing.DISABLED,
    });
  }
}

Binaries were compiled with this command (I didn't use cross because it was too slow on Mac):

LIBZ_SYS_STATIC=1 TARGET_CC=aarch64-linux-musl-gcc RUSTFLAGS="-C linker=aarch64-linux-musl-gcc -C link-arg=-static -C opt-level=z -C lto" cargo build --release --target aarch64-unknown-linux-musl

  1. Output of quickwit --version

checked out v0.8.1

  1. The index_config.yaml

version: 0.7

index_id: test-index

doc_mapping:
  field_mappings:

search_settings:
  default_search_fields: [name]

indexing_settings:
  split_num_docs_target: 2000000

rdettai commented 1 month ago

Hi @OperationalFallacy, thanks for reporting this.

It would indeed be a useful improvement, even though it is far from trivial to identify all unrecoverable errors and bubble them up to stop the Lambda.
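To make the idea concrete, here is a minimal sketch of classifying errors and bubbling the unrecoverable ones up so that the caller stops right away. It is written in TypeScript only to match the CDK snippet above and is not Quickwit's code: `UnrecoverableError`, `classify`, and `runStep` are hypothetical names, and the rule "only errors explicitly marked as unrecoverable abort the run" is an assumption made for illustration.

```typescript
// Hypothetical marker type for failures that can never succeed on retry
// (for example, an input format the indexer cannot read).
class UnrecoverableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnrecoverableError";
    // Keep `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type ErrorKind = "unrecoverable" | "transient";

// Assumed rule for this sketch: anything not explicitly marked is transient.
function classify(err: unknown): ErrorKind {
  return err instanceof UnrecoverableError ? "unrecoverable" : "transient";
}

// Runs one unit of work. Unrecoverable errors are rethrown so the caller
// (e.g. the Lambda handler) fails immediately; transient ones are reported
// to the caller, which may decide to retry.
function runStep(step: () => void): "ok" | "transient-failure" {
  try {
    step();
    return "ok";
  } catch (err) {
    if (classify(err) === "unrecoverable") {
      throw err; // bubble up: retrying cannot help
    }
    return "transient-failure";
  }
}
```

The hard part mentioned above is not this plumbing but deciding which real failures belong in the unrecoverable bucket.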

One example was sending gz files which I realized not supported

gz files are supported. If you run Quickwit Lambda with these packages, it should work just fine.

In general, we don't guarantee that main or the Quickwit release tags (e.g. 0.8.1) build to a functioning AWS Lambda. The release cycle of Quickwit Lambda is still independent at this stage. You should use the Lambda release tags instead (aws-lambda-beta-xx).

OperationalFallacy commented 1 month ago

Agree, won't be easy. I saw it runs server-side software. Pretty remarkable you made it work in Lambda, and so well.

Is there an option to make all errors unrecoverable? If users prefer, Lambda can use its own retry mechanism. I recognize, though, that this is probably not how Quickwit servers are designed :)

Thanks for pointing to the tags, I'll try it!

rdettai commented 1 month ago

Is there an option to make all errors unrecoverable?

There is currently a retry mechanism at the indexing pipeline level that could be disabled. That wouldn't be a silver bullet solution but might already avoid hanging in many failure cases.
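As a rough illustration of what "disabling" such a retry mechanism means, here is a generic retry helper in TypeScript (chosen to match the CDK snippet earlier in the thread). This is not Quickwit's indexing pipeline: `withRetries` and `maxAttempts` are hypothetical names, and backoff delays are left out to keep the sketch short. Setting `maxAttempts` to 1 turns every failure into an immediate error for the caller.

```typescript
// Hypothetical retry helper; `maxAttempts` is an assumed knob for this
// sketch, not an actual Quickwit setting.
function withRetries<T>(op: (attempt: number) => T, maxAttempts: number): T {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op(attempt); // success: stop retrying
    } catch (err) {
      lastError = err; // remember the failure; loop again if attempts remain
    }
  }
  // Attempts exhausted: surface the last error instead of hanging.
  throw lastError;
}

// With maxAttempts = 1 the first failure propagates right away, which is
// the "fail fast" behavior asked for in this issue.
function failFast<T>(op: () => T): T {
  return withRetries(() => op(), 1);
}
```

As noted above, this only covers failures that actually pass through the retry loop; errors that hang elsewhere would still run until the Lambda timeout.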

OperationalFallacy commented 1 month ago

That should help. Is there a configuration option to disable it?