opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

local spark ppl testing documentation #902

Closed YANG-DB closed 1 week ago

YANG-DB commented 1 week ago

Description

Add local Spark PPL testing documentation and details.

Related Issues

#896

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

LantaoJin commented 1 week ago

@YANG-DB A high-level question: will we use local Spark for sanity tests instead of an OpenSearch Domain in the future? That does not seem to be end-to-end testing. For example, we found this issue https://github.com/opensearch-project/opensearch-spark/issues/875 in a sanity test with the Domain environment.

qianheng-aws commented 1 week ago

@YANG-DB What's the motivation to add this doc?

I think we already have a guide to local Spark PPL usage in the root README: https://github.com/opensearch-project/opensearch-spark/blob/main/README.md#ppl-build--run

And the PPL command testing somewhat duplicates the ppl-commands doc; that place should be the single source of truth for each command: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md

YANG-DB commented 1 week ago

@YANG-DB A high-level question: will we use local Spark for sanity tests instead of an OpenSearch Domain in the future? That does not seem to be end-to-end testing. For example, we found this issue #875 in a sanity test with the Domain environment.

Hi @LantaoJin, the idea behind this is to let an open-source user experiment with the PPL language directly in the development environment. It serves as a fast way to experiment with a local Spark cluster before moving on to more complicated use cases. The ultimate goal is to have separate testing for the open-source environment that does not depend on a specific provider. It doesn't function as a sanity test but rather as a user tutorial for playing around with the language and understanding its capabilities.

YANG-DB commented 1 week ago

@YANG-DB What's the motivation to add this doc?

I think we already have a guide to local Spark PPL usage in the root README: https://github.com/opensearch-project/opensearch-spark/blob/main/README.md#ppl-build--run

And the PPL command testing somewhat duplicates the ppl-commands doc; that place should be the single source of truth for each command: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md

Hi @qianheng-aws, thanks for the feedback. As I mentioned above, this simple tutorial is a basic way of explaining how to quickly get started with PPL on a local Spark cluster, and it extends the README part. It's meant for developers who are trying to understand whether this spark-opensource-ppl solution fits their needs, without having to deploy a more complicated use case into a real Spark cluster.