opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

[Bug]: Failing --number-of-docs parameter for create-workload action #658

Status: Closed (ngc4579 closed 1 month ago)

ngc4579 commented 2 months ago

Describe the bug

When creating a workload from existing indices and limiting the number of documents using the --number-of-docs parameter of the create-workload action, the command fails with an exception:

$ opensearch-benchmark create-workload ... --indices=index-1,index-2 --number-of-docs="index-1:1000 index-2:1000"

2024-09-30 09:29:19,940 -not-actor-/PID:251 osbenchmark.workload_generator.workload_generator INFO Extracted index settings and mappings from [[Index(name='index-1', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={}), Index(name='index-2', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={})]]

2024-09-30 09:29:19,944 -not-actor-/PID:251 osbenchmark.benchmark ERROR A fatal error occurred while running subcommand [create-workload].
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/benchmark.py", line 940, in dispatch_sub_command
    workload_generator.create_workload(cfg)
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/workload_generator.py", line 73, in create_workload
    index_corpora = corpus_extractor.extract_documents(index.name, index.number_of_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/extractors.py", line 174, in extract_documents
    documents_to_extract = total_documents if not documents_limit else min(total_documents, documents_limit)
                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'int'
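
The failure can be reproduced outside OSB. The sketch below reuses only the expression shown at extractors.py line 174 in the traceback; the wrapper function name and the total of 5000 documents are assumptions for illustration, and the limit is the whole mapping that the INFO log line shows attached to every index.

```python
def documents_to_extract(total_documents, documents_limit):
    # Same conditional expression as in the traceback (extractors.py, line 174).
    # The wrapper function itself is hypothetical.
    return total_documents if not documents_limit else min(total_documents, documents_limit)


# What the log shows being stored on each Index: the full mapping, with string counts.
limit_from_log = {"index-1": "1000", "index-2": "1000"}

try:
    documents_to_extract(5000, limit_from_log)  # 5000 is an assumed index size
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Note that the counts in the logged mapping are strings, so passing `"1000"` instead of the dict would still fail in `min()`; the limit needs to be an integer by the time it reaches this expression.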

To reproduce

Try creating a workload from an existing index while limiting the number of documents using --number-of-docs.

Expected behavior

Workload should be created as specified without the command crashing.

Host / Environment

K8s 1.29, OSB 1.9.1 running in Pod

Additional context

It seems that in helpers.py, the function process_indices assigns the entire index-to-count dict to every Index element instead of extracting that index's own document count.
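
One way the per-index lookup could work is sketched below. This is not the actual helpers.py code: `Index` here only mirrors the field names visible in the log line, and `parse_number_of_docs` and `build_indices` are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Index:
    # Field names taken from the logged repr; the real class may differ.
    name: str
    document_frequency: int = 0
    number_of_docs: Optional[int] = None
    settings_and_mappings: dict = field(default_factory=dict)


def parse_number_of_docs(raw: str) -> Dict[str, int]:
    """Turn 'index-1:1000 index-2:1000' into {'index-1': 1000, 'index-2': 1000}."""
    limits: Dict[str, int] = {}
    for pair in raw.split():
        # Split on the last colon so the count is always the final component.
        name, _, count = pair.rpartition(":")
        limits[name] = int(count)  # convert here so later comparisons see an int
    return limits


def build_indices(names: List[str], limits: Dict[str, int]) -> List[Index]:
    # Each Index gets only its own limit; indices without an entry get None.
    return [Index(name=name, number_of_docs=limits.get(name)) for name in names]


limits = parse_number_of_docs("index-1:1000 index-2:1000")
indices = build_indices(["index-1", "index-2", "index-3"], limits)
for index in indices:
    print(index.name, index.number_of_docs)
```

With this shape, an index that has no entry in --number-of-docs carries `None`, which the `if not documents_limit` branch in the traceback already treats as "extract everything".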

Relevant log output

2024-09-30 09:29:19,940 -not-actor-/PID:251 osbenchmark.workload_generator.workload_generator INFO Extracted index settings and mappings from [[Index(name='index-1', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={}), Index(name='index-2', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={})]]
2024-09-30 09:29:19,941 -not-actor-/PID:251 py.warnings WARNING /usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'opensearch-nodes.opensearch.svc'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(

2024-09-30 09:29:19,944 -not-actor-/PID:251 osbenchmark.benchmark ERROR A fatal error occurred while running subcommand [create-workload].
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/benchmark.py", line 940, in dispatch_sub_command
    workload_generator.create_workload(cfg)
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/workload_generator.py", line 73, in create_workload
    index_corpora = corpus_extractor.extract_documents(index.name, index.number_of_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/extractors.py", line 174, in extract_documents
    documents_to_extract = total_documents if not documents_limit else min(total_documents, documents_limit)
                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'int'

IanHoang commented 2 months ago

Was able to reproduce. Will put out a fix shortly.

IanHoang commented 2 months ago

Got a fix working for it. Addressing in a PR.

(.venv) hoangia@80a9971b1103 opensearch-benchmark % opensearch-benchmark create-workload --target-hosts=XXXXXX --client-options=basic_auth_user:'XXXXXX',basic_auth_password:'XXXXXX' --indices=movies-1000,movies-2000,nyc_taxis  --output-path=~/Desktop/ --workload=test-workload --number-of-docs="movies-2000:1500 nyc_taxis:1500"

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Connected to OpenSearch cluster [69622e766ec7eb17f038aed664796847] version [2.5.0].

A workload already exists at /Users/hoangia/Desktop/test-workload. Would you like to remove it? (y/n): y
[INFO] Removing workload of the same name.
Extracting documents for index [movies-1000] for test mode... 1000/1000 docs [100.0% done]
Extracting documents for index [movies-1000]...               1000/1000 docs [100.0% done]
Extracting documents for index [movies-2000] for test mode... 1000/1000 docs [100.0% done]
Extracting documents for index [movies-2000]...               1500/1500 docs [100.0% done]
Extracting documents for index [nyc_taxis] for test mode...   1000/1000 docs [100.0% done]
Extracting documents for index [nyc_taxis]...                 1500/1500 docs [100.0% done]

[INFO] Workload test-workload has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/test-workload

-------------------------------
[INFO] SUCCESS (took 4 seconds)
-------------------------------