opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/

[Big5] files.txt missing for big5 workload #304

Closed ayushav12 closed 1 month ago

ayushav12 commented 1 month ago

What is the bug?

files.txt is missing for the newly introduced Big5 corpus. As a result, OSB fails to automatically download the corpus from the CloudFront URL.

How can one reproduce the bug?

Run sh download.sh big5 to see the failure shown below:

======== Downloading data corpus for workload big5
Switched to branch 'main'
Cloning into '/home/ec2-user/.benchmark/benchmarks/workloads/default'...
cat: /home/ec2-user/.benchmark/benchmarks/workloads/default/big5/files.txt: No such file or directory
Created data for big5 in benchmark-workload-data-big5.tar. Next steps:

1. Copy it to the user home directory on the target machine(s).
2. Extract with tar -xf benchmark-workload-data-big5.tar (will be extracted to ~/.benchmark/benchmarks).
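
In the log above, the script appears to continue and report success even though cat could not read files.txt. As a minimal sketch of a pre-flight check that would surface the problem earlier, assuming the directory layout shown in the log: the function name check_files_txt and its optional second argument are hypothetical and not part of download.sh.

```shell
# Hypothetical helper, not part of download.sh.
# Verifies that a workload ships a files.txt manifest before any download is attempted.
# $1: workload name (e.g. big5)
# $2: workloads root; defaults to the path seen in the log above (an assumption)
check_files_txt() {
    workload="$1"
    root="${2:-$HOME/.benchmark/benchmarks/workloads/default}"
    manifest="$root/$workload/files.txt"

    if [ ! -f "$manifest" ]; then
        # Report on stderr and return non-zero so a caller can stop early.
        echo "missing: $manifest" >&2
        return 1
    fi

    echo "found: $manifest"
}
```

A caller could run check_files_txt big5 || exit 1 before invoking the download step, so a missing manifest stops the run instead of producing an empty archive.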

What is the expected behavior?

The data corpus should be downloaded to the $HOME/.benchmark/benchmarks/data/big5 folder.

What is your host/environment?

Amazon Linux 2 (AL2) on an EC2 instance

gkamat commented 1 month ago

Should be fixed with https://github.com/opensearch-project/opensearch-benchmark-workloads/pull/297.