opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/

[FEATURE] Add ZSTD Compressed Corpora of NYC Taxis, HTTP Logs, and Big5 Workloads #357

Open IanHoang opened 3 months ago

IanHoang commented 3 months ago

Is your feature request related to a problem?

A while back, @beaioun added support for ZSTD compression and decompression in OSB (https://github.com/opensearch-project/opensearch-benchmark/issues/385). He suggested that we create ZSTD-compressed versions of the larger corpora, such as NYC Taxis, HTTP Logs, and Big5.

What solution would you like?

Create a ZSTD-compressed version of the corpora for each of these workloads by doing something along the following lines:

  1. On a virtual machine, run OSB against a cluster for each workload (NYC Taxis, HTTP Logs, and Big5) so that the corpora are downloaded locally.
  2. Compress the corpus files for each workload in ~/.benchmark/benchmarks/data with ZSTD.
  3. Upload the compressed files to shareable cloud storage (such as S3).
  4. Share the location with the maintainers.
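
Step 2 could be sketched roughly as follows. This is only an illustration, not an agreed-upon procedure: it assumes the decompressed corpora are `.json` files under the data directory named above, that the `zstd` command-line tool is installed, and that level 19 is an acceptable compression level. The helper names (`zstd_command`, `find_corpora`, `compress_all`) are hypothetical and not part of OSB.

```python
import shutil
import subprocess
from pathlib import Path

# Data directory from step 2 of the issue.
DATA_DIR = Path.home() / ".benchmark" / "benchmarks" / "data"


def zstd_command(src: Path, level: int = 19) -> list[str]:
    """Build a zstd CLI call that writes '<src>.zst' next to src.

    Assumption: level 19 and -T0 (use all cores) are illustrative choices.
    """
    dst = src.with_name(src.name + ".zst")
    return ["zstd", f"-{level}", "-T0", "-f", str(src), "-o", str(dst)]


def find_corpora(data_dir: Path) -> list[Path]:
    """Return uncompressed corpus files (assumed to end in .json), sorted."""
    return sorted(p for p in data_dir.rglob("*.json") if p.is_file())


def compress_all(data_dir: Path = DATA_DIR, dry_run: bool = True) -> list[list[str]]:
    """Build one zstd command per corpus file; run them unless dry_run is set."""
    commands = [zstd_command(p) for p in find_corpora(data_dir)]
    if not dry_run:
        if shutil.which("zstd") is None:
            raise RuntimeError("zstd CLI not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `compress_all()` with the default `dry_run=True` only returns the commands, so they can be reviewed before anything is written; `compress_all(dry_run=False)` runs them. The source files are left in place.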

OVI3D0 commented 1 month ago

I can take this one on 🖐️