intellistream / Sesame

[SIGMOD'23] Data Stream Clustering: An In-depth Empirical Study [ICDM'24] MOStream: A Modular and Self-Optimizing Data Stream Clustering Algorithm
MIT License

StreamKM++ throws an exception on KDD-99 #151

Closed wzru closed 2 years ago

wzru commented 2 years ago
TEST(SystemTest, StreamKM) {
  // Setup Logs.
  setupLogging("benchmark.log", LOG_DEBUG);
  // [529, 999, 1270, 1624, 2001, 2435, 2648, 3000]
  // [3, 3, 4, 6, 6, 7, 9, 9]
  // Parse parameters.
  param_t cmd_params;
  cmd_params.num_points = 3000;
  cmd_params.seed = 1;
  cmd_params.num_clusters = 23;
  cmd_params.dim = 41;
  cmd_params.coreset_size = 500;
  cmd_params.time_decay = false;

  cmd_params.input_file = std::filesystem::current_path().generic_string() +
                          "/datasets/KDD-99.txt";

Output:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SystemTest
[ RUN      ] SystemTest.StreamKM
Default Input Data Directory: /home/shaun/Sesame/build/test/datasets/KDD-99.txt
Read from the file...
Complete reading from the file...
Finished loading input data
data number: 3000
Algorithm: StreamKMeans Seed: 1   ClusterNumber: 7   CoresetSize: 500
DataSource spawn thread=0
Engine spawn thread=1
DataSink spawn thread=2
DataSink start to grab data
Algorithm start to process data
DataSource start to emit data
Created manager with 5 windows of dim: 41
sourceEnd set to true
ready to process remaining data
ready to offline clustering
KMeans++ start!!!
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
Aborted
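
For anyone reading the trace: the `what()` text is the message libstdc++ produces when `std::vector::at` is called with an index that is not smaller than the vector's size, here index 1 on a vector that holds a single element. The log does not show which vector inside the KMeans++ step is involved, so the root cause stays open. The sketch below only reproduces that kind of failure in isolation and shows one way to check the index first. `describe_access`, `checked_get`, and `self_check` are hypothetical helpers written for this illustration, not part of Sesame, and the code assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Sesame code): performs a bounds-checked access and
// returns the exception message, or an empty string if the access succeeds.
std::string describe_access(const std::vector<double>& values, std::size_t index) {
  try {
    (void)values.at(index);  // at() throws std::out_of_range when index >= size()
    return "";
  } catch (const std::out_of_range& e) {
    return e.what();  // exact wording depends on the standard library in use
  }
}

// Hypothetical guard: reports a missing element instead of throwing.
std::optional<double> checked_get(const std::vector<double>& values, std::size_t index) {
  if (index >= values.size()) {
    return std::nullopt;
  }
  return values[index];
}

// Small self-check that can be called from a test or from main().
void self_check() {
  const std::vector<double> single{0.5};  // one element, as in the reported trace
  assert(describe_access(single, 0).empty());   // index 0 is valid
  assert(!describe_access(single, 1).empty());  // index 1 triggers out_of_range
  assert(!checked_get(single, 1).has_value());  // guarded access does not throw
}
```

Calling `describe_access(single, 1)` on a libstdc++ build should return a message of the same shape as the one in the log above. A guard like `checked_get` only makes the failure visible earlier; whether the fix belongs in the KMeans++ step or in the code that fills the vector cannot be determined from this output alone.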