implydata / learn-druid

Learn the basics of Apache Druid® from leaders in the community with these notebooks and useful tools.
Apache License 2.0

[query/joins] Datagen job never completes #71

Open adithyachakilam opened 4 months ago

adithyachakilam commented 4 months ago

Steps to reproduce:

Even though the clickstream job has finished and we have moved on to generating the user data, curl localhost:9999/jobs shows that the clicks job still has two active sessions, although its status is COMPLETE. The output is below.

[
  {
    "name": "clicks",
    "config_file": "clickstream/clickstream.json",
    "target": {
      "type": "file",
      "path": "/files/clicks.json"
    },
    "active_sessions": 2,
    "total_records": 1725122,
    "start_time": "2024-02-13 05:15:01",
    "run_time": 172881.522483,
    "status": "COMPLETE",
    "status_msg": "Running, Sim Clock: 2024-02-15 05:16:20.150717"
  },
  {
    "name": "users",
    "config_file": "clickstream/users_init.json",
    "target": {
      "type": "file",
      "path": "/files/users.json"
    },
    "active_sessions": 1967,
    "total_records": 1967,
    "start_time": "2024-02-15 05:16:19",
    "run_time": 20.169898,
    "status": "RUNNING",
    "status_msg": "Starting generator job."
  }
]

I tried varying the concurrency, the number of events, and the hours of data, but the issue persists.
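
The inconsistency in the output above (a job marked COMPLETE that still reports active sessions) can be checked for programmatically. The following is a minimal sketch, assuming the /jobs response always has the shape shown above; find_stale_jobs is a hypothetical helper name and is not part of the data generator.

```python
import json


def find_stale_jobs(jobs):
    """Return names of jobs reported COMPLETE that still list active sessions.

    Assumes each job is a dict with "name", "status" and "active_sessions"
    keys, as in the /jobs output quoted in this issue.
    """
    return [
        job["name"]
        for job in jobs
        if job.get("status") == "COMPLETE" and job.get("active_sessions", 0) > 0
    ]


# Trimmed copy of the /jobs output from this issue (only the fields used here).
sample = json.loads("""
[
  {"name": "clicks", "active_sessions": 2, "status": "COMPLETE"},
  {"name": "users", "active_sessions": 1967, "status": "RUNNING"}
]
""")

print(find_stale_jobs(sample))  # prints ['clicks']
```

Against a live generator, the same function could be fed the parsed body of curl localhost:9999/jobs instead of the sample.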

nick2432 commented 1 month ago

Can I work on this?

petermarshallio commented 15 hours ago

Hey @nick2432! You are more than welcome to. When posting your PR, feel free to add me @petermarshallio or @hevansDev as reviewers.