tdunning / log-synth

Generates more or less realistic log data for testing simple aggregation queries.
Apache License 2.0
257 stars 89 forks

Generate dependent data #32

Open deeptiantony opened 7 years ago

deeptiantony commented 7 years ago

Hi there, I am trying to generate event data like the example below. There are two events: started and ended. One constraint is that the timestamp of the started event must be earlier than the timestamp of the ended event. How can I configure that using log-synth?

{
    "actor": {
        "name": "student"
    },
    "task": {
        "id": "assignment"
    },
    "status": {
        "id": "ended"
    },
    "context": {
        "extensions": {
            "user-id": "11111",
            "username": "XYZ",
            "currentassignmentid": "2",
            "institutionname": "ABC"
        }
    },
    "timestamp": "2016-06-20 03:00:00",
    "id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "version": "1.0.1"
}
{
    "actor": {
        "name": "student"
    },
    "task": {
        "id": "assignment"
    },
    "status": {
        "id": "started"
    },
    "context": {
        "extensions": {
            "user-id": "11111",
            "username": "XYZ",
            "currentassignmentid": "2",
            "institutionname": "ABC"
        }
    },
    "timestamp": "2016-06-20 01:00:00",
    "id": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "version": "1.0.1"
}
tdunning commented 7 years ago

I generally hack these situations by generating a start time and a duration. Then I add them to get an end time, using Apache Drill or Python depending on scale.
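A minimal Python sketch of that post-processing step might look like the following. It assumes each generated record carries a `start` string and a `duration_s` number of seconds; both field names are hypothetical and would have to match whatever the schema actually emits. Only the `status` and `timestamp` fields from the question are reproduced here.

```python
import json
from datetime import datetime, timedelta

# Timestamp layout used in the question's sample events.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def expand(record):
    """Turn one {start, duration_s} record into a started and an ended event.

    The field names "start" and "duration_s" are assumptions for this sketch.
    """
    start = datetime.strptime(record["start"], TIME_FORMAT)
    end = start + timedelta(seconds=record["duration_s"])
    events = []
    for status, when in (("started", start), ("ended", end)):
        events.append({
            "status": {"id": status},
            "timestamp": when.strftime(TIME_FORMAT),
        })
    return events


# One generated record: starts at 01:00 and lasts two hours.
sample = {"start": "2016-06-20 01:00:00", "duration_s": 7200}
for event in expand(sample):
    print(json.dumps(event))
```

Because the duration is never negative, the ended timestamp can never precede the started one, which is the constraint asked about.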

It would also be easy to build an interval sampler that accepts start time parameters just like an event sampler, plus a distribution for the interval (like what the random walk does). It should default to an exponential time distribution.
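The idea can be illustrated in plain Python. This is only a sketch of the sampling logic, not log-synth's sampler interface: the function name `sample_interval` and its parameters are hypothetical, the start time is assumed to be uniform over a window, and the interval length is drawn from an exponential distribution with a chosen mean.

```python
import random
from datetime import datetime, timedelta


def sample_interval(rng, first, last, mean_seconds):
    """Return (start, end): start uniform in [first, last], exponential length."""
    span = (last - first).total_seconds()
    start = first + timedelta(seconds=rng.uniform(0, span))
    # expovariate takes the rate, which is 1 / mean.
    length = rng.expovariate(1.0 / mean_seconds)
    return start, start + timedelta(seconds=length)


rng = random.Random(42)  # fixed seed so runs are repeatable
first = datetime(2016, 6, 20)
last = datetime(2016, 6, 21)
intervals = [sample_interval(rng, first, last, 3600) for _ in range(5)]
for start, end in intervals:
    print(start.isoformat(sep=" ", timespec="seconds"),
          "->",
          end.isoformat(sep=" ", timespec="seconds"))
```

Drawing the start and the length together keeps the two timestamps dependent, so no separate join or addition step is needed afterwards.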
