terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

disambiguate origin of operation in teraslice assets #3837

Open godber opened 6 days ago

godber commented 6 days ago

Here's what we're thinking about for this interface now, still open for discussion:

notional job

{
    "name": "Reindex Events",
    "lifecycle": "once",
    "analytics": false,
    "assets": [
        "elasticsearch",
        "asset1:v1.4.5",
        "asset1:v2.5.6",
        "asset2"
    ],
    "apis": [
        { "_name": "example-api" },
        { "_name": "example-api@asset2" },
        { "_name": "example-api@asset1:v2.5.6" },
        { "_name": "example-api@asset1:v2.5.6:foo2" }
    ],
    "operations": [
        {
            "_op": "elasticsearch_reader",
            "index": "events-*",
            "type": "event",
            "size": 5000,
            "date_field_name": "created"
        },
        {
            "_op": "custom_op@asset1",
            "some": "configuration1",
            "api_name": "example-api@asset1:v2.5.6:foo2"
        },
        {
            "_op": "custom_op@asset2",
            "some": "configuration2"
        },
        {
            "_op": "elasticsearch_bulk",
            "index": "bigdata3",
            "type": "events",
            "size": 5000
        }
    ]
}

We need to keep in mind the two scenarios:

When the execution is created from the job, we will map "_op": "custom_op@asset1:version" to "_op": "custom_op@assetHash".
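
To make the mapping concrete, here is a rough sketch of how the job-to-execution rewrite could look. This is not the Teraslice implementation: parseOpName, resolveOpName, and the assetHashes lookup are hypothetical names, the hash values are made up, and the sketch assumes the lookup is keyed by the asset identifier exactly as it appears after the "@" (for example "asset1:v2.5.6" or "asset2"). How an unversioned reference such as "custom_op@asset1" should resolve when several versions are listed is still open, so the sketch simply fails if the key is missing.

```javascript
// Hypothetical helper: split "name@asset:version" into its parts.
// An unqualified name such as "elasticsearch_reader" has no asset or version.
function parseOpName(opName) {
    const at = opName.indexOf('@');
    if (at === -1) {
        return { name: opName, asset: null, version: null };
    }
    const name = opName.slice(0, at);
    const [asset, version = null] = opName.slice(at + 1).split(':');
    return { name, asset, version };
}

// Hypothetical helper: rewrite "name@asset:version" to "name@assetHash".
// assetHashes is an assumed Map from asset identifier to asset hash.
function resolveOpName(opName, assetHashes) {
    const { name, asset, version } = parseOpName(opName);
    if (asset === null) {
        // No asset qualifier, so there is nothing to map.
        return opName;
    }
    const key = version ? `${asset}:${version}` : asset;
    const hash = assetHashes.get(key);
    if (hash === undefined) {
        throw new Error(`Unknown asset "${key}" for operation "${name}"`);
    }
    return `${name}@${hash}`;
}

// Made-up hashes, for illustration only.
const exampleHashes = new Map([
    ['asset1:v2.5.6', 'abc123'],
    ['asset2', 'def456'],
]);

console.log(resolveOpName('custom_op@asset1:v2.5.6', exampleHashes)); // custom_op@abc123
console.log(resolveOpName('custom_op@asset2', exampleHashes)); // custom_op@def456
console.log(resolveOpName('elasticsearch_reader', exampleHashes)); // elasticsearch_reader
```

Throwing on an unknown key keeps ambiguous references from silently picking an asset, which seems to be the point of this issue, but that behavior is a choice made for the sketch and not something decided here.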