make vectorization generalizable

JunranY commented 2 years ago

https://github.com/joos2010kj/voyager-microbenchmarker/blob/d832516f18643e76c033f7ba518b8d34f7c8e809/src/targetExtractor.js#L16

We need to iterate through the whole data array of the spec, not only its first entry.
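A minimal sketch of what visiting every data entry could look like. The helper names (`countTransforms`, `countPerEntry`) and the idea of counting transform types are assumptions made for illustration; they are not taken from `targetExtractor.js`.

```javascript
// Hypothetical helper: count how often each transform type occurs
// in a single entry of spec.data. Entries without a transform array
// yield an empty object.
function countTransforms(entry) {
  const counts = {};
  for (const t of entry.transform ?? []) {
    counts[t.type] = (counts[t.type] ?? 0) + 1;
  }
  return counts;
}

// Visit every entry of spec.data instead of only spec.data[0].
function countPerEntry(spec) {
  return (spec.data ?? []).map(countTransforms);
}

// Made-up spec fragment, used only to exercise the sketch.
const exampleSpec = {
  data: [
    { name: "points", transform: [{ type: "filter" }, { type: "filter" }] },
    { name: "binned", source: "points", transform: [{ type: "aggregate" }] },
    { name: "raw" } // no transform array at all
  ]
};

console.log(countPerEntry(exampleSpec));
// [ { filter: 2 }, { aggregate: 1 }, {} ]
```

The sketch returns one result per entry and leaves open how those results should be combined.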

joos2010kj commented 2 years ago

Let's say spec["data"][0]["name"] has:

{
  "Filter": 2,
  "Aggregate": 1,
  "Window": 1,
  "Filter_Card": 0
}

and spec["data"][1]["name"] has:

{
  "Filter": 5,
  "Swap": 2,
  "Mac": 1,
  "Swap_card": 0
}

Now, should the output be the merged form of these two objects, or should I return them separately in an array? E.g.:

Option 1: return Object.assign(res1, res2)
Option 2: return [res1, res2]

JunranY commented 2 years ago

Merged, but instead of Option 1 it should be element-wise addition: values under the same key are summed, and keys that appear in only one object are carried over. E.g., the output would be:

{
  "Filter": 7,
  "Aggregate": 1,
  "Window": 1,
  "Filter_Card": 0,
  "Swap": 2,
  "Mac": 1,
  "Swap_card": 0
}
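For illustration, the element-wise addition could be sketched as below. The function name `addCounts` is hypothetical; `res1` and `res2` are the two example objects from this thread.

```javascript
// Sum any number of count objects key by key.
// Keys missing from an object are treated as 0.
function addCounts(...vectors) {
  const total = {};
  for (const vec of vectors) {
    for (const [key, value] of Object.entries(vec)) {
      total[key] = (total[key] ?? 0) + value;
    }
  }
  return total;
}

const res1 = { Filter: 2, Aggregate: 1, Window: 1, Filter_Card: 0 };
const res2 = { Filter: 5, Swap: 2, Mac: 1, Swap_card: 0 };

console.log(addCounts(res1, res2));
// { Filter: 7, Aggregate: 1, Window: 1, Filter_Card: 0, Swap: 2, Mac: 1, Swap_card: 0 }
```

By contrast, `Object.assign({}, res1, res2)` would overwrite shared keys and give `Filter: 5` rather than `Filter: 7`.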

joos2010kj commented 2 years ago

Got it. Thanks.

joos2010kj commented 2 years ago

Do you have a test case with multiple samples in spec["data"][X]["name"], where X > 1?

joos2010kj commented 2 years ago

I have pushed updates -- could you run a test with multi-instance samples? @JunranY

JunranY commented 2 years ago

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 400,
  "height": 200,
  "padding": 10,
  "data": [
    {
      "name": "points",
      "transform": [
        { "type": "dbtransform", "relation": "cars" }
      ]
    },
    {
      "name": "ext",
      "source": "points",
      "transform": [
        { "type": "extent", "field": "miles_per_gallon", "signal": "ext" }
      ]
    },
    {
      "name": "binned",
      "source": "points",
      "transform": [
        {
          "type": "bin",
          "field": "miles_per_gallon",
          "extent": { "signal": "ext" },
          "maxbins": 10
        },
        {
          "type": "aggregate",
          "key": "bin0",
          "groupby": [ "bin0", "bin1" ],
          "fields": [ "bin0" ],
          "ops": [ "count" ],
          "as": [ "count" ]
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "xscale",
      "type": "linear",
      "range": "width",
      "domain": [ 0, 50 ]
    },
    {
      "name": "yscale",
      "type": "linear",
      "range": "height",
      "round": true,
      "domain": { "data": "binned", "field": "count" },
      "zero": true,
      "nice": true
    }
  ],
  "axes": [
    { "orient": "bottom", "scale": "xscale", "zindex": 1 },
    { "orient": "left", "scale": "yscale", "tickCount": 5, "zindex": 1 }
  ],
  "marks": [
    {
      "type": "rect",
      "from": { "data": "binned" },
      "encode": {
        "update": {
          "x": { "scale": "xscale", "field": "bin0" },
          "x2": { "scale": "xscale", "field": "bin1", "offset": 0 },
          "y": { "scale": "yscale", "field": "count" },
          "y2": { "scale": "yscale", "value": 0 },
          "fill": { "value": "steelblue" }
        },
        "hover": {
          "fill": { "value": "firebrick" }
        }
      }
    }
  ]
}

JunranY commented 2 years ago

Also, you need to use vega-plus, including its parse step and registering dbtransform. For how to do this, refer to https://github.com/vega/vega-plus/tree/master/packages/vega-plus-core#vega-plus

JunranY commented 2 years ago

We also need to record the execution time of the async run function, to use as the training labels.
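One possible way to record that time is sketched below. `timeAsync` and `fakeRun` are hypothetical names; `fakeRun` only stands in for whatever async run call the benchmark actually awaits. The sketch assumes a runtime with a global `performance` object (recent Node.js versions or a browser).

```javascript
// Await an async function and report how long it took, in milliseconds.
async function timeAsync(fn, ...args) {
  const start = performance.now();
  const result = await fn(...args);
  const elapsedMs = performance.now() - start;
  return { result, elapsedMs };
}

// Stand-in for the real async run call: resolves after `ms` milliseconds.
function fakeRun(ms) {
  return new Promise((resolve) => setTimeout(() => resolve("done"), ms));
}

timeAsync(fakeRun, 20).then(({ result, elapsedMs }) => {
  // elapsedMs is the value that would be stored as the training label.
  console.log(result, elapsedMs.toFixed(1) + " ms");
});
```

Taking both timestamps around the `await` means the measurement covers the full asynchronous execution, not just the time to create the promise.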

joos2010kj commented 2 years ago

^ Timing https://github.com/joos2010kj/voyager-microbenchmarker/blob/d832516f18643e76c033f7ba518b8d34f7c8e809/src/targetExtractor.js#L14

joos2010kj commented 2 years ago

^^ https://github.com/joos2010kj/voyager-microbenchmarker/blob/d832516f18643e76c033f7ba518b8d34f7c8e809/src/targetExtractor.js#L13 (lines 13 and 14)

joos2010kj / voyager-microbenchmarker

make vectorization generalizable #5