1) Choose a set of datasets. They should be diverse (different column types, different sizes), but preferably ones users would actually use
2) Benchmark current DAI impl on those datasets
3) Switch DAI code to use our tSVD and rerun the benchmarks
Let's see what we get.
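A rough sketch of how steps 2 and 3 could be measured, in case it helps. It is not the DAI benchmark code: `benchmark`, `compare`, and the `fit` callable are hypothetical names, and the idea is to pass in one callable that runs the current DAI implementation and one that runs our tSVD. Note that `tracemalloc` only sees Python-level allocations, so native or GPU memory would have to be measured separately.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict


def benchmark(fit: Callable[[Any], Any],
              datasets: Dict[str, Any],
              repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Run `fit` on every dataset; record best wall time and peak traced memory.

    `fit` is a placeholder for whatever runs the implementation under test.
    """
    results: Dict[str, Dict[str, float]] = {}
    for name, data in datasets.items():
        # Timed runs are done without tracing, since tracemalloc adds overhead.
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data)
            timings.append(time.perf_counter() - start)

        # One extra traced run for the memory footprint (Python allocations only).
        tracemalloc.start()
        try:
            fit(data)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()

        results[name] = {"best_seconds": min(timings), "peak_bytes": float(peak)}
    return results


def compare(baseline: Dict[str, Dict[str, float]],
            candidate: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Speedup per dataset: baseline time / candidate time (>1 means faster)."""
    speedups = {}
    for name, base in baseline.items():
        cand = candidate.get(name)
        if cand is None or cand["best_seconds"] <= 0:
            continue  # dataset missing from the candidate run, or time too small
        speedups[name] = base["best_seconds"] / cand["best_seconds"]
    return speedups
```

Running `benchmark` once per implementation on the same `datasets` dict and feeding both results to `compare` gives a per-dataset speedup table.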
Please also have a look at the memory footprint. If we notice that we run out of memory (OOM) on certain datasets, we will need a switch in our wrapper to fall back to scikit-learn instead.
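One possible shape for that switch, as a minimal sketch only. `fit_with_fallback`, `primary`, `fallback`, and `force_fallback` are hypothetical names, not part of our wrapper. It also assumes the tSVD path surfaces OOM as a Python `MemoryError`; a native or GPU implementation may raise something else or kill the process, in which case an up-front size check would be needed instead.

```python
from typing import Any, Callable, Tuple


def fit_with_fallback(primary: Callable[[Any], Any],
                      fallback: Callable[[Any], Any],
                      data: Any,
                      force_fallback: bool = False) -> Tuple[Any, str]:
    """Try `primary` (e.g. our tSVD); use `fallback` (e.g. scikit-learn) on OOM.

    Returns the result plus a label saying which path produced it.
    """
    if force_fallback:
        # Explicit switch: skip the primary implementation entirely.
        return fallback(data), "fallback"
    try:
        return primary(data), "primary"
    except MemoryError:
        # Assumption: OOM is reported as MemoryError; adjust for the real backend.
        return fallback(data), "fallback"
```

Returning the label makes it easy to log, per dataset, how often the fallback was actually taken during the benchmarks.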
Let's see if we can use tSVD internally!