When selecting random rows from the dataset, the data is shuffled, then the first k rows are selected.
This means O(n*logn) complexity.
Instead we should use e.g. the following:
create an empty set
while the size of the set is < k:
    generate a random integer in [0, n)
    add it to the set (this way, if the same number is generated twice, the size of the set doesn't grow!)
we have k distinct random indices now; select the corresponding rows from the dataset
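The steps above can be sketched in Python roughly as follows. This is only an illustration, not the project's code: the names `sample_indices` and `sample_rows` are made up here, and the dataset is assumed to be any sequence that supports `len()` and integer indexing.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k distinct indices drawn uniformly from range(n) (hypothetical helper)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    chosen = set()
    while len(chosen) < k:
        # randrange(n) yields 0..n-1, so every value is a valid row index.
        # Adding a duplicate leaves the set unchanged, so the loop just retries.
        chosen.add(rng.randrange(n))
    # Note: the order of the returned list is the set's iteration order,
    # not a random order. Shuffle the k indices if the order matters.
    return list(chosen)


def sample_rows(rows, k, rng=random):
    """Pick k distinct rows from an indexable dataset (hypothetical helper)."""
    return [rows[i] for i in sample_indices(len(rows), k, rng)]
```

When k is much smaller than n, the loop needs roughly k draws. As k approaches n, duplicates become frequent and the loop slows down, so this only pays off for small k. The standard library's `random.sample(range(n), k)` also returns k distinct indices and may be simpler to use.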
TODO: can numpy generate something like that?
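One possible answer, as a sketch: NumPy's `Generator.choice` accepts `replace=False`, which returns distinct indices. The helper name `sample_indices_np` and the toy `data` array are made up for this example, and whether this call actually avoids O(n) work for large n should be measured rather than assumed.

```python
import numpy as np


def sample_indices_np(n, k, seed=None):
    """Return k distinct indices from range(n) using NumPy (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no index appears twice.
    return rng.choice(n, size=k, replace=False)


# Toy dataset: 10 rows, 2 columns (assumed layout, for illustration only).
data = np.arange(20).reshape(10, 2)
idx = sample_indices_np(len(data), 3, seed=0)
subset = data[idx]  # integer-array indexing selects the sampled rows
```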