krisy / kaggle

Changing from shuffle() to something faster #1

Open krisy opened 11 years ago

krisy commented 11 years ago

When selecting random rows from the dataset, the data is shuffled, then the first k rows are selected. This means up to O(n log n) complexity (O(n) at best with a Fisher-Yates shuffle), no matter how small k is. Instead we should use e.g. the following:

TODO: can numpy generate something like that?
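
On the NumPy question: it probably can. Here is a minimal sketch, assuming the dataset is (or can be converted to) a NumPy array with one row per sample. The helper name `sample_rows` and its `seed` parameter are made up for illustration and are not part of this repo. It draws k distinct row indices with `Generator.choice(..., replace=False)` instead of shuffling everything; as far as I know, the newer `Generator` API avoids permuting the full index range when k is small relative to n, whereas the legacy `np.random.choice` does not.

```python
import numpy as np


def sample_rows(data, k, seed=None):
    """Return k distinct rows of `data`, chosen uniformly at random.

    Hypothetical helper: the name and signature are assumptions.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    # Draw k distinct row indices directly, without shuffling the dataset.
    idx = rng.choice(data.shape[0], size=k, replace=False)
    return data[idx]


if __name__ == "__main__":
    data = np.arange(20).reshape(10, 2)
    print(sample_rows(data, 3, seed=0))
```

The rows come back in random order; if the original row order matters, sort `idx` before indexing.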