jacobkap / fastDummies

The goal of fastDummies is to quickly create dummy variables (columns) and dummy rows.
https://jacobkap.github.io/fastDummies/

Patch 1 #2

Closed: yu45020 closed this pull request 6 years ago

yu45020 commented 6 years ago

Minor changes to the "dummy_cols" function:

  1. A new option to work on a copy of the input dataset.
  2. Call "setindex" on the data.table before modifying the dummy columns' values.

Results: For datasets with few rows, say 1e3, the original function is faster. For data with more than 1e6 rows, setting an index on the data.table greatly reduces the run time.

[Benchmark image: 1e3 rows, trial two]

[Benchmark image: 1e6 rows, trial one]
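
For readers without the patch at hand, the two changes listed above can be sketched roughly as follows. This is not the code from the pull request or from fastDummies: `add_dummies`, its `make_copy` argument, and the `animals` table are hypothetical names made up for illustration. It assumes the data.table package is installed and that the column being expanded is a character column.

```r
library(data.table)

# Hypothetical helper (NOT fastDummies::dummy_cols): adds one 0/1 column per
# distinct value of a character column, optionally working on a copy.
add_dummies <- function(dt, col, make_copy = TRUE) {
  stopifnot(is.data.table(dt), is.character(dt[[col]]))
  if (make_copy) dt <- copy(dt)      # leave the caller's table untouched
  setindexv(dt, col)                 # secondary index; row order is unchanged
  for (lvl in sort(unique(dt[[col]]))) {
    new_col <- paste(col, lvl, sep = "_")
    dt[, (new_col) := 0L]                      # start with all zeros
    dt[list(lvl), (new_col) := 1L, on = col]   # set matching rows to 1 via on=
  }
  dt[]
}

animals <- data.table(id = 1:4, pet = c("cat", "dog", "cat", "bird"))
out <- add_dummies(animals, "pet")
```

With `make_copy = TRUE` the input `animals` keeps its two original columns and `out` carries the new `pet_bird`, `pet_cat`, and `pet_dog` columns. Whether the index pays off depends on the number of rows, as the results above suggest, so it is worth timing both variants on your own data.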

jacobkap commented 6 years ago

Hi, thanks for your contribution. Things are a bit busy because of finals week, and I've been planning to make some changes to this package anyway. If I haven't merged your contribution in a week or so, please remind me.

Jacob

yu45020 commented 6 years ago

No need to rush. I wrote my own function to create dummy columns and then found your package. Please check the "chmatch" function in data.table if you have time. It is a fast match function for character vectors. In my tests it was substantially faster for data with 1e6 rows and 5 non-numeric columns.
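
As a rough illustration of the "chmatch" suggestion (a sketch only, not code from the pull request), the snippet below matches a character vector against its distinct values and turns the resulting positions into 0/1 columns. The names `pets`, `lvls`, `pos`, and `dummies` are made up for this example, and it assumes the data.table package is installed.

```r
library(data.table)

pets <- c("cat", "dog", "cat", "bird")
lvls <- sort(unique(pets))       # distinct values, sorted

# chmatch() is data.table's match() for character vectors; it returns the
# integer position of each element of pets within lvls.
pos <- chmatch(pets, lvls)

# Build one 0/1 column per distinct value from the integer positions.
dummies <- vapply(
  seq_along(lvls),
  function(k) as.integer(pos == k),
  integer(length(pets))
)
colnames(dummies) <- paste0("pet_", lvls)
```

On character input `chmatch` gives the same positions as base `match`, so it can be swapped in without changing results; any speed difference should be measured on the data at hand.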

jacobkap commented 6 years ago

Hi, the package has been updated since you submitted the pull request. Can you please resubmit so that it fits the current function? I don't want to add any new parameters, but I'd be happy to include your improvement behind a condition that decides when to use it.

Jacob

yu45020 commented 6 years ago

Happy New Year.

Happy to help, though I am working on another project right now.

jacobkap commented 6 years ago

Happy New Year.

Take as much time as you need. I'm just glad for any help.