nubank / fklearn

fklearn: Functional Machine Learning
Apache License 2.0

Replaced default argument min_df from 20 to 0 on TFIDF #213

Closed. raphaeldayan-nubank closed this 2 years ago

raphaeldayan-nubank commented 2 years ago

In sklearn, min_df defaults to 1, which in effect filters out no terms, so fklearn's default should not filter terms either. This argument is not described in the docstring and was hurting the performance of my sentiment classifier. I lost 20 points of recall because of it and had to spend hours figuring out that this was the problem. Could we change the default to be consistent with sklearn?
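To illustrate why a high min_df can quietly cost recall, here is a minimal pure-Python sketch of a document-frequency cutoff. It only mimics the integer form of min_df (keep a term if it appears in at least that many documents) and is not sklearn's or fklearn's implementation. The helper names and the toy reviews are hypothetical.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each token, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # set() so a token repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df


def vocabulary_with_min_df(docs, min_df):
    """Keep tokens whose document frequency is at least `min_df` (an integer count)."""
    df = document_frequencies(docs)
    return sorted(tok for tok, count in df.items() if count >= min_df)


# Hypothetical toy corpus standing in for a small sentiment dataset.
reviews = [
    "great app love it",
    "terrible app hate it",
    "love the card",
    "hate the fees",
]

full_vocab = vocabulary_with_min_df(reviews, min_df=1)    # nothing is filtered
pruned_vocab = vocabulary_with_min_df(reviews, min_df=2)  # rare terms are dropped
empty_vocab = vocabulary_with_min_df(reviews, min_df=20)  # threshold exceeds corpus size

print(len(full_vocab), pruned_vocab, empty_vocab)
```

With a threshold of 2, rare but informative tokens such as "great" and "terrible" disappear from this toy vocabulary, and with a threshold of 20 nothing survives. On a small or specialised dataset, a default of 20 can remove signal in the same way without any warning.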

hellenlima commented 2 years ago

I see two options here:

  1. Use the default from scikit-learn (as @raphaeldayan-nubank suggested)
  2. Make the documentation clear about this difference

I prefer the first option, because it is more intuitive.