telekom / nlu-bridge

MIT License

Configure stratification in train_test_split #9

Closed. Ablesius closed this 2 years ago.

Ablesius commented 2 years ago

This commit makes a backwards-compatible change: the default behaviour is the same as before, but you can now use the method's new stratification parameter to pass anything you need to the stratify parameter of sklearn's train_test_split.
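The backwards-compatible pattern described above can be sketched as follows. This is a minimal illustration, not the nlu-bridge implementation. `ToyDataset` and the `_USE_INTENTS` sentinel are hypothetical, and the sketch assumes the previous default was to stratify on the intent labels. A sentinel object is used as the default instead of `None`, so an explicit `stratification=None` can still mean "no stratification".

```python
from sklearn.model_selection import train_test_split

# Hypothetical sentinel meaning "argument not given; keep the old behaviour".
_USE_INTENTS = object()


class ToyDataset:
    """Hypothetical stand-in for a dataset of (text, intent) pairs."""

    def __init__(self, texts, intents):
        self.texts = list(texts)
        self.intents = list(intents)

    def train_test_split(self, test_size=0.25, random_state=None,
                         stratification=_USE_INTENTS):
        # Assumed previous default: stratify on the intent labels.
        if stratification is _USE_INTENTS:
            stratification = self.intents
        # Anything else (None, or a custom label sequence) is passed
        # unchanged to sklearn's stratify parameter.
        train_x, test_x, train_y, test_y = train_test_split(
            self.texts,
            self.intents,
            test_size=test_size,
            random_state=random_state,
            stratify=stratification,
        )
        return ToyDataset(train_x, train_y), ToyDataset(test_x, test_y)
```

With `ds = ToyDataset(texts, intents)`, calling `ds.train_test_split(test_size=0.25, random_state=0)` keeps the assumed old behaviour, while adding `stratification=None` turns stratification off.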

Ablesius commented 2 years ago

This makes the following possible:

import pandas as pd
from nlubridge.vendors import TfidfIntentClassifier
from nlubridge import NLUdataset

# texts (utterance strings) and intents (their labels) are assumed to be defined already
dataset = NLUdataset(texts, intents)
dataset = dataset.shuffle()
classifier = TfidfIntentClassifier()

# stratification is now configurable: None disables it; omitting the argument keeps the previous default
train, test = dataset.train_test_split(test_size=0.25, random_state=0, stratification=None)

classifier = classifier.train_intent(train)
predicted = classifier.test_intent(test)
# Put true and predicted intents side by side for comparison
res = pd.DataFrame(list(zip(test.intents, predicted)), columns=['true', 'predicted'])

I can now disable stratification, and the code still works.