Add weights to classification

R470R commented 1 year ago

Hello, I'm having trouble adding weights to a classification problem with an imbalanced dataset. I've already tried passing, for example, "[0.0182, 0.9818]" or "dls.train.cws" as the weights argument.

It doesn't run: I get an AssertionError or other kinds of errors ...

Can you help me?

oguiza commented 1 year ago

Hi @R470R, could you provide a code snippet and the full traceback? FYI, PyTorch requires weights to be passed as a tensor on the same device as the model.

R470R commented 1 year ago

Yes @oguiza !

dls.train.cws
TensorCategory([0.0182, 0.9818], device='cuda:0')

from tsai.all import *
tfms  = [None, [Categorize()]]
dsets = TSDatasets(three_d_trended_X, three_d_trended_y, tfms=tfms, splits=splits, inplace=True)
dls   = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128], batch_tfms=[TSStandardize()], num_workers=0)

batch_tfms = [TSStandardize(by_sample=True)]
learn = TSClassifier(three_d_trended_X, three_d_trended_y, splits=splits, 
                     weights = dls.train.cws, batch_tfms=batch_tfms, metrics=accuracy, 
                     arch=InceptionTimePlus, arch_config=dict(fc_dropout=.5), train_metrics=True)
learn.fit_one_cycle(10)

AssertionError                            Traceback (most recent call last)
Input In [9], in <cell line: 7>()
      4 dls   = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128], batch_tfms=[TSStandardize()], num_workers=0)
      6 batch_tfms = [TSStandardize(by_sample=True)]
----> 7 learn = TSClassifier(three_d_trended_X, three_d_trended_y, splits=splits,
      8                      weights = dls.train.cws, batch_tfms=batch_tfms, metrics=accuracy,
      9                      arch=InceptionTimePlus, arch_config=dict(fc_dropout=.5), train_metrics=True)
     10 learn.fit_one_cycle(10)

File ~/anaconda3/envs/rapids-22.02/lib/python3.9/site-packages/tsai/tslearner.py:38, in TSClassifier.__init__(self, X, y, splits, tfms, inplace, sel_vars, sel_steps, weights, partial_n, train_metrics, bs, batch_size, batch_tfms, shuffle_train, drop_last, num_workers, do_setup, device, arch, arch_config, pretrained, weights_path, exclude_head, cut, init, loss_func, opt_func, lr, metrics, cbs, wd, wd_bn_bias, train_bn, moms, path, model_dir, splitter, verbose)
     35     bs = batch_size
     37 # DataLoaders
---> 38 dls = get_ts_dls(X, y=y, splits=splits, sel_vars=sel_vars, sel_steps=sel_steps, tfms=tfms, inplace=inplace,
     39                  path=path, bs=bs, batch_tfms=batch_tfms, num_workers=num_workers, weights=weights, partial_n=partial_n,
     40                  device=device, shuffle_train=shuffle_train, drop_last=drop_last)
     42 if loss_func is None:
     43     if hasattr(dls, 'loss_func'): loss_func = dls.loss_func

File ~/anaconda3/envs/rapids-22.02/lib/python3.9/site-packages/tsai/data/core.py:986, in get_ts_dls(X, y, splits, sel_vars, sel_steps, tfms, inplace, path, bs, batch_tfms, num_workers, device, shuffle_train, drop_last, weights, partial_n, sampler, sort, **kwargs)
    984 dsets = TSDatasets(X, y, splits=splits, sel_vars=sel_vars, sel_steps=sel_steps, tfms=tfms, inplace=inplace)
    985 if weights is not None:
--> 986     assert len(X) == len(weights)
    987     if splits is not None: weights = [weights[split] if i == 0 else None for i,split in enumerate(splits)] # weights only applied to train set
    988 dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, path=path, bs=bs, batch_tfms=batch_tfms, num_workers=num_workers,
    989                                device=device, shuffle_train=shuffle_train, drop_last=drop_last, weights=weights,
    990                                partial_n=partial_n, sampler=sampler, sort=sort, **kwargs)

AssertionError:

oguiza commented 1 year ago

@R470R, there's a misunderstanding here. There are 2 types of weights you can use with tsai: sample weights and class weights.

Class weights (dls.cws) can be passed directly to the 'weight' argument in some PyTorch loss functions (nn.CrossEntropyLoss, for example). They contain one weight per class. This is normally used when you want to balance classes.
Sample weights can be passed to the weights argument when building a Learner object in tsai. These need to contain one weight for each individual sample. This is useful when, for example, you have some easy or hard examples you want to weight differently, for regression tasks, etc.
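
As a rough illustration of the class-weight route, here is a minimal PyTorch-only sketch. The `nn.Linear` model and the dummy batch are stand-ins invented for the example, and the weight values are simply the ones quoted earlier in this thread; only `nn.CrossEntropyLoss(weight=...)` is the actual PyTorch API being shown.

```python
import torch
import torch.nn as nn

# Class weights: one entry per class (values quoted earlier in the thread).
class_weights = torch.tensor([0.0182, 0.9818])

# Hypothetical stand-in for a real classifier; only used to show device handling.
model = nn.Linear(8, 2)
device = next(model.parameters()).device

# The weight tensor has to be a float tensor on the same device as the model.
loss_func = nn.CrossEntropyLoss(weight=class_weights.to(device))

# Dummy batch: 4 samples, 8 features, integer class labels.
x = torch.randn(4, 8, device=device)
y = torch.tensor([0, 1, 0, 1], device=device)
loss = loss_func(model(x), y)
```

With tsai, a loss built this way would presumably go in through the `loss_func` argument that appears in the `TSClassifier` signature in the traceback, rather than through `weights`. Whether `dls.train.cws` can be used as-is or first needs converting to a plain tensor is not confirmed here.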

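In your case, you are passing the class weights (dls.cws) as sample weights. dls.train.cws has one entry per class (2 here), not one per sample, so the `assert len(X) == len(weights)` check in get_ts_dls fails. This is causing the error.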
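
The size mismatch can be reproduced without tsai. The sketch below uses plain Python and two hypothetical helpers (`inverse_frequency_class_weights` and `per_sample_weights`, neither of which is part of tsai). It assumes a normalized inverse-frequency scheme for the class weights, which is one common convention and not necessarily how `dls.cws` is computed.

```python
from collections import Counter

def inverse_frequency_class_weights(labels):
    """Return {class: weight}, proportional to 1/count and normalized to sum to 1."""
    counts = Counter(labels)
    inverse = {cls: 1.0 / n for cls, n in counts.items()}
    total = sum(inverse.values())
    return {cls: w / total for cls, w in inverse.items()}

def per_sample_weights(labels, class_weights):
    """Expand class weights into one weight per sample."""
    return [class_weights[label] for label in labels]

# Toy imbalanced dataset: 9 samples of class 0, 1 sample of class 1.
labels = [0] * 9 + [1]

class_weights = inverse_frequency_class_weights(labels)
sample_weights = per_sample_weights(labels, class_weights)

# Class weights have one entry per class, so a len(X) == len(weights) check fails.
print(len(class_weights) == len(labels))
# Sample weights have one entry per sample, so the same check passes.
print(len(sample_weights) == len(labels))
```

This only shows the shape that the `weights` argument expects. Whether expanding class weights into sample weights is a sensible way to rebalance a given dataset is a separate question.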

timeseriesAI / tsai

Add weights to classification #613