worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

Can we treat the method as one of data augmentation? #5

Closed froggyis closed 1 year ago

froggyis commented 1 year ago

First, this is an interesting method, and thanks for sharing the code.

I just have a question about a detail of the paper. We all know that data augmentation for tabular regression is hard to implement.

I am wondering whether it would be appropriate to use this method for data augmentation and compare it with SMOGN or other methods that augment tabular regression data. Why or why not?

I am not sure whether this is the right place to discuss the paper; if not, I will delete the issue.

Thanks.

avsolatorio commented 1 year ago

Hello! Thanks for considering using REaLTabFormer!

I think REaLTabFormer may be helpful in data augmentation applications as well. In one of our works (not public) that uses REaLTabFormer, we found that it can generate out-of-data sample observations that could be useful for model generalization. SMOTE/SMOGN may not have this diversity (see https://arxiv.org/pdf/2209.15421.pdf).

However, as in any machine learning problem, cross-validation must be used to judge which parameters and components help improve the performance.
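
As a rough illustration of that cross-validation point, here is a minimal sketch using only the Python standard library. It compares a baseline against an augmented pipeline, generating synthetic rows from the training fold only and scoring on held-out real rows. All names here (`cv_rmse`, `jitter_augment`, and so on) are hypothetical, the one-feature linear model is just a placeholder, and `jitter_augment` is a simple noise-based stand-in, not REaLTabFormer or SMOGN. In practice the `augment` callable would be swapped for a function that fits the chosen generator on the training fold and samples from it.

```python
import random
from statistics import fmean


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(rows):
    """Least-squares fit of y = a * x + b on (x, y) rows (placeholder model)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in rows) / sxx if sxx else 0.0
    return a, my - a * mx


def rmse(model, rows):
    """Root mean squared error of the fitted line on (x, y) rows."""
    a, b = model
    return fmean([(a * x + b - y) ** 2 for x, y in rows]) ** 0.5


def cv_rmse(rows, augment=None, k=5, seed=0):
    """Mean validation RMSE over k folds.

    Synthetic rows are generated from the training fold only, and the
    validation fold always contains real rows only.
    """
    scores = []
    for val_idx in kfold_indices(len(rows), k, seed):
        val_set = set(val_idx)
        train = [r for j, r in enumerate(rows) if j not in val_set]
        val = [rows[j] for j in val_idx]
        if augment is not None:
            train = train + augment(train)
        scores.append(rmse(fit_line(train), val))
    return fmean(scores)


def jitter_augment(train, n_new=50, noise=0.1, seed=0):
    """Hypothetical stand-in generator: resample rows and add Gaussian noise."""
    rng = random.Random(seed)
    return [
        (x + rng.gauss(0, noise), y + rng.gauss(0, noise))
        for x, y in rng.choices(train, k=n_new)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in xs]  # toy data
    print("baseline RMSE: ", cv_rmse(data))
    print("augmented RMSE:", cv_rmse(data, augment=jitter_augment))
```

Keeping the generator inside the fold loop means the validation rows never influence the synthetic data, so the two scores are comparable.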

Thanks!

froggyis commented 1 year ago

Thanks for the quick reply. It will be great if the method works on my project.