worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

Speeding the training on mixed data set - categorical data, numerical and text. #26

Closed vinay-k12 closed 1 year ago

vinay-k12 commented 1 year ago

I'm trying to train the model on custom data that has several categorical features with very high cardinality (e.g., City), along with text features and numerical features. The dataset is small: 380K rows.

But training never starts! It has been stuck at this point for a few hours:

[screenshot: training progress bar stalled]

How to improve the training?

avsolatorio commented 1 year ago

Hi @vinay-k12, I think the cardinality of your data is preventing the bootstrapping step from progressing. What you could try is not using the automated termination based on the bootstrap statistic; instead, use a validation sample.

Try:

from realtabformer import REaLTabFormer

# Use 20% of the data as a validation set for early stopping.
rtf_model = REaLTabFormer(
    model_type="tabular",
    gradient_accumulation_steps=4,
    logging_steps=100,
    train_size=0.8,
)

# Fit the model without the sensitivity-bootstrap stopping criterion.
rtf_model.fit(df, n_critic=0)

This will fit the data directly. Hope this helps!
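A complementary preprocessing step (not from this thread, just a common workaround) is to reduce the cardinality of columns like City before training, which shrinks the vocabulary the model must learn. A minimal pandas sketch; the helper name cap_cardinality and the labels are illustrative, not part of REaLTabFormer:

```python
import pandas as pd

def cap_cardinality(series: pd.Series, top_k: int = 50,
                    other_label: str = "OTHER") -> pd.Series:
    # Keep the top_k most frequent categories; lump the rest into one label.
    top = series.value_counts().nlargest(top_k).index
    return series.where(series.isin(top), other_label)

cities = pd.Series(["NYC"] * 5 + ["LA"] * 3 + ["Boise", "Reno"])
capped = cap_cardinality(cities, top_k=2)
# Only "NYC", "LA", and "OTHER" remain.
```

Applying this to the highest-cardinality columns of the 380K-row frame before calling rtf_model.fit may also make the bootstrap-based stopping viable again.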