worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

AssertionError: The target length 10 of the data doesn't include the numeric precision at 20. Increase max_len to at least 22. #30

Closed vinay-k12 closed 1 year ago

vinay-k12 commented 1 year ago

How can I resolve this?

avsolatorio commented 1 year ago

Hi @vinay-k12, you can try setting the parameter numeric_max_len=22. However, this will make the model's sequences longer. I suggest instead that you find the column in the data that contains very small or very large values, then rescale it. For example, if one column has values between 0.00000000002 and 0.00000000008, you could multiply this column by 10,000,000,000 so the model will only be fitted to data between 2 and 8. This will shorten the sequences and also the training time. After the model is trained, you can divide the rescaled column in the generated data by 10,000,000,000 to recover the original range of the data.
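The rescaling idea above could be sketched roughly as follows. This is only an illustration in plain Python, not part of REaLTabFormer: the helper `power_of_ten_scale` and the sample values are hypothetical, and the training and sampling steps are left out.

```python
import math


def power_of_ten_scale(values):
    """Return a power of ten that brings the largest magnitude into [1, 10).

    Hypothetical helper for illustration; it is not part of REaLTabFormer.
    Raises ValueError if every value is zero.
    """
    largest = max(abs(v) for v in values if v != 0)
    return 10.0 ** (-math.floor(math.log10(largest)))


# Hypothetical column holding very small values.
original = [2e-10, 5e-10, 8e-10]

# Pick the factor once and store it so it can be undone later.
scale = power_of_ten_scale(original)

# Rescale before fitting: the model would see values of ordinary
# magnitude, which need fewer characters to represent.
rescaled = [v * scale for v in original]

# ... fit the model on the rescaled column and sample from it ...

# Undo the rescaling on the generated column (here `rescaled` stands in
# for the generated values) to recover the original range.
restored = [v / scale for v in rescaled]
```

Keeping one stored factor per rescaled column makes the inverse step a single division, though floating-point rounding means the restored values may differ from the originals in the last digits.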

avsolatorio commented 1 year ago

@vinay-k12, I am closing this issue as inactive. Kindly reopen it if my suggestion above does not work or if you are still facing issues. Thanks!