yandex-research / tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
https://arxiv.org/abs/2209.15421
MIT License

Reconstruct CSV with column names #3

Closed (andrewnc closed this issue 1 year ago)

andrewnc commented 2 years ago

Great work here! Is there a supported way to reconstruct the synthetic data into a single DataFrame / CSV with the original column names preserved?

rotot0 commented 2 years ago

Thanks a lot. Unfortunately, there is no such way.

You can find here how some of the original datasets (adult, california, fb-comments, gesture, higgs-small, house) were split/transformed into the X_{num|cat}_{train|val|test}.npy format. However, there is no script to transform them back to the original format.

I may add this in the future but it's unlikely.
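
For anyone who needs this in the meantime, here is a minimal sketch of a manual workaround. It is not part of the repository. It assumes the synthetic output uses the same X_{num|cat}_{split}.npy layout described above, that targets sit in a y_{split}.npy file next to them (an assumption), and that you supply the column names yourself, in the order the columns were written. The function name rebuild_dataframe and all column names are hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def rebuild_dataframe(data_dir, split, num_cols, cat_cols, target_col=None):
    """Join X_num_<split>.npy and X_cat_<split>.npy into one DataFrame.

    Hypothetical helper: the column names are NOT stored in the .npy
    files, so the caller must pass them in the original column order.
    """
    data_dir = Path(data_dir)
    parts = []
    for kind, names in (("num", num_cols), ("cat", cat_cols)):
        if not names:
            continue  # this dataset has no columns of that kind
        # allow_pickle=True because categorical arrays may be object arrays;
        # only use it on files you trust.
        arr = np.load(data_dir / f"X_{kind}_{split}.npy", allow_pickle=True)
        if arr.shape[1] != len(names):
            raise ValueError(
                f"X_{kind}_{split}.npy has {arr.shape[1]} columns, "
                f"but {len(names)} names were given"
            )
        parts.append(pd.DataFrame(arr, columns=list(names)))
    if not parts:
        raise ValueError("no column names were given")

    # Assumption: the target is stored separately as y_<split>.npy.
    y_path = data_dir / f"y_{split}.npy"
    if target_col is not None and y_path.exists():
        y = np.load(y_path, allow_pickle=True)
        parts.append(pd.DataFrame({target_col: y.reshape(-1)}))

    return pd.concat(parts, axis=1)
```

Calling `rebuild_dataframe(...).to_csv("synthetic.csv", index=False)` then writes a single CSV. Numerical columns come first, then categorical ones, then the target, so reorder the columns afterwards if the original file interleaved them.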