Closed: Scherzan closed this issue 1 year ago
Hi @Scherzan, nice to meet you and thanks for filing the issue with the detailed information.
I believe this issue may be a duplicate of #1154, as we do not support the pandas.Int64 type.
As long as you specify Int64 in the metadata, you should not need to manually convert the dataframe yourself. The SDV will know that the values should be whole numbers represented by 64 bits.
Note that the computer_representation parameter in the metadata does not refer to pandas dtypes. It refers to how many bits are used to store the data, to ensure that there are no overflow errors.
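To illustrate the point about bit width, here is a minimal sketch of the value range that a given integer representation can hold. The helper `representable_range` is hypothetical and not part of the SDV; it only uses NumPy's `iinfo` to look up the bounds, and it assumes the representation name maps to a NumPy integer dtype once lowercased.

```python
import numpy as np


def representable_range(representation):
    """Return (min, max) for an integer representation such as 'Int64'.

    Hypothetical helper for illustration only; this is not SDV code.
    """
    # 'Int64' -> 'int64', 'UInt8' -> 'uint8', which NumPy understands.
    info = np.iinfo(representation.lower())
    return int(info.min), int(info.max)


low, high = representable_range("Int64")
print(f"Int64 can store whole numbers from {low} to {high}")
```

Values outside this range would overflow, which is what the bit-width setting is meant to guard against.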
Hi @npatki, it's great to meet you too! Thank you for your friendly response. Hopefully next time I will check the open issues more thoroughly. Thank you for taking the time to respond so quickly and helpfully. Have a wonderful day!
Hi @Scherzan no problem at all! Always here to help or clarify any Qs you may have. Let us know if you run into any other issues.
FYI you can also join our Slack Community and post there if that's easier.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
Fitting CTGANSynthesizer on data containing columns of numerical type 'Int64' raises an error in numerical_formatter.py at the line roundable_data = data[~(np.isinf(data) | pd.isna(data))] (line 57). The error is a TypeError (details below). I would expect data of type 'Int64' to be supported, as the documentation states support for 'Int64' and it is an integer format that allows NaN values. Converting the columns with data.astype('int64') raises IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer. Using data.astype('float') works fine.
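The two conversions described above can be sketched on a small, made-up Series. This is only an illustration of the pandas behaviour, not of the SDV code path, and the exact exception class and message may differ between pandas versions, so the sketch catches the broader `ValueError`.

```python
import pandas as pd

# Made-up example data: a nullable integer column with one missing value.
data = pd.Series([1, None, 3], dtype="Int64")

# Casting to NumPy's non-nullable int64 fails because NA has no integer value.
try:
    data.astype("int64")
    cast_failed = False
except ValueError as error:
    cast_failed = True
    print(f"astype('int64') failed: {error}")

# Casting to float succeeds; the missing value becomes NaN.
as_float = data.astype("float")
print(as_float.dtype, int(as_float.isna().sum()))
```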
Steps to reproduce
Run code below with sdv-beta installed.