nuldertien opened this issue 1 year ago
Thanks for filing, @nuldertien. We'll keep this issue open for tracking purposes and for communicating progress.
For anyone seeing this issue for the first time, here is a suggestion in the meantime:
The SDV is smart enough to recognize that all values in the column are whole numbers. So even if you leave the column as float64 for now, any decimals you see should always end in .0. While this is not ideal in terms of data representation, it should still give you usable synthetic data.
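A minimal sketch of that workaround using pandas only: cast the nullable column to float64 before training (pd.NA becomes NaN), then round the sampled values and cast them back to Int64. The helper names `to_float64_for_training` and `restore_int64` are hypothetical, and the `sampled` frame is a hand-written stand-in for synthetic output, not a call to any SDV API:

```python
import numpy as np
import pandas as pd


def to_float64_for_training(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: cast a nullable Int64 column to float64 (pd.NA -> NaN)."""
    out = df.copy()
    out[column] = out[column].astype("float64")
    return out


def restore_int64(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: round float64 values and cast back to nullable Int64 (NaN -> <NA>)."""
    out = df.copy()
    out[column] = out[column].round().astype("Int64")
    return out


# Real data with a nullable integer column, as described in the issue.
real = pd.DataFrame({"column1": pd.array([123500, 56832, None], dtype="Int64")})
train_ready = to_float64_for_training(real, "column1")  # this frame would be used for fitting

# Stand-in for synthetic output; in practice it would come from the fitted model.
sampled = pd.DataFrame({"column1": [123500.0, np.nan, 60001.0]})
restored = restore_int64(sampled, "column1")
print(restored.dtypes)
```

Rounding before the cast matters: `astype("Int64")` refuses float values that are not whole numbers, so any stray decimals in the samples are resolved first.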
I just got the same error, so I'd like to point out that this is still an ongoing issue.
Problem Description
I have a column in my dataset that contains integers and NaN values. To handle integers (no decimals) together with NaN values, I currently convert the column to the 'Int64' dtype, more specifically pd.Int64Dtype(). However, after training an SDV model on data with this dtype, sampling fails with the error "Cannot interpret 'Int64Dtype()' as a data type".
Expected behavior
SDV should support pandas nullable dtypes such as Int64, so that I can train on and sample this kind of data.
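The error text matches what NumPy raises when it is asked to interpret a pandas extension dtype. As an assumption (no traceback is included here), SDV may be passing the column's dtype to a NumPy function such as `np.issubdtype`. The sketch below shows that pandas recognizes Int64 as an integer dtype while NumPy cannot convert it:

```python
import numpy as np
import pandas as pd

s = pd.Series([123500, 56832, None], dtype="Int64")

# pandas understands its own nullable integer dtype.
pandas_says_integer = pd.api.types.is_integer_dtype(s.dtype)

# NumPy cannot turn the extension dtype into a numpy.dtype, so this raises TypeError.
numpy_error = None
try:
    np.issubdtype(s.dtype, np.integer)
except TypeError as exc:
    numpy_error = str(exc)

print(pandas_says_integer)
print(numpy_error)
```

This is one plausible failure path, not a confirmed diagnosis of SDV's code; dtype checks written with `pd.api.types` functions handle both NumPy and pandas extension dtypes.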
Additional context
I transformed the column with .astype('Int64'), more specifically with round(pd.to_numeric(dataframe['column1'], errors='coerce')).astype('Int64'), such that {'column1': [123500, 56832, <NA>]}, where the type() of each value corresponds to [np.int64, np.int64, pandas._libs.missing.NAType]. The metadata used is provided below:
"fields": { "column1": { "type": "numerical", "subtype": "integer" } }
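The conversion described above can be reproduced with pandas alone. A minimal sketch, assuming a hypothetical raw input of numeric strings with one unparseable entry (the issue does not show the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical raw input: numeric strings plus one value that cannot be parsed.
dataframe = pd.DataFrame({"column1": ["123500", "56832", "not a number"]})

# The conversion from the issue: coerce to numbers (unparseable -> NaN),
# round to whole numbers, then cast to the nullable Int64 dtype (NaN -> <NA>).
dataframe["column1"] = round(pd.to_numeric(dataframe["column1"], errors="coerce")).astype("Int64")

print(dataframe["column1"].dtype)
print([type(dataframe["column1"].iloc[i]) for i in range(len(dataframe))])
```

The resulting column holds NumPy integer scalars plus `pd.NA`, which is the combination the metadata above declares as a numerical integer field.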