opendp / smartnoise-samples

Code samples and documentation for SmartNoise differential privacy tools
MIT License

Unrecognized atomic type: <class 'numpy.object_'> #100

Closed SachinKonan closed 3 years ago

SachinKonan commented 3 years ago

I was trying to run the sequence below from the basic PUMA analysis notebook, where df.columns holds the names of the columns in the dataset.

import opendp.smartnoise.core as sn  # import path taken from the traceback below

# set sample size
n = 1_000

# set ranges/feasible values
age_range = (0., 100.)
sex_vals = [0, 1]
educ_vals = [i for i in range(1, 17)]
race_vals = [i for i in range(1, 7)]
income_range = (0., 500_000.)
married_vals = [0, 1]
with sn.Analysis() as analysis:
    # load data
    data = sn.Dataset(path = 'data.csv', column_names = df.columns)

    ''' get mean age '''
    # establish data 
    age_dt = sn.to_float(data['age'])

    # clamp data to range and impute missing values
    age_dt = sn.clamp(data = age_dt, lower = age_range[0], upper = age_range[1])
    age_dt = sn.impute(data = age_dt, distribution = 'Gaussian',
                                      lower = age_range[0], upper = age_range[1],
                                      shift = 45., scale = 10.)

    # ensure data are consistent with proposed n
    age_dt = sn.resize(data = age_dt, number_rows = n, distribution = 'Gaussian',
                       lower = age_range[0], upper = age_range[1],
                       shift = 45., scale = 10.)

    # calculate differentially private mean of age
    age_mean = sn.dp_mean(data = age_dt, privacy_usage={'epsilon': .65})

    ''' get variance of age '''
    # calculate differentially private variance of age
    age_var = sn.dp_variance(data = age_dt, privacy_usage={'epsilon': .35})

analysis.release()

# print differentially private estimates of mean and variance of age
print(age_mean.value)
print(age_var.value)

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-a22b0df0dc74> in <module>
     35     age_var = sn.dp_variance(data = age_dt, privacy_usage={'epsilon': .35})
     36 
---> 37 analysis.release()
     38 
     39 # print differentially private estimates of mean and variance of age

/python3.7/site-packages/opendp/smartnoise/core/base.py in release(self)
    799         response_proto: api_pb2.ResponseRelease.Success = core_library.compute_release(
    800             serialize_analysis(self),
--> 801             serialize_release(self.release_values),
    802             self.stack_traces,
    803             serialize_filter_level(self.filter_level))

/python3.7/site-packages/opendp/smartnoise/core/value.py in serialize_release(release_values)
    105         values={
    106             component_id: serialize_release_node(release_node)
--> 107             for component_id, release_node in release_values.items()
    108             if release_node['value'] is not None
    109         })

/python3.7/site-packages/opendp/smartnoise/core/value.py in <dictcomp>(.0)
    106             component_id: serialize_release_node(release_node)
    107             for component_id, release_node in release_values.items()
--> 108             if release_node['value'] is not None
    109         })
    110 

/python3.7/site-packages/opendp/smartnoise/core/value.py in serialize_release_node(release_node)
    114         value=serialize_value(
    115             release_node['value'],
--> 116             release_node.get("value_format")),
    117         privacy_usages=release_node.get("privacy_usages"),
    118         public=release_node['public'])

/python3.7/site-packages/opendp/smartnoise/core/value.py in serialize_value(value, value_format)
    210         array=value_pb2.Array(
    211             shape=list(array.shape),
--> 212             flattened=serialize_array1d(array.flatten())
    213         ))
    214 

/python3.7/site-packages/opendp/smartnoise/core/value.py in serialize_array1d(array)
    142 
    143 def serialize_array1d(array):
--> 144     data_type = detect_atomic_type(array)
    145 
    146     container_type = {

/python3.7/site-packages/opendp/smartnoise/core/value.py in detect_atomic_type(array)
    137         atomic_type = "string"
    138     else:
--> 139         raise ValueError(f"Unrecognized atomic type: {array.dtype.type}")
    140     return atomic_type
    141 

ValueError: Unrecognized atomic type: <class 'numpy.object_'>

Does anyone know how I can fix this error? I have the most recent version of the OpenDP SmartNoise core SDK.

ecowan commented 3 years ago

Hi @SachinKonan, df.columns returns a pandas Index, which NumPy treats as an object array rather than a plain Python list. The column_names argument needs to be a list; when I wrapped df.columns in list(), your code worked!

data = sn.Dataset(path = data_path, column_names = list(df.columns))
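
To see why the wrapper matters, here is a minimal sketch that needs only pandas and NumPy, not SmartNoise. It assumes the library converts column_names to a NumPy array before serializing it, which the array.flatten() and array.dtype.type calls in the traceback suggest but do not prove. The function detect_atomic_type_sketch and the small DataFrame are hypothetical stand-ins: only the "string" label and the final ValueError are taken from the traceback, and the other branches are made up for illustration.

```python
import numpy as np
import pandas as pd


def detect_atomic_type_sketch(array):
    """Hypothetical stand-in for the dtype check seen in the traceback.

    Only the "string" label and the final ValueError come from the
    traceback; the other labels are invented for illustration.
    """
    if np.issubdtype(array.dtype, np.bool_):
        return "bool"
    if np.issubdtype(array.dtype, np.integer):
        return "int"
    if np.issubdtype(array.dtype, np.floating):
        return "float"
    if np.issubdtype(array.dtype, np.str_):
        return "string"
    raise ValueError(f"Unrecognized atomic type: {array.dtype.type}")


# Hypothetical stand-in for the DataFrame in the question.
df = pd.DataFrame({"age": [34, 51], "sex": [0, 1], "income": [42000.0, 58500.0]})

from_index = np.asarray(df.columns)       # pandas Index converted directly
from_list = np.asarray(list(df.columns))  # plain list of str converted

# The list of str becomes a NumPy unicode array, which the check accepts.
print(from_list.dtype.type, detect_atomic_type_sketch(from_list))

# The Index becomes an object array, which falls through to the ValueError.
try:
    detect_atomic_type_sketch(from_index)
except ValueError as err:
    print(from_index.dtype.type, err)
```

If that assumption holds, any object-dtype input would hit the same error, so passing list(df.columns) (or df.columns.tolist()) is the safer habit whenever an API asks for a list of names.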

SachinKonan commented 3 years ago

ah ic, thanks!