sdcTools / recordSwapping

c++ code for rank-swapping

Household ID needs to be integer? #3

Open ppdewolf opened 4 years ago

ppdewolf commented 4 years ago

In my microdata the householdID has up to 12 digits. When I apply record swapping, I get negative householdIDs back.

Looks like householdID needs to be integer and thus any number > MAXINT (=2147483647) is mapped to a negative number? The maximum householdID I find in the output is exactly 2147483647.

Can this be changed to allow for a larger number of digits?
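
The symptom is consistent with a 12-digit ID not fitting into a 32-bit `int`. A minimal C++ sketch of that range check, assuming `int` is 32 bits (as on most current platforms); the helper name `fits_in_int` is made up for illustration and is not part of recordSwapping:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of recordSwapping): returns true if `id`
// can be stored in an `int` without loss.
bool fits_in_int(std::int64_t id) {
    return id >= std::numeric_limits<int>::min() &&
           id <= std::numeric_limits<int>::max();
}
```

With a 32-bit `int`, `fits_in_int(2147483647)` holds, while a 12-digit value such as `999999999999` is rejected.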

JohannesGuss commented 4 years ago

Yes, it needs to be an integer right now. Currently you supply the whole data set as std::vector< std::vector<int> > and just tell the procedure where householdID, geographic variables, ... are in the data set, so they all need to be integers.

We could go from std::vector< std::vector<int> > to std::vector< std::vector<double> > ...I think that should work without too much trouble, but I would have to check.
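
As a rough check of why `double` could work here: an IEEE 754 `double` represents every integer with magnitude up to 2^53 exactly, which comfortably covers 12-digit IDs. A small sketch, assuming IEEE 754 doubles; the names `kMaxExactInDouble` and `round_trips_through_double` are hypothetical and not part of recordSwapping:

```cpp
#include <cstdint>

// Largest magnitude up to which every integer is exactly representable
// in an IEEE 754 double: 2^53 = 9007199254740992.
constexpr std::int64_t kMaxExactInDouble = std::int64_t{1} << 53;

// Hypothetical check (illustration only): does `id` survive a conversion
// to double and back unchanged? Values outside the guaranteed-exact range
// are conservatively rejected.
bool round_trips_through_double(std::int64_t id) {
    if (id > kMaxExactInDouble || id < -kMaxExactInDouble) {
        return false;
    }
    return static_cast<std::int64_t>(static_cast<double>(id)) == id;
}
```

A 12-digit ID such as `999999999999` passes this check, so no precision would be lost for IDs of that size.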

Or we change the way the inputs work and supply the inputs separately, so that hid is not the position of the column in data but the column vector itself. If we went down this road, it would make sense to change this for the other parameters too. We would also need to specify what the output should look like, since the current implementation takes the data set as input and simply returns it with changed rows.
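
To make the second option concrete, here is one possible shape for such an input, purely as a sketch. The struct `SwapInput`, its fields, and the one-inner-vector-per-record layout are all assumptions for illustration and not the actual recordSwapping interface:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input layout (not the actual recordSwapping interface):
// the household ID is passed as its own column with a wider integer type,
// while the remaining variables stay in the integer matrix.
struct SwapInput {
    std::vector<std::vector<int>> data;  // assumed: one inner vector per record
    std::vector<std::int64_t> hid;       // household ID of each record
};

// A separate ID column needs a consistency check: exactly one ID per record.
bool is_consistent(const SwapInput& in) {
    return in.hid.size() == in.data.size();
}
```

The trade-off is that every separately supplied column needs this kind of length check, and the output format has to say how the swapped rows and the ID column are returned together.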

What do you think would be the best option, also regarding possible changes for your Java frontend? I think std::vector< std::vector<double> > might be the best option.

mescudero84 commented 4 years ago

I have a problem with version 0.2.0, related to integer values, that I didn't have in 0.1.0. When I run

dat_swapped=recordSwap(data=dat,similar,hierarchy,
                       risk_variables,hid,k_anonymity,
                       swaprate,seed = 123456)

I get: "Error in recordSwap(data = dat, similar = similar, hierarchy = hierarchy, : data must contain only integer values at this point - this condition might get droped in a future release"

However, all my variables are integer. Thank you

JohannesGuss commented 4 years ago

@mescudero84 this should be fixed now; there was a bug in the input check. The function now checks whether any column in data is non-numeric or has any value with a decimal part. If you reinstall version 0.2.0, it should work.
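
The "no decimal part" condition can be sketched as follows. This is only an illustration of the idea, written in C++ with a made-up helper name `all_whole_numbers`; the actual check in the package may be implemented differently:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical check (illustration only): true if every value in the
// column is finite and has no fractional part.
bool all_whole_numbers(const std::vector<double>& column) {
    return std::all_of(column.begin(), column.end(), [](double x) {
        return std::isfinite(x) && std::trunc(x) == x;
    });
}
```

A column stored as floating point but holding only whole numbers, such as `{1.0, 2.0, 999999999999.0}`, passes; a column containing `2.5` does not.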

mescudero84 commented 4 years ago

Ok, thank you!