Closed AayushSameerShah closed 1 year ago
So the Python Faker is basically generating rubbish. Here is a site you can use to check a host of identity fields, including Aadhaar. As you observed, Faker does not generate valid Aadhaar numbers: in particular, it generates them with a leading 1, and Aadhaar numbers also need a valid check digit, which the Python Faker does not honor.
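For anyone who wants to sanity-check values locally, here is a minimal Python sketch of the two rules mentioned above. It assumes the Aadhaar check digit is a Verhoeff check digit, which is how it is commonly documented; the thread itself does not name the algorithm. It is not FTA's implementation. The function names are hypothetical, rejecting a leading 0 as well as a leading 1 is an assumption based on the commonly cited rule, and only spaces are stripped as separators.

```python
# Verhoeff tables: dihedral-group multiplication (D), position permutation (P), inverse (INV).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> int:
    """Return the Verhoeff check digit to append to a string of digits."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        # Positions are shifted by one because the check digit will occupy position 0.
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """Return True if the last digit of `number` is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def looks_like_aadhaar(value: str) -> bool:
    """Hypothetical helper: 12 digits, no leading 0 or 1 (assumed rule), valid check digit."""
    digits = value.replace(" ", "")  # only spaces are treated as separators here
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    if digits[0] in "01":
        return False
    return verhoeff_is_valid(digits)
```

A value that starts with 1 fails this check before the check digit is even examined, which matches the behaviour described above for the Python Faker output.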
If you are looking to generate synthetic data, you can use FTA for this purpose. If you have cloned the repository and have done a build and install, then the full-blown CLI is available (see https://github.com/tsegall/fta#faker for more details on how to invoke the FTA faker). I used the following command to generate 1000 'good' Aadhaar numbers.
cli --locale en-IN --faker "Aadhaar[type=IDENTITY.AADHAR_IN]" --records 1000
Note: The current code only allows a space or no space between the components; it does not allow a hyphen as a separator. I will update it to handle the hyphen.
I have attached a sample file with decent numbers.
I am currently testing Aadhaar detection with locale "en-IN". I have put multiple possible Aadhaar values in the dataset (given below) to try it with FTA.
As shown in this reference, an Aadhaar number should be:
In the dataset all columns follow those rules of Aadhaar, and the columns are:
Surprisingly, the Python Faker creates Aadhaar IDs with "1" as the first digit, so I had to check both scenarios (with a leading 1 and without). As a result, none of them were detected as IDENTITY.AADHAR_IN. The Aadhaar Python Faker column was the only one detected as a semantic type, but it was detected as CHECKDIGIT.LUHN.
The log: (I just named all headers the same, "Aadhaar", to avoid any potential mis-detection)
Thank you, Tim!
I am using this dataset: Just Aadhaar SMALL100.csv