questions about the profiles and fraud rate

namebrandon / Sparkov_Data_Generation

Synthetic Credit Card Transaction Generator used in the Sparkov program.

MIT License


Very interesting tool. Good job there.

I have many questions about this tool though:

Are the profile values based on real statistics, or are they merely a best guess?
If I understand the code correctly, transactions are generated, and whether fraud is triggered is decided per customer with a 1% random chance, and then only within a randomly selected date range (so fraud transactions outside that range are discarded). Is that correct?
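
To make the second question concrete, here is a toy sketch of the logic as I read it. It is not taken from the repository: the function names, the `FRAUD_CHANCE` constant, and the `date` key are all hypothetical, and the real code may well work differently.

```python
import random
from datetime import date, timedelta

# Assumption: the 1% per-customer chance described in the question above.
FRAUD_CHANCE = 0.01


def pick_fraud_window(start, end, rng):
    """Return a random (window_start, window_end) inside [start, end],
    or None when this customer is not selected for fraud."""
    if rng.random() >= FRAUD_CHANCE:
        return None
    span = (end - start).days
    first = rng.randint(0, span)      # offset of the window start, in days
    last = rng.randint(first, span)   # offset of the window end, in days
    return start + timedelta(days=first), start + timedelta(days=last)


def keep_fraud_transactions(transactions, window):
    """Keep only the fraud transactions dated inside the window.
    Everything outside the window is discarded."""
    if window is None:
        return []
    window_start, window_end = window
    return [t for t in transactions if window_start <= t["date"] <= window_end]
```

If that reading is right, roughly 1 customer in 100 gets any fraud at all, and only for part of the simulated period, which would explain the heavy imbalance.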

It does generate a 'realistic' dataset, but a very unbalanced one. It might be useful to be able to define the fraud rate, so that a balanced dataset can be produced directly (rather than generating a huge set and later downsampling 90%+ of it). I think that option would be useful.
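
For reference, this is roughly the downsampling workaround I mean. It is only a minimal sketch: it assumes the generated rows have already been loaded as dictionaries with an `is_fraud` field (0 or 1), and the function name and `target_rate` parameter are my own, not part of the tool.

```python
import random


def downsample_to_fraud_rate(rows, target_rate, seed=0):
    """Drop non-fraud rows at random until fraud makes up about
    `target_rate` of the result. All fraud rows are kept."""
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")

    # int() lets the flag be either 1 or "1" (e.g. when read from a CSV).
    fraud = [r for r in rows if int(r["is_fraud"]) == 1]
    legit = [r for r in rows if int(r["is_fraud"]) != 1]

    # Solve fraud / (fraud + n_legit) = target_rate for n_legit,
    # capped at the number of non-fraud rows actually available.
    wanted = round(len(fraud) * (1 - target_rate) / target_rate)
    n_legit = min(len(legit), wanted)

    rng = random.Random(seed)
    sample = fraud + rng.sample(legit, n_legit)
    rng.shuffle(sample)
    return sample
```

With a 1% fraud rate, asking for `target_rate=0.5` throws away about 98% of the generated rows, which is why an option in the generator itself would be nicer.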

Thanks

questions about the profiles and fraud rate #3