streamnsight closed this issue 2 years ago.
Thanks!
Profiles are best guesses... They're intended to create distinct segments that can be detected in the data. I have no idea if rural females between 25-50 are more likely to shop on Tuesday than Monday. :)
I believe your second point is accurate.
Part of working with fraud in a realistic environment is dealing with, and training models on, an unbalanced data set, which is why it's set up like that. The code should be easy to modify to support a variety of needs.
That being said, this is something I put together over 6 years ago for a grad school project, and it is definitely unmaintained. I'm happy to review and approve pull requests, though, if you'd like to submit any!
Very interesting tool. Good job there.
I have many questions about this tool, though:
It does generate a 'realistic' dataset, but it's very unbalanced. It would be useful to be able to define the fraud rate so as to obtain a balanced dataset, rather than generating a huge set and then downsampling 90%+ of it. Having that option would be useful, I think.
Thanks
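The requested option could look roughly like the sketch below. It assumes the generator can decide up front whether each transaction it emits is fraudulent; the thread doesn't show the generator's internals, so the helper `fraud_flags` and its `fraud_rate` parameter are hypothetical, not an existing option in the tool.

```python
import random
from typing import List, Optional


def fraud_flags(n: int, fraud_rate: float, seed: Optional[int] = None) -> List[bool]:
    """Return n shuffled is_fraud flags, exactly round(n * fraud_rate) of them True.

    Hypothetical helper: fraud_rate is a user-chosen share of fraudulent
    transactions (e.g. 0.5 for a balanced set), not an existing tool option.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0.0 <= fraud_rate <= 1.0:
        raise ValueError("fraud_rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    n_fraud = round(n * fraud_rate)
    flags = [True] * n_fraud + [False] * (n - n_fraud)
    rng.shuffle(flags)  # randomize positions so fraud isn't clustered at the start
    return flags


# Usage: the generator's per-transaction loop would branch on each flag and call
# its own fraud or legitimate transaction logic (not shown, and assumed here).
flags = fraud_flags(10_000, fraud_rate=0.5, seed=42)
print(sum(flags), len(flags))  # 5000 10000
```

Fixing the fraud count up front, instead of drawing each transaction's label independently with probability `fraud_rate`, guarantees the requested ratio exactly, which matters most for small datasets.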