Open gretaabib opened 1 year ago
This side looks good, but I think the % fraud should be computed against the total number of samples, not as a % of the non-fraud samples. Check out your other PR in ddf_common for comments.
@benwulfe Unless I'm missing something, the way we create fraudulent examples means their number can't exceed the volume of non-fraud. Please expand on this so that @gretaabib and I understand it and implement it correctly.
I think the number of fraud samples can certainly exceed the number of non-fraud samples. As we discussed at the offsite, if 80% of timber is fraudulent, then out of 100 samples, 80 are fraud and 20 are not, meaning there are 4x as many fraud samples as non-fraud.
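To make the two readings of the percentage concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical and are not taken from this PR or from ddf_common; the sketch only assumes the percentage is given on a 0-100 scale and that the non-fraud count is fixed.

```python
def fraud_count_from_total_percent(n_nonfraud: int, fraud_percent: float) -> int:
    """Fraud samples needed so that fraud is fraud_percent of ALL samples.

    Hypothetical helper for illustration only.
    Solves n_fraud / (n_fraud + n_nonfraud) = fraud_percent / 100.
    """
    if not 0 <= fraud_percent < 100:
        # At 100% there would be no room for any non-fraud samples.
        raise ValueError("fraud_percent must be in [0, 100)")
    fraction = fraud_percent / 100.0
    return round(n_nonfraud * fraction / (1.0 - fraction))


def fraud_count_from_nonfraud_percent(n_nonfraud: int, fraud_percent: float) -> int:
    """Fraud samples when the percentage is taken relative to non-fraud only.

    Hypothetical helper for illustration only.
    """
    return round(n_nonfraud * fraud_percent / 100.0)


n_nonfraud = 20

# Percentage of the total: 80 fraud + 20 non-fraud = 100 samples, 80% fraud.
total_based = fraud_count_from_total_percent(n_nonfraud, 80)

# Percentage of non-fraud: 16 fraud + 20 non-fraud = 36 samples, about 44% fraud.
nonfraud_based = fraud_count_from_nonfraud_percent(n_nonfraud, 80)

print(total_based, nonfraud_based)
```

With the percentage defined against the total, any value above 50% yields more fraud than non-fraud samples, which matches the timber example. With the percentage defined against non-fraud, the fraud count only exceeds non-fraud if the percentage goes above 100.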
simulated_fraud_percent inclusion