tnc-br / ddf-isoscapes


Changes due to fraudulent percentage parameter #169

Open gretaabib opened 1 year ago

gretaabib commented 1 year ago

Adds the simulated_fraud_percent parameter.

benwulfe commented 1 year ago

This side looks good, but I think the fraud percentage should be a share of the total number of samples, not a percentage of the non-fraud samples. Check out your other PR in ddf_common for comments.

erickzul commented 1 year ago

@benwulfe Unless I'm missing something, the way we create fraudulent examples means their number can't exceed the number of non-fraud examples. Please expand on this so that @gretaabib and I understand it and implement it correctly.

benwulfe commented 1 year ago

I think the number of fraud samples can certainly exceed the number of non-fraud samples. As we discussed at the offsite, if 80% of timber is fraudulent, then out of 100 samples, 80 are fraud and 20 are not, so there are 4x as many fraud samples as non-fraud samples.
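
To make the difference between the two interpretations concrete, here is a minimal sketch of the arithmetic. The function names and the `fraud_share` / `fraud_ratio` parameters are hypothetical and are not taken from ddf_common. The sketch assumes the non-fraud count is fixed and that the fraud count is derived from it.

```python
def fraud_count_from_total_share(n_nonfraud: int, fraud_share: float) -> int:
    """Fraud count such that fraud makes up `fraud_share` of ALL samples.

    Solves n_fraud / (n_fraud + n_nonfraud) = fraud_share for n_fraud.
    (Hypothetical helper, for illustration only.)
    """
    if not 0.0 <= fraud_share < 1.0:
        raise ValueError("fraud_share must be in [0, 1)")
    return round(n_nonfraud * fraud_share / (1.0 - fraud_share))


def fraud_count_from_nonfraud_ratio(n_nonfraud: int, fraud_ratio: float) -> int:
    """Fraud count as a fraction of the NON-FRAUD samples only.

    (Hypothetical helper, for illustration only.)
    """
    if fraud_ratio < 0.0:
        raise ValueError("fraud_ratio must be non-negative")
    return round(n_nonfraud * fraud_ratio)


# 80% of the total: 20 non-fraud samples need 80 fraud samples (4x).
total_based = fraud_count_from_total_share(20, 0.8)

# 80% of non-fraud: 20 non-fraud samples yield only 16 fraud samples,
# which is 16 / 36, or roughly 44% of the total.
nonfraud_based = fraud_count_from_nonfraud_ratio(20, 0.8)
```

Under the total-based reading, any share above 50% requires more fraud samples than non-fraud samples, which is the case the 80% example describes.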