Closed by ryanbender2 7 months ago
One thing that could improve this PR is to allow arbitrary dollar amounts in the "scam" lines. How did you come up with the weights for each number? Are they arbitrary or from some kind of statistical analysis?
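For what it's worth, here is a rough sketch of what matching arbitrary dollar amounts could look like. It is not taken from the PR: the pattern `DOLLAR_AMOUNT` and the helper `find_dollar_amounts` are made-up names, and the assumption is that amounts are written like `$399.99`, `$45`, or `$1,299`.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not from the PR): a dollar sign,
# then either comma-grouped digits or plain digits, then optional cents.
DOLLAR_AMOUNT = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def find_dollar_amounts(line: str) -> List[float]:
    """Return every dollar amount found in a line as a float."""
    amounts = []
    for match in DOLLAR_AMOUNT.finditer(line):
        whole = match.group(1).replace(",", "")  # drop thousands separators
        cents = match.group(2) or "00"           # default to zero cents
        amounts.append(float(f"{whole}.{cents}"))
    return amounts
```

With something like this, a "scam" line could be matched on the presence of any amount instead of a hardcoded number, and the amount itself could feed into the weight later if the data supports that.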
Good point. I just copied the weights from Kit. haha I'm at work lol can't put that much time into this.
Yea. I think for a serious solution we need more data on these invoices (not only what the invoices look like but also how often each invoice type is sent), and then some kind of analysis on that data to get insight into how they're constructed. I'm kind of lazy and on break, but I might just try to create a simple PoC with TensorFlow.
Hey Kit!
Not too much was really added here, mostly just structure. My idea with this is that there is a threshold for susness: each invoice gets a score, the score is compared against the threshold, and if it passes, the invoice is flagged. You can check out the output of this run in output/results.json.
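To make the threshold idea concrete, here is a minimal sketch. It is not the code in this PR: the phrases, the weights in `WEIGHTS`, the `THRESHOLD` value, and the function names are all placeholders, and treating "passes" as "greater than or equal to" is an assumption.

```python
# Hypothetical phrase weights and threshold (placeholders, not the PR's values).
WEIGHTS = {"urgent": 2.0, "refund": 1.5, "call now": 3.0}
THRESHOLD = 4.0


def sus_score(text: str, weights: dict = WEIGHTS) -> float:
    """Sum the weights of every known phrase that appears in the text."""
    lowered = text.lower()
    return sum(weight for phrase, weight in weights.items() if phrase in lowered)


def is_flagged(text: str, threshold: float = THRESHOLD) -> bool:
    """Flag the invoice when its score reaches the threshold (assumed >=)."""
    return sus_score(text) >= threshold
```

For example, `is_flagged("URGENT: call now for your refund")` comes out flagged under these placeholder weights, while a plain "Thanks for your order" does not.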