Open nicholasjhorton opened 4 years ago
See the new txt files in https://github.com/nicholasjhorton/textclassificationexamples/tree/master/data-raw
Note that there are some garbage strings in the spam file:
Life Insurance Quotes Without the Hassle... JHIWNS
Get out of debt quick! 4179uKlj5-057SXua1524UHkC5-900-28
RE
Market Internet Access - No Investment Needed
Re
Cheap Fags
zzzz, Is Your web Site Making Money! 2
Market Internet Access - No Investment Needed
re
8 Free Movie Tickets for doing a 2 Minute survey! Any Movie, Any Theater!
Fw
Save now
Fw
FORTUNE 500 COMPANY HIRING, AT HOME REPS.
$250,000 for only $6.50 per month. qkdjtqsscr
Might you be willing to edit these out? I think they're some form of artifact.
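One possible way to do the cleanup, as a rough sketch rather than a tested fix: the artifacts in the sample above all appear as a single trailing token after sentence-ending punctuation, so a regular expression can drop them. The function name `strip_trailing_artifact` and the rule itself are assumptions made for illustration; they are not part of the package, and the sketch is in Python although the same pattern could be ported to R.

```python
import re

# Assumption (hypothetical rule): an artifact is one trailing run of letters,
# digits, or hyphens that directly follows ".", "!" or "?" plus whitespace.
TRAILING_ARTIFACT = re.compile(r"(?<=[.!?])\s+[A-Za-z0-9-]+$")


def strip_trailing_artifact(subject: str) -> str:
    """Return the subject with a suspected trailing artifact token removed."""
    return TRAILING_ARTIFACT.sub("", subject.strip())


# Subject lines copied from the sample quoted in this issue.
subjects = [
    "Life Insurance Quotes Without the Hassle... JHIWNS",
    "Get out of debt quick! 4179uKlj5-057SXua1524UHkC5-900-28",
    "zzzz, Is Your web Site Making Money! 2",
    "$250,000 for only $6.50 per month. qkdjtqsscr",
    "Market Internet Access - No Investment Needed",
]

for subject in subjects:
    print(strip_trailing_artifact(subject))
```

This rule is deliberately crude: it would also remove a legitimate one-word ending such as "Thanks. Bye", so any lines it changes are worth reviewing by hand before the data files are regenerated.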
@phebepalmer, might you be willing to take a look at the link below and see whether a subset of the Nolan and Temple Lang data would be worth adding to the package?
https://spamassassin.apache.org/old/publiccorpus/