nicholasjhorton / textclassificationexamples

Text classification examples (Clickbait and Spam filtering)

consider whether the Nolan and Temple Lang spam filter data can be added #6

Open nicholasjhorton opened 4 years ago

nicholasjhorton commented 4 years ago

@phebepalmer might you be willing to take a look at this link and see whether a subset of the Nolan and Temple Lang data might be worth adding to the package?

https://spamassassin.apache.org/old/publiccorpus/

nicholasjhorton commented 4 years ago

See the new txt files in https://github.com/nicholasjhorton/textclassificationexamples/tree/master/data-raw

Note that there are some garbage characters in the spam file:

Life Insurance Quotes Without the Hassle...            JHIWNS
Get out of debt quick!                                        4179uKlj5-057SXua1524UHkC5-900-28
RE
Market Internet Access - No Investment Needed 
Re
Cheap Fags
zzzz, Is Your web Site Making Money! 2
Market Internet Access - No Investment Needed 
re
8 Free Movie Tickets for doing a 2 Minute survey! Any Movie, Any Theater!
Fw
Save now                     
Fw
FORTUNE 500 COMPANY HIRING, AT HOME REPS.
$250,000 for only $6.50 per month.                    qkdjtqsscr

Might you be willing to edit these out? I think they're some form of artifact.
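
One possible way to do that cleanup, sketched in Python as a rough illustration rather than the package's actual method. It assumes (and this is only a guess from the sample above) that an artifact is a single space-free token following a run of three or more spaces at the end of a subject line. The names `TRAILING_ARTIFACT` and `strip_trailing_artifact` are hypothetical. A trailing token after a single space, such as the `2` in the `zzzz` line, is left alone because it is not clear whether it is an artifact.

```python
import re

# Assumption: an artifact is one whitespace-free token that follows a run of
# three or more whitespace characters at the very end of a subject line.
# The same pattern also matches bare trailing padding with no token after it.
TRAILING_ARTIFACT = re.compile(r"\s{3,}\S*$")


def strip_trailing_artifact(subject: str) -> str:
    """Drop a padded trailing token (or bare padding) from a subject line."""
    return TRAILING_ARTIFACT.sub("", subject).rstrip()


# A few lines copied from the sample above, padding included.
samples = [
    "Life Insurance Quotes Without the Hassle..." + " " * 12 + "JHIWNS",
    "Get out of debt quick!" + " " * 40 + "4179uKlj5-057SXua1524UHkC5-900-28",
    "zzzz, Is Your web Site Making Money! 2",
    "Save now" + " " * 21,
    "$250,000 for only $6.50 per month." + " " * 20 + "qkdjtqsscr",
]

cleaned = [strip_trailing_artifact(s) for s in samples]

for line in cleaned:
    print(repr(line))
```

The three-space threshold is a judgment call: it avoids touching ordinary single or double spaces inside a subject, but it would miss any artifact separated by fewer spaces, so the result is worth checking by eye.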