surakshashukla / Phishing-email-detection

This is a phishing email detection analysis built with Python and Apache Spark, which detects phishing emails using three different classification methods. Python was used for loading the email data and parsing the text with the NLTK package. Apache Spark was used to run the analysis in a big data environment.

Email DataSource #1

Open Ameya-Pandya opened 6 years ago

Ameya-Pandya commented 6 years ago

Hi,

Could you please tell me where you got the email data for the analysis, and whether the emails need to be in a particular format before the data is parsed?

Thanks

surakshashukla commented 6 years ago

Hi,

I used data from the Enron email corpus. You can find it at http://www.enron-mail.com/email/. The format I considered was also based on the data there. Hope it helps.

Thanks, Suraksha

runchranda commented 5 years ago

Hi, when I tried to run this program I got the error below:

Traceback (most recent call last):
  File "email_classification.py", line 26, in <module>
    spam = init_lists('linguistspam/spam/')
  File "email_classification.py", line 19, in init_lists
    file_list = os.listdir(folder)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'linguistspam/spam/'

Could you help me please?

surakshashukla commented 5 years ago

Hi, at that line the code expects the files with the email data to be at the given path. Since you don't have the data there, you are getting a "FileNotFoundError". I used the Enron data; you can use the same. I hope this helps.
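
For anyone hitting the same error, here is a minimal sketch of a check that could run before the directory is listed, so a missing data folder is reported with its full path. The helper name `list_email_files` is hypothetical and is not part of the repository; only `os.listdir(folder)` and the path 'linguistspam/spam/' come from the traceback above.

```python
import os


def list_email_files(folder):
    """Return the sorted file names in `folder` (hypothetical helper).

    Raises FileNotFoundError with the absolute path if the folder is
    missing, which makes it easier to see where the data should go.
    """
    if not os.path.isdir(folder):
        raise FileNotFoundError(
            "Email data folder not found: "
            + os.path.abspath(folder)
            + ". Download the dataset and place it there, "
            + "or pass a different path."
        )
    # Same call that fails in the traceback, now guarded by the check above.
    return sorted(os.listdir(folder))
```

Calling `list_email_files('linguistspam/spam/')` from the directory that holds email_classification.py should then either return the file names or show exactly which absolute path is missing.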

runchranda commented 5 years ago

Thank you for your response. I visited the link you provided for the Enron dataset, but there are a lot of directories and I'm not sure which one I should download or use. Also, if I may ask, did you convert all the data into one CSV file?