udacity / ud120-projects

Starter project code for students taking Udacity ud120

No module named email_preprocess error when using Atom #185

Open meehaw1337 opened 6 years ago

meehaw1337 commented 6 years ago

I am having trouble running my Python code in Atom, although the same code works when launched from the command prompt. For those unfamiliar with Udacity's Introduction to Machine Learning, the email_preprocess module is located in the "...\naive_bayes\tools" directory.

Code:

import sys
from time import time
sys.path.append("../tools/")
from email_preprocess import preprocess

Whenever I run the nb_author_id.py file from the command prompt with the following command:

python2 nb_author_id.py

in the D:\Misiek\Pulpit\python\ud120-projects-master\naive_bayes directory, it works fine. But if I run the nb_author_id.py file through Atom (using atom-runner), I get this error message:

Traceback (most recent call last):
  File "D:\Misiek\Pulpit\python\ud120-projects-master\naive_bayes\nb_author_id.py", line 17, in <module>
    from email_preprocess import preprocess
ImportError: No module named email_preprocess

Any ideas why it works through the command prompt, but not through Atom?
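
One likely explanation, offered as an assumption rather than a confirmed diagnosis: a relative entry such as "../tools/" on sys.path is resolved against the current working directory, not against the folder that contains the script. The command prompt session above starts in the naive_bayes directory, while atom-runner may start Python from a different directory. A minimal sketch that anchors the path to the script file instead (the helper names tools_dir_for and add_tools_to_path are hypothetical, not part of the course code):

```python
import os
import sys


def tools_dir_for(script_path):
    """Return the absolute path of the sibling ../tools directory for a script."""
    # Resolve the folder that holds the script, independent of the working directory.
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.normpath(os.path.join(script_dir, "..", "tools"))


def add_tools_to_path(script_path):
    """Append the tools directory to sys.path (only once) and return it."""
    tools_dir = tools_dir_for(script_path)
    if tools_dir not in sys.path:
        sys.path.append(tools_dir)
    return tools_dir
```

In nb_author_id.py this would replace the sys.path.append("../tools/") line with add_tools_to_path(__file__), placed before the email_preprocess import.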

Mathanraj-Sharma commented 5 years ago

This code is written for Python 2, so make sure you are running it with the same version.
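
To check which interpreter and working directory a given launcher (command prompt or Atom) actually uses, a small diagnostic along these lines may help. It is only a sketch, and the function name describe_environment is hypothetical:

```python
import os
import sys


def describe_environment():
    """Collect the facts that often differ between two ways of launching a script."""
    return {
        "executable": sys.executable,            # path of the running Python binary
        "version": tuple(sys.version_info[:3]),  # (major, minor, micro)
        "cwd": os.getcwd(),                      # relative paths resolve against this
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %s" % (key, value))
```

Running the same file once from the command prompt and once from Atom, then comparing the two outputs, shows whether the version or the working directory differs.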

Mathanraj-Sharma commented 5 years ago

Modified email_preprocess.py for Python 3:

#!/usr/bin/python

# cPickle does not exist in Python 3; the standard pickle module replaces it
import pickle

import numpy

from sklearn.model_selection import cross_val_score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import train_test_split


def preprocess(words_file = "../tools/word_data.pkl", authors_file="../tools/email_authors.pkl"):
    """
        this function takes a pre-made list of email texts (by default word_data.pkl)
        and the corresponding authors (by default email_authors.pkl) and performs
        a number of preprocessing steps:
            -- splits into training/testing sets (10% testing)
            -- vectorizes into tfidf matrix
            -- selects/keeps most helpful features

        after this, the features and labels are put into numpy arrays, which play nice with sklearn functions

        4 objects are returned:
            -- training/testing features
            -- training/testing labels

    """

    ### the words (features) and authors (labels), already largely preprocessed
    ### this preprocessing will be repeated in the text learning mini-project
    authors_file_handler = open(authors_file, "rb")
    authors = pickle.load(authors_file_handler)
    authors_file_handler.close()

    words_file_handler = open(words_file, "rb")
    word_data = pickle.load(words_file_handler)
    words_file_handler.close()

    ### test_size is the percentage of events assigned to the test set
    ### (remainder go into training)
    features_train, features_test, labels_train, labels_test = train_test_split(word_data, authors, test_size=0.1, random_state=42)

    ### text vectorization--go from strings to lists of numbers
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                                 stop_words='english')
    features_train_transformed = vectorizer.fit_transform(features_train)
    features_test_transformed  = vectorizer.transform(features_test)

    ### feature selection, because text is super high dimensional and
    ### can be really computationally chewy as a result
    selector = SelectPercentile(f_classif, percentile=10)
    selector.fit(features_train_transformed, labels_train)
    features_train_transformed = selector.transform(features_train_transformed).toarray()
    features_test_transformed  = selector.transform(features_test_transformed).toarray()

    ### info on the data
    print("no. of Chris training emails:", sum(labels_train))
    print("no. of Sara training emails:", len(labels_train)-sum(labels_train))

    return features_train_transformed, features_test_transformed, labels_train, labels_test

Sarita19 commented 5 years ago

Hello @Mathanraj-Sharma, I tried using your code for email_preprocess.py. However, I am still getting errors.

tonnystark commented 5 years ago

Hello @Sarita19, it worked for me when I changed the default paths like this:

def preprocess(words_file = "./tools/word_data.pkl", authors_file="./tools/email_authors.pkl"):
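
Whether "./tools/..." or "../tools/..." works depends on the directory Python is started from, so a path change that fixes one setup can break another. A hedged sketch that derives both pickle paths from the location of email_preprocess.py itself, assuming the .pkl files sit in the same tools directory as that module (the helper name default_data_paths is hypothetical):

```python
import os


def default_data_paths(module_file):
    """Return (words_file, authors_file) located next to the given module file."""
    # Directory that contains the module, regardless of the working directory.
    tools_dir = os.path.dirname(os.path.abspath(module_file))
    words_file = os.path.join(tools_dir, "word_data.pkl")
    authors_file = os.path.join(tools_dir, "email_authors.pkl")
    return words_file, authors_file
```

Inside email_preprocess.py, default_data_paths(__file__) could supply the two defaults, which would make preprocess() independent of the working directory.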

zuhaldanyildiz commented 3 years ago

Hi,

The issue I keep having is this:

I tried the original email_preprocess code first: when I run that file on its own I get no errors (there were a few, and I fixed them), and debugging it showed no issues either. I also tried replacing it with the Python 3 version suggested earlier, just to be on the safe side, and that also worked with no problems whatsoever.

The real issue occurs when you try to use it from the nb_author_id file. Someone suggested keeping both email_preprocess and nb_author_id in the same folder, and I did. IT STILL DOESN'T WORK!

Honestly, I know that the source code is written in Python 2; however, I don't think it's smart to install Python 2 at all. It conflicts with other code projects and other libraries.

I have been trying to solve this issue for the past 3 days and I am getting really tired of it. Has anyone actually made it work?

Thanks!
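
When the import keeps failing, it can help to ask Python directly whether it can locate the module before running the rest of the script. A small Python 3 sketch using importlib.util.find_spec (the helper name can_import is hypothetical, and it only handles top-level module names):

```python
import importlib.util
import sys


def can_import(module_name, extra_dir=None):
    """Report whether a top-level module can be found on sys.path.

    If extra_dir is given, it is appended to sys.path first.
    """
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.append(extra_dir)
    # Make sure directories created or added recently are noticed.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

Calling can_import("email_preprocess") at the top of nb_author_id.py, and printing sys.path when it returns False, shows which directories Python actually searched.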

trsvchn commented 3 years ago

Hi @zuhaldanyildiz!

Looks like this repository is not maintained anymore.

Feel free to check out my fork of ud120. I refactored and ported most of the code from this repo to Python 3 and Jupyter notebooks:

trsvchn/ud120-projects-py3-jupyter

zuhaldanyildiz commented 3 years ago

Hi @trsvchn,

Thanks for the help, I'll go ahead and check it out! I'm also glad it's in Jupyter Notebook. I can't even entirely interpret the data I'm dealing with in PyCharm.

Thanks again!