kiranrawat / Detecting-Fake-News-On-Social-Media

Flask web application that aims to predict fake news over social media using NLP and Machine Learning.

Round 1 comments, suggestions and questions #1

Closed rfazeli closed 3 years ago

rfazeli commented 3 years ago

Great work so far! The explanations in your feature_selection.ipynb and Modeling.ipynb notebooks make them very easy to follow and show your knowledge of ML theory.

Here is my feedback:

General Suggestions

  1. Usually you don't add your data to git. Instead, you can add a link (e.g. Google Drive) for downloading the data, or a small script that downloads it from its source. In this case we can skip that, since the data is relatively small.
  2. Put your .csv files in a data/ folder and your notebooks in a notebooks/ folder. Later on, when you refactor your notebooks into Python scripts, you can put the scripts in a scripts/ or src/ folder.
  3. Add more explanation to your Data_Preparation.ipynb notebook. Use markdown cells to add a heading for each section and explain what is happening in that section, what your thinking was for doing certain things, and how you interpret the results. For example, you could explain why you created the different distribution plots, what they imply, and how they impact your decisions down the road.
  4. It'd also be good to explain why you're using a different set of features for the Naive Bayes pipeline compared to the other classifiers you explore in the Modeling.ipynb notebook.
  5. Add explanations for SVM and Random Forest similar to what you have for Naive Bayes and Logistic Regression in the Modeling.ipynb notebook.
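Suggestion 2 could be applied with a small one-off script along these lines (the folder names match the suggestion; the helper itself and any file names are hypothetical):

```python
from pathlib import Path

def organize(repo_root: str) -> None:
    """Sketch of the suggested layout: .csv files under data/,
    notebooks under notebooks/, refactored scripts under src/."""
    root = Path(repo_root)
    targets = {"*.csv": "data", "*.ipynb": "notebooks", "*.py": "src"}
    for pattern, folder in targets.items():
        dest = root / folder
        dest.mkdir(exist_ok=True)
        for f in root.glob(pattern):  # only top-level files
            f.rename(dest / f.name)
```

In practice you'd use git mv (or re-add the files after moving) so history is preserved.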

Specific Comments

  1. I think you need to print() this line in order to see the result. https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/Data_Preparation.py#L131
  2. It's better to read in the .csv files directly as opposed to importing Data_Preparation just to access the dataframes https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/feature_selection.py#L7
  3. I think you mean "fake or not" instead of "spam or not" https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/feature_selection.py#L50
  4. You mention stemming but you don't apply it anywhere https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/feature_selection.py#L107
  5. Your explanations in this section and the extreme scenarios you consider are perfect. Just make sure you format it a bit nicer so that it's easy to read. https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/Modeling.py#L64
  6. Again, it's probably less confusing to import the CountVectorizer and TfidfTransformer classes directly from sklearn and recreate these instances, as opposed to importing them from the feature_selection.ipynb notebook https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/Modeling.py#L95

Questions

  1. It seems like the .csv files in the main directory are clean/processed versions of the .tsv files in liar_dataset/. Is that correct? If so, you can put the original .tsv files in data/raw/ and the processed files in data/processed/.
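That split could look like the following (run inside a scratch demo/ directory here so the sketch is self-contained; the file names are made up, and in the real repo you'd use git mv so history is preserved):

```shell
# Suggested raw/processed layout, demonstrated on dummy files.
mkdir -p demo/data/raw demo/data/processed demo/liar_dataset
touch demo/liar_dataset/train.tsv demo/train.csv

# Originals go to data/raw/, cleaned versions to data/processed/.
mv demo/liar_dataset/*.tsv demo/data/raw/
mv demo/*.csv demo/data/processed/
```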
kiranrawat commented 3 years ago

Hi Reza, I have resolved the issue.