Great work so far! The explanations you have in your feature_selection.ipynb and Modeling.ipynb notebooks make them very easy to follow and show your ML theory knowledge.
Here is my feedback:
General Suggestions
Usually you don't add your data to git. Instead, you can add a link (e.g. Google Drive) for downloading the data or add a little script for downloading the data from its source. But in this case, I think we can skip this as the data is relatively small.
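If you do go the download-script route later, it can be a small helper like this (the URL below is a placeholder, since I don't know where you'd host the data — substitute the real link):

```python
# Hypothetical download helper; the URL is a placeholder, not the real source.
import urllib.request
from pathlib import Path

DATA_URL = "https://example.com/liar_dataset.zip"  # placeholder URL

def download_data(url: str = DATA_URL, dest: str = "data/raw/liar_dataset.zip") -> Path:
    """Download the dataset archive once; skip if it is already present."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        urllib.request.urlretrieve(url, path)
    return path
```

That way a fresh clone only needs one function call to set up the data folder.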
Put your .csv files in a data/ folder and your notebooks in a notebooks/ folder. Later on, when you refactor your notebooks into Python scripts you can add the scripts in a scripts/ or src/ folder.
Add more explanation in your Data_Preparation.ipynb notebook. Use the markdown cells to add a heading for each section and explain what is happening in each section, what your thinking was for doing certain things, and what your interpretation of the results is. For example, you can explain why you have created different distribution plots, what they imply, and how they impact your decisions down the road.
It'd also be good to explain why you're using a different set of features for the Naive Bayes pipeline compared to the other classifiers you explore in the Modeling.ipynb notebook.
Add some explanations for SVM and Random Forest similar to what you have for Naive Bayes and Logistic Regression in the Modeling.ipynb notebook.
Questions
It seems like the .csv files in the main directory are clean/processed versions of the .tsv files in liar_dataset/. Correct? If so, you can put the original .tsv files in data/raw/ and the processed files in data/processed/.
Specific Comments
You should print() this line in order to see the result: https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/Data_Preparation.py#L131
Read the .csv files directly as opposed to importing Data_Preparation just to access the dataframes: https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/feature_selection.py#L7
Import the CountVectorizer and TfidfTransformer classes directly from sklearn and recreate these instances as opposed to importing them from the feature_selection.ipynb notebook: https://github.com/kiranrawat/Detecting-Fake-News-On-Social-Media/blob/b0e9aee3cbdc2845a2f0626c060caf16ffcba118/Modeling.py#L95