Closed by Harry-Long 3 years ago
[x] @Harry-Long : prepare data
[x] @verolero86 : kNN
[x] @inzamam1190 : SVM
[x] @TomAllemeier : Random Forest
Looks great to me! Thanks, Harry! Undersampling probably won't help much, I guess, since it would leave us with a very small training dataset. But we can test it out and compare for reference. I can surely go with SVM. I think we ultimately need to have one .py file, as Todd mentioned, not individual .py files. So after data preprocessing, we can add our code to that .py file.
No problem. I remember Todd saying that, instead of individual Jupyter notebooks, he prefers to see everyone commit a .py script for some specific functionality. For example, my get_data.py should be loadable as a module in your scripts if you guys want to get the input data. And finally, once the three models are implemented, we can combine all of our code into one .py file. That's my understanding. Does that make sense? What do you think?
Got you! I was thinking the same! We can definitely do that.
This sounds good to me! Thanks for adding this @Harry-Long !
I think we can have .py files depending on usage and just import them for the various needs. I think the main thing Todd wanted to avoid is that we will all have our own independent notebooks but we don't need to have a single .py file. In the end, we can have one .py that calls/imports all other modules we create. That way we can show modularity of our code.
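A minimal sketch of that layout, for illustration only: one driver script that pulls in the data-preparation code and each model and runs them in turn. The function names (`load_data`, `run_majority_baseline`), the `MODELS` registry, and the toy data are all hypothetical; in the repo they would be imports from the real modules, and stand-ins are defined inline here only so the sketch runs on its own.

```python
# Hypothetical driver script. In the real repo the two stand-in functions
# below would instead be imported from the project's own modules, e.g.
#   from prepare_data import load_data   # function name is an assumption
#   from svm import run_svm              # function name is an assumption


def load_data():
    """Stand-in for the data-preparation module: returns (features, labels)."""
    X = [[0.1, 1.0], [0.4, 0.8], [0.9, 0.2], [0.7, 0.1]]
    y = [0, 0, 1, 1]
    return X, y


def run_majority_baseline(X, y):
    """Stand-in for a model module: predict the most common label, return accuracy."""
    majority = max(set(y), key=y.count)
    predictions = [majority for _ in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Registry of model entry points; each teammate's module would add one entry.
MODELS = {"baseline": run_majority_baseline}


def main():
    """Load the data once, run every registered model, and report the scores."""
    X, y = load_data()
    results = {name: run(X, y) for name, run in MODELS.items()}
    for name, score in results.items():
        print(f"{name}: accuracy={score:.2f}")
    return results


if __name__ == "__main__":
    main()
```

With this shape, adding kNN, SVM, or Random Forest would mean one more import and one more entry in the registry, while the data loading stays in a single place.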
No problem @verolero86 ! I already uploaded prepare_data.py and added some instructions to README.md. They are already merged into the main branch. Let me know if you have any questions. And we probably want to meet sometime today.
Thanks, @Harry-Long! Time to train some models and see the results!
No problem, @inzamam1190! Let me know if you encounter problems when reading the data.
Hi y'all, @verolero86 @inzamam1190 @TomAllemeier, I uploaded a template doc file for our final report, named 'dse511_final_report.docx'. Let me know if you have any questions.
Hey, @Harry-Long, saw the report doc. Looks great. I think we should also add our group name in there - 'Lancaster Barnstormers.' Also, in your prepare_data.py file, please look at the print statements. The variables in them are wrong. Just a simple fix.
Thanks for the reminder, @inzamam1190. I will check and fix it.
No problem, @Harry-Long!
Done! @inzamam1190 Just let me know if there are still bugs.
@Harry-Long, great! Sure, I'll let you know.
Great - thanks @Harry-Long! Would you rather use a Google Doc for the report instead of Word (docx)? That way we can co-edit in real time.
That sounds good! I can do that. @verolero86
Hey everyone, just wanted to know if you all still wanted to meet tonight?
I just added svm.py in the repo. Also, updated the README with instructions. Please check it out and let me know if there is any bug.
@TomAllemeier sorry, just saw this. Maybe we can figure out a common time later.
All done with experiments, analysis, and results. Yay!
I think we will go with the loan classification problem. Our objective is to predict whether a loan is good or bad based on multiple features associated with the loan, account, and client. In view of the class imbalance problem, oversampling (SMOTE) and undersampling will be implemented and compared against using the original data. Here are some main tasks that I can think of. Feel free to add anything you come up with.
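To make the resampling comparison concrete, here is a rough sketch using only the standard library. It shows plain random undersampling and random oversampling, not SMOTE: SMOTE creates new synthetic minority samples by interpolating between neighbours rather than duplicating rows, and would normally come from a library such as imbalanced-learn. The function names, the seed, and the toy loan data are made up for illustration and are not the project's actual code or data.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until all match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all match the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (made up): 8 "good" loans (0) and 2 "bad" loans (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print("original:    ", Counter(y))
print("undersampled:", Counter(y_under))
print("oversampled: ", Counter(y_over))
```

Undersampling leaves only 4 of the 10 toy rows, which illustrates the concern about ending up with a very small training set. Whichever resampling is used, it should be applied to the training split only, so that the test set keeps the original class balance.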