Define and split our task

Harry-Long commented 3 years ago

I think we will go with the loan classification problem. Our objective is to predict whether a loan is good or bad based on multiple features associated with the loan, account, and client. To address the class imbalance, we will implement oversampling (SMOTE) and undersampling and compare both against training on the original data. Here are the main tasks I can think of. Feel free to add anything you come up with.
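To make the three-way comparison concrete, here is a minimal sketch of the resampling switch. Note the assumptions: the function name `resample` and its `strategy` values are made up for illustration, and the "over" branch uses plain random duplication as a stand-in for SMOTE (SMOTE synthesizes new minority points by interpolation, which this sketch does not do). It only uses the standard library.

```python
import random


def resample(rows, labels, strategy="none", seed=0):
    """Rebalance classes by random over- or undersampling.

    strategy: "none" (original data), "over" (duplicate minority rows,
    a simple stand-in for SMOTE), or "under" (drop majority rows).
    """
    if strategy == "none":
        return list(rows), list(labels)
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'none', 'over', or 'under'")

    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if len(group) > target:
            # Undersampling: keep a random subset, without replacement.
            chosen = rng.sample(group, target)
        else:
            # Oversampling: keep every row, then add random duplicates.
            extra = [rng.choice(group) for _ in range(target - len(group))]
            chosen = group + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

Resampling would normally be applied to the training split only, so the test set keeps the original class ratio.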

Prepare the input data (one person; I can volunteer for this. Deliverable: a get_data.py file)
- merge features for discount, for clients;
- recategorize A, B, C, D as 0/1 (bad/good);
- deal with class imbalance (oversample: SMOTE, undersample, none)
Decide which classifiers to use (3 classifiers would be good, so we can split them among three of us. Deliverables: 3 run_{classifier}.py files)
- kNN, SVM, Random Forest? (subject to change, if you want)
- train/test split 0.67:0.33
- ROC curve, AUC score, ...
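The steps above can be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the helper names are hypothetical, and which letters count as "good" is not decided in this thread, so the mapping is passed in as an argument rather than hard-coded. In the real scripts, scikit-learn would likely cover the split and the AUC.

```python
import random


def recode_status(statuses, good_statuses):
    """Map letter statuses to 1 (good) or 0 (bad).

    good_statuses is supplied by the caller; the thread does not say
    which of A, B, C, D are good, so nothing is assumed here.
    """
    return [1 if status in good_statuses else 0 for status in statuses]


def train_test_split(rows, labels, test_fraction=0.33, seed=0):
    """Shuffle once, then split into (train, test) pairs of (rows, labels)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def auc_score(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, `recode_status(["A", "B"], good_statuses={"A"})` treats only A as good; the actual set would come from the dataset documentation.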

Harry-Long commented 3 years ago

[x] @Harry-Long : prepare data
[x] @verolero86 : kNN
[x] @inzamam1190 : SVM
[x] @TomAllemeier : Random Forest

inzamam1190 commented 3 years ago

Looks great to me! Thanks, Harry! I guess undersampling won't help much, as it will leave us with a very small training dataset. But we can test it out and compare for reference. I can surely go with SVM. I think in the end we need to have one .py file, as Todd mentioned, not individual .py files. So after data preprocessing, we can add to that .py file.

Harry-Long commented 3 years ago

No problem. I remember Todd saying that, instead of individual Jupyter notebooks, he prefers to see everyone committing a .py script for some specific functionality. For example, my get_data.py should be loadable as a module in your scripts if you want to get the input data. And finally, when the three models are implemented, we can combine all of our code into one .py file. That's my understanding. Does it make sense? What do you think?

inzamam1190 commented 3 years ago

Got you! I was thinking the same! We can definitely do that.

verolero86 commented 3 years ago

This sounds good to me! Thanks for adding this, @Harry-Long! I think we can have .py files depending on usage and just import them for the various needs. I think the main thing Todd wanted to avoid is all of us having our own independent notebooks, but we don't need to have a single .py file. In the end, we can have one .py that calls/imports all the other modules we create. That way we can show the modularity of our code.
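A rough sketch of what that single entry point could look like. All names here are hypothetical (`main`, `load_data`, the `run_*` functions, and the `RUNNERS` table are not from the repo). In the real project the stand-in functions would be replaced by imports from the data and classifier modules; they are defined inline only so the sketch runs by itself.

```python
# Hypothetical entry-point script: one file that calls every model module.
# In the repo, the stand-ins below would be imports from the per-task
# .py files instead of inline definitions.


def load_data():
    """Stand-in for the data module; returns (features, labels)."""
    return [[0.1], [0.9], [0.2], [0.8]], [0, 1, 0, 1]


def run_knn(features, labels):
    """Stand-in for the kNN module."""
    return "kNN ran on %d rows" % len(features)


def run_svm(features, labels):
    """Stand-in for the SVM module."""
    return "SVM ran on %d rows" % len(features)


def run_random_forest(features, labels):
    """Stand-in for the Random Forest module."""
    return "Random Forest ran on %d rows" % len(features)


# One table of runners keeps the entry point short and easy to extend.
RUNNERS = {
    "knn": run_knn,
    "svm": run_svm,
    "random_forest": run_random_forest,
}


def main(selected=None):
    """Load the data once, then run the selected (or all) classifiers."""
    features, labels = load_data()
    names = selected or list(RUNNERS)
    return {name: RUNNERS[name](features, labels) for name in names}


if __name__ == "__main__":
    for name, summary in main().items():
        print(name, "->", summary)
```

Loading the data once and passing it to each runner keeps every classifier on the same split, which makes the comparison fair.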

Harry-Long commented 3 years ago

No problem, @verolero86! I already uploaded prepare_data.py and added some instructions to README.md. They are already merged into the main branch. Let me know if you have any questions. And we probably want to meet sometime today.

inzamam1190 commented 3 years ago

Thanks, @Harry-Long! Time to train some models and see the results!

Harry-Long commented 3 years ago

No problem, @inzamam1190! Let me know if you encounter problems when reading the data.

Harry-Long commented 3 years ago

Hi y'all, @verolero86 @inzamam1190 @TomAllemeier, I uploaded a template doc file for our final report, named 'dse511_final_report.docx'. Let me know if you have any questions.

inzamam1190 commented 3 years ago

Hey, @Harry-Long, saw the report doc. Looks great. I think we should also add our group name in there: 'Lancaster Barnstormers.' Also, in your prepare_data.py file, please look at the print statements. The variables in them are wrong. Just a simple fix.

Harry-Long commented 3 years ago

Thanks for the reminder, @inzamam1190. I will check and fix it.

inzamam1190 commented 3 years ago

No problem, @Harry-Long!

Harry-Long commented 3 years ago

Done! @inzamam1190 Just let me know if there are still any bugs.

inzamam1190 commented 3 years ago

@Harry-Long, great! Sure, I'll let you know.

verolero86 commented 3 years ago

Great - thanks @Harry-Long! Would you rather use a Google Doc for the report instead of Word (docx)? That way we can co-edit real-time.

Harry-Long commented 3 years ago

That sounds good! I can do that. @verolero86

TomAllemeier commented 3 years ago

Hey everyone, just wanted to know if you all still wanted to meet tonight?

inzamam1190 commented 3 years ago

I just added svm.py to the repo. Also, I updated the README with instructions. Please check it out and let me know if there are any bugs.

inzamam1190 commented 3 years ago

@TomAllemeier sorry, just saw this. Maybe we can figure out a common time later.

verolero86 commented 3 years ago

All done with experiments, analysis, and results. Yay!

inzamam1190 / Lancaster_Barnstormers_DSE511

Define and split our task #7