jvmncs / default-risk


Data exploration epic #10

Open jvmncs opened 6 years ago

jvmncs commented 6 years ago

The goal here is to gather the insights we need to make informed choices about data processing and downstream modeling tasks. It's not super exciting work, but everything else depends on it being completed. It's also highly parallelizable, so we should be able to finish fairly quickly if enough people volunteer.

There's an issue for performing EDA on each table. All of the issues below except #3 follow the same basic workflow; #3 instead consists of gathering research from existing kernels on Kaggle. If you're going to pick up an issue, please assign it to yourself so we don't end up repeating work!

#3 application_{train/test}.csv

#4 bureau.csv

#5 bureau_balance.csv

#6 previous_application.csv

#7 POS_cash_balance.csv

#8 credit_card_balance.csv

#9 installments_payments.csv
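
For anyone picking up one of the per-table issues, here is a rough sketch of what a first pass could look like. The "basic workflow" isn't spelled out in this issue, so the choice of statistics (row count, missing fraction, unique count, whether a column looks numeric) is an assumption, and `summarize_table`, `_is_number`, and the sample column names are all made up for illustration. It only uses the standard library so it runs anywhere.

```python
import csv
import io
from collections import Counter


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_table(rows):
    """Compute simple per-column stats from an iterable of dict rows.

    `rows` is expected to behave like csv.DictReader. Returns the row
    count and a dict mapping column name to its stats.
    """
    n_rows = 0
    missing = Counter()
    values = {}  # column name -> set of distinct non-empty values
    for row in rows:
        n_rows += 1
        for col, val in row.items():
            if col is None:
                # Extra fields beyond the header; ignored in this sketch.
                continue
            values.setdefault(col, set())
            if val is None or val.strip() == "":
                missing[col] += 1
            else:
                values[col].add(val)

    summary = {}
    for col, seen in values.items():
        summary[col] = {
            "missing_frac": missing[col] / n_rows if n_rows else 0.0,
            "n_unique": len(seen),
            "numeric": bool(seen) and all(_is_number(v) for v in seen),
        }
    return n_rows, summary


# Tiny in-memory stand-in for one of the CSV files.
# The column names here are hypothetical, not taken from the real tables.
SAMPLE = "id,amount,status\n1,1000.5,Active\n2,,Closed\n3,250,Active\n"
n_rows, summary = summarize_table(csv.DictReader(io.StringIO(SAMPLE)))

for col, stats in summary.items():
    print(col, stats)
```

To point it at a real table, something like `summarize_table(csv.DictReader(open(path, newline="")))` should work. Note that it keeps every distinct value in memory, so for the larger tables a sampled subset or a dataframe library is probably the better tool.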