project-group7-data301-2021s created by GitHub Classroom
MIT License

Group 07 - Credit defaults over the pond

Milestones

Details for each milestone are available on Canvas (left sidebar, Course Project) or here.

Research Topic & Our Team's Interest in the Data

The most important thing to learn from this dataset is how to take a large database of raw numbers and analyze and manipulate it so that it can be summarized visually and useful quality metrics can be extracted. The specific metrics matter less, since they will change from case to case. With this dataset, for instance, the goal is to produce an accurate estimate of each client's probability of default. Such estimates are valuable for internal risk management at any lender, especially if these payment habits turn out to be consistent across different economies. Our team has a strong interest in economics and is curious about how it works across different geographies and cultures. Building a dashboard on top of this database would therefore be a worthwhile endeavor for lending companies. Connected to current data, such a dashboard could update dynamically, show how the metrics fluctuate with the economy, and let lenders act in real time as trends shift.
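As a small illustration of turning raw records into a summary metric, the sketch below estimates the probability of default for each group as the share of clients in that group who defaulted. It is a minimal sketch under stated assumptions: the field names (`education`, `default`) and the toy records are hypothetical placeholders, not values or column names taken from the real dataset.

```python
from collections import defaultdict


def default_rate_by_group(records, group_key, target_key="default"):
    """Return {group: observed default rate}.

    Assumes each record is a dict and that target_key holds 1 (defaulted) or 0.
    The observed rate is a simple empirical estimate of the probability of default.
    """
    counts = defaultdict(int)    # number of clients seen in each group
    defaults = defaultdict(int)  # number of defaulting clients in each group
    for row in records:
        group = row[group_key]
        counts[group] += 1
        defaults[group] += row[target_key]
    return {group: defaults[group] / counts[group] for group in counts}


# Hypothetical toy records, NOT taken from the real dataset.
toy_records = [
    {"education": "graduate school", "default": 0},
    {"education": "graduate school", "default": 1},
    {"education": "university", "default": 0},
    {"education": "university", "default": 0},
    {"education": "high school", "default": 1},
]

print(default_rate_by_group(toy_records, "education"))
# → {'graduate school': 0.5, 'university': 0.0, 'high school': 1.0}
```

The same per-group rates are the kind of value a dashboard could recompute and plot whenever new records arrive.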

Description of Dataset

This dataset, obtained from the UCI Machine Learning Repository (Dua and Graff, 2019), was created by researchers from the Department of Information Management at Chung Hua University and the Department of Civil Engineering at Tamkang University in Taiwan. It records credit card default payments by customers in Taiwan. It contains demographic variables, such as education, marital status and age, as well as quantitative variables such as the history of past payments, bill statement amounts and previous payment amounts. The data covers April to September 2005, so it is fairly old; however, this lets us build predictions from it and compare them with more recent analyses. The data and the resulting predictions can help lending companies better estimate the probability of default among their borrowers. Aside from companies performing risk analysis on their borrowers, this dataset has little public interest or need for transparency. Our analysis will fit a simple linear regression with the probability of default as the response variable and a single independent variable as the predictor.
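A simple linear regression of this kind can be sketched with ordinary least squares in plain Python. This is a minimal sketch, not the team's actual analysis: the predictor `months_late` and the toy values are hypothetical, since the independent variable is not specified here. Regressing a 0/1 default indicator on one predictor gives a linear probability model, so the fitted values are read as estimated probabilities of default.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y ≈ intercept + slope * x with one predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, NOT from the real dataset:
# x = number of months (out of six) with a late payment,
# y = 1 if the client defaulted, else 0.
months_late = [0, 0, 1, 2, 3, 4, 5, 6]
defaulted = [0, 0, 0, 0, 1, 0, 1, 1]

intercept, slope = fit_simple_linear_regression(months_late, defaulted)
print(f"estimated P(default) = {intercept:.3f} + {slope:.3f} * months_late")
# prints "estimated P(default) = -0.073 + 0.171 * months_late"
```

The negative intercept in this toy fit shows a known limitation of a straight-line model for a 0/1 response: fitted "probabilities" can fall outside [0, 1], which is worth keeping in mind when interpreting the results.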

Team Members

References

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.