Open rdamarapati opened 11 years ago
Technologies : These models should be integrated into Hadoop. We will use either Python or R for the analytics on top of Hadoop. As far as technologies are concerned, you may need familiarity with Python, R, Hive, Pig and Hadoop. You are free to suggest and use other technologies to accomplish the objective.
Objective : The objective of the credit analytics framework is to provide insight and transparency into the measurement and management of the credit risk of portfolios of U.S. residential mortgage loans by calculating the following metrics:
• Probability of Default (PD)
• Loss Given Default (LGD)
• Exposure at Default (EAD)
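For orientation, the three metrics are commonly combined into an expected loss per loan, EL = PD × LGD × EAD, with LGD expressed as a fraction of the exposure. The brief does not ask for this number explicitly, so treat the following as a minimal sketch under that assumption. The function names and the two sample loans are hypothetical and only illustrate the arithmetic.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss for one loan: PD * LGD * EAD.

    pd  -- probability of default, between 0 and 1
    lgd -- loss given default, as a fraction of the exposure (assumption)
    ead -- exposure at default, in currency units
    """
    if not 0.0 <= pd <= 1.0:
        raise ValueError("pd must be between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd * lgd * ead


def portfolio_expected_loss(loans):
    """Sum the expected loss over an iterable of (pd, lgd, ead) tuples."""
    return sum(expected_loss(pd, lgd, ead) for pd, lgd, ead in loans)


if __name__ == "__main__":
    # Made-up loans, purely for illustration: (PD, LGD, EAD).
    sample_loans = [
        (0.02, 0.35, 200000.0),
        (0.10, 0.40, 150000.0),
    ]
    print(portfolio_expected_loss(sample_loans))
```

Summing per-loan expected losses gives a portfolio-level figure, but it says nothing about correlation between defaults, so it should not be read as a full portfolio risk measure.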
Milestones : The following are indicative milestones; we will refine and iterate over these steps as we go.
1) Identify, Gather and Load Sample Test Data Required to perform the credit analytics calculations
2) Identify, Gather and Load Training data for the Probability of Default
3) Calculate "Probability of Default" (PD) for an individual loan
4) Calibrate and fine tune the predictive model
5) Repeat steps 1-4 for Loss Given Default (LGD) and Exposure at Default (EAD)
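To make milestones 2 to 4 more concrete, here is a minimal sketch of one possible approach to PD: a logistic regression fitted by plain gradient descent. The brief does not name a model, so logistic regression is an assumption, as are the two features (loan-to-value ratio and a rescaled credit score), the tiny synthetic training set, and all function names. It uses only the Python standard library so it runs anywhere; a real implementation would more likely use an established statistics library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_pd(weights, bias, features):
    """Return the modelled probability of default for one loan."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_pd_model(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    rows   -- list of feature tuples, one per loan
    labels -- list of 0/1 flags, 1 meaning the loan defaulted
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_pd(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient of the log-loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


if __name__ == "__main__":
    # Synthetic data, invented for illustration only.
    # Features: (loan-to-value ratio, (credit score - 700) / 100).
    rows = [(0.60, 0.8), (0.70, 0.5), (0.75, 0.3), (0.80, 0.1),
            (0.90, -0.5), (0.95, -0.8), (0.97, -1.0), (0.85, -0.3)]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    weights, bias = train_pd_model(rows, labels)
    print(predict_pd(weights, bias, (0.65, 0.7)))   # low-risk style loan
    print(predict_pd(weights, bias, (0.95, -0.9)))  # high-risk style loan
```

Calibration (milestone 4) would then compare predicted PDs against observed default rates on held-out data; that step is not shown here.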
Credit analytics has applications in risk management, stress testing, portfolio construction, and capital allocation. The framework should provide a common and coherent basis for the analysis of whole-loan portfolios or, when integrated with a waterfall tool, of RMBS transactions. It should accommodate both the analysis of newly originated portfolios and the monitoring of seasoned loan pools.
In the current project, after loading and linking the data, we were about to identify defaulters and similar cases, and then use that dataset to train predictive models for future use, say on 2012 or 2013 data.
Do you now want these models to be built so that they are integrated with Hadoop?
Or will they be standalone models, where we just take the results of queries run on Hadoop and work on them?
Essentially, I request you to give me a full requirements doc (bullet points as in the first mail would suffice) and the stage-wise deliverables (including what input we'd be giving, and the processing and output you expect) for the current project, so that I can read up on the required technologies whenever I get time.
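To illustrate what the "integrated with Hadoop" option could look like in Python, here is a hedged sketch of a scoring script in the style of a Hadoop Streaming mapper: it reads tab-separated loan records from standard input and writes one PD per loan to standard output. The input layout (loan_id, LTV, credit score), the coefficients, and the function names are all assumptions made for the example; real coefficients would come from a trained model.

```python
import math
import sys

# Hypothetical coefficients, hard-coded for illustration only.
BIAS = -1.0
WEIGHT_LTV = 1.0
WEIGHT_SCORE = -3.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score_line(line):
    """Turn 'loan_id<TAB>ltv<TAB>score' into 'loan_id<TAB>pd'.

    Returns None for malformed records so the caller can skip them.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    loan_id, ltv_text, score_text = parts
    try:
        ltv = float(ltv_text)
        score = float(score_text)
    except ValueError:
        return None
    z = BIAS + WEIGHT_LTV * ltv + WEIGHT_SCORE * (score - 700.0) / 100.0
    return "%s\t%.6f" % (loan_id, sigmoid(z))


def main():
    # Streaming-style loop: one input record in, at most one output record out.
    for line in sys.stdin:
        result = score_line(line)
        if result is not None:
            sys.stdout.write(result + "\n")


# Only consume stdin when input is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

A script like this can be tried locally by piping a sample file into it, and the same file could then be supplied as the `-mapper` of a Hadoop Streaming job. In the standalone alternative, the same `score_line` logic would simply run over the output of a Hive or Pig query exported from the cluster.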