Loan Prediction / 30_Days_of_Python #300

Title: Loan Prediction System Using Multiple Machine Learning Models #300

Description:

This pull request introduces a Loan Prediction System that trains several machine learning classifiers to predict whether a loan application will be approved. It takes a data-driven approach to assessing applications, using features such as gender, marital status, education level, income, and credit history. The components of the system are described below.

Key Features:

Data Preprocessing:
- Handling Missing Values: The dataset contained missing values, most notably in the Credit_History feature. These were imputed based on each applicant's Loan_Status. Because Loan_Status is the prediction target, this approach draws on label information that will not be available for new loan applications.
- Encoding Categorical Variables: The categorical features Gender, Married, Dependents, Education, Self_Employed, and Property_Area were converted to integers with Label Encoding, since the models require numerical input.
- Feature Scaling: Continuous features such as ApplicantIncome and CoapplicantIncome were normalized with MinMaxScaler, which maps each feature to the [0, 1] range by default. This keeps features with large numerical ranges, such as income, from dominating distance- and margin-based models like KNN and SVM.
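The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the DataFrame holds a few hypothetical rows with a subset of the dataset's columns, `preprocess` is a hypothetical helper, and missing values are filled with each column's most frequent value instead of being derived from Loan_Status, since the target is not known for new applications.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical rows using a subset of the dataset's columns.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", None],
    "Married": ["Yes", "No", "Yes", "Yes"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "ApplicantIncome": [5849.0, 4583.0, 3000.0, 2583.0],
    "CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0],
    "Credit_History": [1.0, None, 1.0, 0.0],
    "Loan_Status": ["Y", "N", "Y", "N"],
})

CATEGORICAL = ["Gender", "Married", "Education", "Property_Area", "Loan_Status"]
CONTINUOUS = ["ApplicantIncome", "CoapplicantIncome"]


def preprocess(frame):
    """Hypothetical helper: impute, label-encode, and min-max scale a copy of `frame`."""
    frame = frame.copy()
    # Fill gaps with the column's most frequent value (a target-independent choice).
    for col in ["Gender", "Credit_History"]:
        frame[col] = frame[col].fillna(frame[col].mode()[0])
    # Label Encoding: each category becomes an integer (classes sorted alphabetically).
    encoders = {}
    for col in CATEGORICAL:
        encoders[col] = LabelEncoder()
        frame[col] = encoders[col].fit_transform(frame[col])
    # MinMaxScaler maps each continuous column to [0, 1] by default.
    scaler = MinMaxScaler()
    frame[CONTINUOUS] = scaler.fit_transform(frame[CONTINUOUS])
    return frame, encoders, scaler


processed, encoders, scaler = preprocess(df)
print(processed)
```

Returning the fitted encoders and scaler makes it possible to apply the same `transform` to new applications later, so they are encoded and scaled exactly like the training data.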
Exploratory Data Analysis (EDA):
- Visualizations: Various plots were created to understand the data distribution and relationships between features. Count plots were used to observe the distribution of categorical variables, while box plots and heatmaps were employed to explore the relationships between continuous variables and the target variable (Loan_Status).
- Insights: The EDA suggested that Credit_History is a strong determinant of loan approval, and that being married or having fewer dependents may be associated with a higher chance of approval.
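One way to check an EDA insight such as the link between Credit_History and approval is to compute the approval rate for each value of a feature. The sketch below assumes Loan_Status uses "Y"/"N" labels; the rows and the `approval_rate_by` helper are hypothetical, so the printed rates say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rows; the real analysis would use the loan dataset.
df = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Married": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "N"],
})


def approval_rate_by(frame, feature):
    """Share of approved loans (Loan_Status == 'Y') for each value of `feature`."""
    approved = frame["Loan_Status"].eq("Y")
    return approved.groupby(frame[feature]).mean()


rates = approval_rate_by(df, "Credit_History")
print(rates)
print(approval_rate_by(df, "Married"))
```

The same grouping underlies a count plot; with seaborn, `sns.countplot(data=df, x="Credit_History", hue="Loan_Status")` draws the corresponding bars.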
Model Training:
- Model Selection: Multiple models were trained on the preprocessed dataset, including:
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Decision Tree
  - Random Forest Classifier
  - Naive Bayes (Multinomial NB)
  - K-Nearest Neighbors (KNN)
- Training Process: The data was split with scikit-learn's train_test_split, reserving 80% for training and 20% for testing. Each model was fit on the training set and evaluated on the held-out test set, so its performance is measured on data it has not seen.
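The training setup can be sketched with scikit-learn as below. The feature matrix comes from `make_classification`, a synthetic stand-in for the preprocessed loan data, and all models use default hyperparameters apart from a higher `max_iter` for logistic regression; the project's actual settings may differ. In this sketch the scaler is fit on the training split only, `clip=True` keeps scaled test values inside [0, 1] (MultinomialNB expects non-negative inputs), and `stratify=y` is an added assumption that keeps class proportions equal in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed loan features (hypothetical data).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# 80% train / 20% test, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on training data only; clip test values into [0, 1].
scaler = MinMaxScaler(clip=True).fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes (Multinomial)": MultinomialNB(),
    "KNN": KNeighborsClassifier(),
}

# Fit every model on the same split and record held-out accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in sorted(test_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {acc:.3f}")
```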
Model Evaluation:
- Evaluation Metrics: The models were evaluated based on several key metrics:
  - Accuracy: The ratio of correctly predicted instances to the total instances.
  - F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics.
  - Log Loss: The negative mean log-probability the model assigns to the true labels; it penalizes confident but wrong predictions most heavily (lower is better).
  - Precision: The ratio of correctly predicted positive observations to the total predicted positives.
  - Recall: The ratio of correctly predicted positive observations to all actual positives.
- Comparison: After evaluating all models, the SVM model was selected as the final model due to its superior performance across most metrics, especially in terms of accuracy and F1 score.
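The metrics above can be computed with `sklearn.metrics`; the sketch below also derives them from confusion-matrix counts to show what each one measures. The labels and probabilities are hypothetical, with 1 meaning approved and a 0.5 threshold turning probabilities into predictions. Log loss needs predicted probabilities, so for an `SVC` this assumes it was created with `probability=True` or that another probability source is used.

```python
import math

from sklearn.metrics import accuracy_score, f1_score, log_loss, precision_score, recall_score

# Hypothetical test labels (1 = approved) and predicted approval probabilities.
y_true = [1, 1, 0, 1, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.4, 0.6, 0.7, 0.2, 0.95]
y_pred = [int(p >= 0.5) for p in y_prob]  # threshold probabilities at 0.5

sk_scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "log_loss": log_loss(y_true, y_prob),  # 1-D probabilities are read as P(class 1)
}

# The same quantities from confusion-matrix counts.
pairs = list(zip(y_true, y_pred))
tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_scores = {
    "accuracy": (tp + tn) / len(y_true),
    "precision": precision,
    "recall": recall,
    "f1": 2 * precision * recall / (precision + recall),  # harmonic mean
    # Mean negative log-probability of the true class.
    "log_loss": -sum(math.log(p if t == 1 else 1 - p) for t, p in zip(y_true, y_prob)) / len(y_true),
}

for name in sk_scores:
    print(f"{name:10s} sklearn={sk_scores[name]:.4f} manual={manual_scores[name]:.4f}")
```

With these inputs there are 4 true positives, 2 true negatives, 1 false positive and 1 false negative, so accuracy is 6/8 and precision and recall are both 4/5.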
Final Model and Serialization:
- The Support Vector Machine (SVM) model was serialized with Python's pickle module. The saved file (model_svm.pkl) can be loaded to make predictions on new data, provided the new inputs go through the same encoding and scaling steps used during training, which makes it the starting point for deployment.
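Saving and reloading the model with pickle can look like the following minimal sketch. The SVM is trained on synthetic `make_classification` data as a stand-in, and the file is written to the system temp directory rather than the project folder. Only unpickle files from trusted sources, and load them with the same scikit-learn version used to save them.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = SVC().fit(X, y)

# Write the fitted model to disk (temp dir used here instead of the project folder).
path = Path(tempfile.gettempdir()) / "model_svm.pkl"
with path.open("wb") as f:
    pickle.dump(model, f)

# Later, e.g. in a serving script: restore the model and predict.
with path.open("rb") as f:
    loaded = pickle.load(f)

same = bool((loaded.predict(X) == model.predict(X)).all())
print("Reloaded model gives identical predictions:", same)
```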

Potential Applications:

This Loan Prediction System could be integrated into a financial institution's loan approval process to automate part of the decision-making. By predicting the likely outcome of each application, it can reduce the time and effort spent on manual assessments.

How to Use:

Clone the Repository:
- Clone the repository to your local machine using the following command:
```
git clone <repository-url>
```
Install Dependencies:
- Navigate to the project directory and install the required Python libraries using:
```
pip install -r requirements.txt
```
Run the Project:
- Execute the Jupyter notebook or Python script provided in the repository to preprocess the data, train the models, and evaluate their performance.
Deploy the Model:
- Load the pre-trained SVM model (model_svm.pkl) and use it to predict loan approvals on new data.
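If model_svm.pkl contains only the fitted classifier, new data must be encoded and scaled exactly as during training before calling `predict`. One way to make this less error-prone, sketched below as an alternative to the project's approach, is to pickle a scikit-learn `Pipeline` that bundles the preprocessing with the SVM. `OrdinalEncoder` stands in for Label Encoding (it is the column-wise equivalent for feature matrices), and the training rows, column subset, and new applicant are hypothetical.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from sklearn.svm import SVC

CATEGORICAL = ["Married", "Education", "Property_Area"]
NUMERIC = ["ApplicantIncome", "CoapplicantIncome", "Credit_History"]
FEATURES = CATEGORICAL + NUMERIC

# Hypothetical training rows; real code would load the project's dataset.
train = pd.DataFrame({
    "Married": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate", "Not Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban", "Rural", "Semiurban"],
    "ApplicantIncome": [5849.0, 4583.0, 3000.0, 2583.0, 6000.0, 2333.0],
    "CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0, 0.0, 1516.0],
    "Credit_History": [1.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "N"],
})

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        # Unseen categories map to -1 instead of raising an error.
        ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), CATEGORICAL),
        ("num", MinMaxScaler(), NUMERIC),
    ])),
    ("svm", SVC()),
])
pipeline.fit(train[FEATURES], train["Loan_Status"])

# Serialize preprocessing and model together, then restore them.
deployed = pickle.loads(pickle.dumps(pipeline))

# A raw, unencoded application goes straight into the restored pipeline.
new_applicant = pd.DataFrame([{
    "Married": "Yes", "Education": "Graduate", "Property_Area": "Urban",
    "ApplicantIncome": 4500.0, "CoapplicantIncome": 1200.0, "Credit_History": 1.0,
}])
prediction = deployed.predict(new_applicant)
print(prediction)
```

In practice the pipeline would be written to a file with `pickle.dump` rather than kept in memory with `pickle.dumps`.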

Conclusion:

This project implements the complete machine learning pipeline for predicting loan approvals, from data preprocessing through model training and evaluation, and produces a trained SVM model that can be loaded for predictions. This pull request merges the Loan Prediction System into the main branch, making it available for further development, deployment, or integration into larger financial systems.

Please review the changes and provide feedback or approve the merge if everything looks good.


jitacm / -30DaysDevChallenge-