Predicting Early Hospital Readmission for Diabetic Patients Using XGBoost

TheUsefulNerd commented 1 month ago

Problem Description: The project aims to predict whether diabetic patients will be readmitted to the hospital within 30 days of discharge. Many hospitals struggle with managing diabetes properly, which leads to frequent readmissions. These readmissions increase costs for hospitals and worsen patient health. By predicting which patients are likely to be readmitted, hospitals can take preventive measures, improving patient care and reducing unnecessary costs.
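Here is a minimal sketch of how the 30-day target could be turned into a binary label. It assumes the data is the public UCI "Diabetes 130-US hospitals" dataset, whose `readmitted` column holds the strings `"<30"`, `">30"` and `"NO"`. The thread never names the dataset, so treat the column encoding as an assumption.

```python
# Assumption: `readmitted` uses the UCI Diabetes 130-US hospitals encoding,
# i.e. "<30", ">30" or "NO". Only "<30" counts as early readmission.
EARLY_READMISSION = "<30"

def to_binary_label(readmitted: str) -> int:
    """Return 1 if the patient was readmitted within 30 days, else 0."""
    return 1 if readmitted.strip() == EARLY_READMISSION else 0

raw_values = ["NO", ">30", "<30", "NO", "<30"]  # toy values, not real records
labels = [to_binary_label(v) for v in raw_values]
print(labels)  # [0, 0, 1, 0, 1]
```

Under this framing, ">30" is grouped with "NO" as a negative, which keeps the positive class comparatively small.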

Model Description: We will use XGBoost, a gradient-boosted decision tree model that handles complex tabular data well and makes accurate predictions. XGBoost is chosen because it performs well on medical data and can cope with class imbalance (far more non-readmitted patients than readmitted ones), for example by giving the minority class a larger weight. The model will be trained on patient data, including demographics, medical history and treatment details, to predict the likelihood of readmission. We will also use resampling methods such as SMOTE to balance the training data, with the goal of improving how well the model detects readmitted patients.
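The following from-scratch sketch shows the core step of SMOTE. Each synthetic row is placed at a random point on the segment between a minority-class row and one of its k nearest minority-class neighbours. It is an illustration only. Its O(n²) distance matrix makes it unsuitable for a dataset of this size. In practice, you would typically use `SMOTE` from the imbalanced-learn package and apply it to the training split only. Keep in mind that plain SMOTE interpolates one-hot categorical columns into fractional values. XGBoost's built-in alternative, not shown here, is its `scale_pos_weight` parameter.

```python
import numpy as np

def smote_samples(X_minority, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority-class rows by SMOTE-style interpolation.

    Each new row lies on the segment between a randomly chosen minority row
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # a row has at most n - 1 other minority rows
    # Pairwise squared distances between minority rows (O(n^2) memory).
    dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    rng = np.random.default_rng(seed)
    base = rng.integers(0, n, size=n_new)                        # starting rows
    partner = neighbours[base, rng.integers(0, k, size=n_new)]   # chosen neighbours
    gap = rng.random((n_new, 1))                                 # fraction along the segment
    return X[base] + gap * (X[partner] - X[base])

# Toy minority class: the four corners of a unit square (not real patient data).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_samples(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Every synthetic point is a convex combination of two real minority points, so for this toy input all points stay inside the unit square.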

Estimated Time for Completion: I expect to finish this project in 2 weeks.

Why?: I want to focus heavily on data pre-processing to get better results, and then move on to model development.

Expected Outcome: The model will help predict which diabetic patients are at high risk of being readmitted to the hospital within 30 days. This will allow hospitals to intervene early, reducing readmission rates, improving patient health, and lowering costs.

Pranshu-jais commented 1 month ago

@TheUsefulNerd Two weeks is more than enough time for this project. Let’s aim to get it done in one week instead.

TheUsefulNerd commented 1 month ago

Ok, I will complete the project within a week and submit a pull request.

TheUsefulNerd commented 1 month ago

Is it necessary to use .py files for model.py and predict.py, or can I use .ipynb files? I will be adding notebooks where I train my model.

Pranshu-jais commented 1 month ago


Yes, it is necessary to use the model.py and predict.py files for model definition and predictions, to keep the project structure consistent. You can do your training and experimentation in Jupyter notebooks; place them in a dedicated notebooks/ directory. This way, we keep the code modular and organized while still allowing interactive development. Let me know if you have any other queries.

Pranshu-jais commented 1 month ago

@TheUsefulNerd assigned

TheUsefulNerd commented 1 month ago

Hey @Pranshu-jais, I have been working on this model for a few days. I have achieved 60% accuracy with XGBoost, the highest among the models I tried. Can I take 2 more days to improve the model's accuracy and then submit the PR? The dataset has around 1 lakh (100,000) rows and 50 columns, with numerical and categorical features, missing values and outliers, so I need time to clean it.
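One common way to handle missing values and outliers in a numerical column is to impute the median and then clip values to the box-plot (IQR) fences. The sketch below makes a few assumptions. It assumes missing values are already `NaN`; placeholder strings would need converting first. The 1.5 whisker factor is the conventional default, not something taken from this project. On a real train/test split, compute the median and quartiles on the training data only.

```python
import numpy as np

def impute_and_clip(column, whisker: float = 1.5) -> np.ndarray:
    """Fill NaNs with the column median, then clip to the IQR fences.

    Fences are [Q1 - whisker * IQR, Q3 + whisker * IQR], the usual box-plot rule.
    """
    col = np.asarray(column, dtype=float)
    median = np.nanmedian(col)                      # median ignoring NaNs
    filled = np.where(np.isnan(col), median, col)   # median imputation
    q1, q3 = np.percentile(filled, [25, 75])
    iqr = q3 - q1
    return np.clip(filled, q1 - whisker * iqr, q3 + whisker * iqr)

cleaned = impute_and_clip(np.array([1, 2, 3, 4, np.nan, 100])).tolist()
print(cleaned)  # [1.0, 2.0, 3.0, 4.0, 3.0, 6.0]
```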

Pranshu-jais commented 1 month ago

@TheUsefulNerd Yes, you can.

TheUsefulNerd commented 1 month ago

@Pranshu-jais, @yashasvini121, I've worked on the dataset and achieved 89% accuracy with a Random Forest model. However, I'm still struggling with recall and precision. Despite oversampling the data and trying various other resampling techniques, I'm not getting the results I want. The additional resources I've consulted only describe the same problem. Do you think it's acceptable for me to submit the PR?
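With imbalanced classes, accuracy alone can be misleading. The sketch below uses hypothetical confusion-matrix counts (1,000 patients, 110 of them readmitted), not results from this project. It shows how a model can reach 89% accuracy while catching fewer than 1 in 10 readmitted patients. A model that always predicts "not readmitted" scores the same accuracy.

```python
def classification_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy plus precision, recall and F1 for the positive (readmitted) class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 110 readmitted patients, the model flags 20 and 10 are correct.
summary = classification_summary(tp=10, fp=10, fn=100, tn=880)
# Baseline that never predicts readmission.
baseline = classification_summary(tp=0, fp=0, fn=110, tn=890)

print({name: round(value, 3) for name, value in summary.items()})
# {'accuracy': 0.89, 'precision': 0.5, 'recall': 0.091, 'f1': 0.154}
```

Because both models share the same accuracy, recall and F1 on the readmitted class are the more informative numbers to report here.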

yashasvini121 commented 1 month ago

Yes, you can submit, but make sure you follow the current project structure and provide a model_details function.

TheUsefulNerd commented 1 month ago

Ok, Thanks.

TheUsefulNerd commented 1 month ago

Two last questions: when do you assign the level 1, 2, 3 labels to the repos? And should I add the model_details function in the model.py file? @yashasvini121

yashasvini121 commented 1 month ago

The levels are assigned after the PR is merged.

Yes, you could do that. If you face any difficulty, you can instead add a proper model_details function in your notebook.
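The thread does not show what the project expects model_details to return, so the following is only a hypothetical shape. It is a function that formats a model's name, hyperparameters and test metrics as text. The signature is an assumption; check how the other models in the repository implement theirs. The values in the usage line are placeholders, not results from this model.

```python
from typing import Any

def model_details(name: str, params: dict[str, Any], metrics: dict[str, float]) -> str:
    """Hypothetical model_details: summarise hyperparameters and test metrics as text."""
    lines = [f"Model: {name}", "Hyperparameters:"]
    lines += [f"  {key} = {value}" for key, value in sorted(params.items())]
    lines.append("Test-set metrics:")
    lines += [f"  {key}: {value:.3f}" for key, value in sorted(metrics.items())]
    return "\n".join(lines)

# Placeholder values only.
text = model_details("RandomForestClassifier",
                     {"n_estimators": 200, "max_depth": 12},
                     {"recall": 0.5, "precision": 0.25})
print(text)
```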

TheUsefulNerd commented 1 month ago

Hey, I am unable to load the prediction form page on Streamlit. When I run my file, or any file from the "pages" folder, it reports that page_handler cannot be found: "ModuleNotFoundError: No module named 'page_handler'". Do you know how to fix this error? @Pranshu-jais @yashasvini121

yashasvini121 commented 1 month ago

@TheUsefulNerd, Sorry, but what do you mean by "running a file from the pages folder"? Additionally, you could push your work to your fork so that we can better understand your question.

To run the project as a whole, use the command streamlit run App.py.
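Running a page file on its own can break imports like page_handler because of how Python builds sys.path. Plain `python script.py` puts the script's own folder first on sys.path, so a module at the project root is only importable when the root is that folder. The sketch below assumes `streamlit run` behaves the same way for its main script. It simulates both cases in a temporary folder with a stub page_handler.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def importable_from(directory: Path, module_name: str) -> bool:
    """Can module_name be imported when `directory` is first on sys.path?"""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(directory))
        sys.modules.pop(module_name, None)  # forget it so the next check is fresh

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for the repository root
    (root / "page_handler.py").write_text("def render():\n    return 'ok'\n")  # stub
    (root / "pages").mkdir()
    from_root = importable_from(root, "page_handler")             # entry point at the root
    from_pages = importable_from(root / "pages", "page_handler")  # a page file run directly

print(from_root, from_pages)  # True False
```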

TheUsefulNerd commented 1 month ago

Apparently my model.joblib file is larger than 100 MB (I don't know how), so GitHub won't let me push the commit at all. @yashasvini121

yashasvini121 commented 1 month ago

You will need to compress your joblib file. Consider using the following command: joblib.dump(your_data, 'your_data_file.joblib', compress=3), where compress is an integer from 0 to 9; higher values give smaller files but slower reads and writes.
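To check how much compression helps before pushing, here is a dependency-free sketch. It uses the standard library's pickle and gzip as a stand-in for joblib's compress option and compares the result against GitHub's 100 MiB per-file limit. The toy object is a placeholder, not a trained model. Compression cannot shrink a model below what its contents need, and a Random Forest's file size grows with the number and depth of its trees.

```python
import gzip
import os
import pickle
import tempfile

# GitHub blocks files larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024

def dump_compressed(obj, path: str, level: int = 3) -> int:
    """Pickle `obj` into a gzip file at compression `level` (0-9); return its size in bytes."""
    with gzip.open(path, "wb", compresslevel=level) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return os.path.getsize(path)

def load_compressed(path: str):
    """Load an object saved by dump_compressed (only unpickle files you trust)."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

def fits_on_github(size_bytes: int) -> bool:
    return size_bytes <= GITHUB_FILE_LIMIT_BYTES

toy_model = {"weights": [0.5] * 100_000}  # placeholder object, not a trained model
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl.gz")
    plain = len(pickle.dumps(toy_model, protocol=pickle.HIGHEST_PROTOCOL))
    packed = dump_compressed(toy_model, path, level=3)
    restored = load_compressed(path)

print(packed < plain, fits_on_github(packed), restored == toy_model)  # True True True
```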

TheUsefulNerd commented 1 month ago

@yashasvini121 I have been trying to solve the issues for the last 7 hours and in the end, I get this:

git push origin master
batch response: @TheUsefulNerd can not upload new objects to public fork TheUsefulNerd/predictive-calc
error: failed to push some refs to 'https://github.com/TheUsefulNerd/predictive-calc.git'

Also, on the Streamlit page all the other models work, but mine shows:

ModuleNotFoundError: No module named 'model'

I have checked the file order, the import statements, the functions used and the variable names. I also tried code suggested by ChatGPT, but I still get the same errors. I don't understand what to do.

I am sorry to ask so many questions, but I really don't know what to do here.

yashasvini121 commented 1 month ago

I can’t give a definitive answer without more details, so please share the error screenshot next time for better clarity. However, based on my understanding:

Your master branch might be behind, which might cause merge conflicts. So try this:

Push your changes to a new branch on your fork using:
```
git push origin HEAD:new-branch
```
Verify the size of your pickle file; it must be less than 100 MB.
Regarding the model import error, I assume you’re trying to import model.py into predict.py (or another file). To fix this, ensure you are using the correct import syntax:
```
from models.<your-folder-name>.model import ...
```
For example:
```
from models.house_price.model import x
# Instead of:  from model import x
```
Hope this works; let me know if you have any other issues.

TheUsefulNerd commented 1 month ago

@yashasvini121 OK, so I deleted the whole repo from my device, cloned it again and redid my changes. I am currently facing this issue: [Screenshot 2024-10-12 234712]

my code:

I did not change any file location.

yashasvini121 commented 1 month ago

@TheUsefulNerd, could you also mention the command you ran?

Silly question, but I don't see any issues otherwise. If it still doesn't work, I'll clone it and try it myself.

yashasvini121 commented 1 month ago

Since it looks like you have kept the repo in a predictive-calc folder, make sure you run streamlit run app.py from inside that folder, not streamlit run predictive-calc/app.py.

TheUsefulNerd commented 1 month ago

@yashasvini121 I used the same command, streamlit run app.py, but I also clicked the Run button directly to run the .py file.

Also, I successfully pushed all the code to my forked repo. Can you have a look at it and tell me where I am going wrong? I pushed my code to "new-branch". Here is the link:

https://github.com/TheUsefulNerd/predictive-calc.git

yashasvini121 commented 1 month ago

You cannot click the Run button to run the files; that is, you cannot run the files individually. You need to run the whole app, so run streamlit run app.py and then check whether your page works properly on the website. @TheUsefulNerd

TheUsefulNerd commented 1 month ago

Did that too:

yashasvini121 commented 1 month ago

Well, it's a spelling mistake: line 155 should say Diabetes Readmission Prediction.

TheUsefulNerd commented 1 month ago

OK, now I can see the page there, but another error occurred. I have been facing this error for 2 days: [Screenshot 2024-10-13 004825]

yashasvini121 commented 1 month ago

Instead of from model import DiabetesModel in predict.py line 1, write:

from models.Diabetes_Readmission_Prediction.model import DiabetesModel
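The package-qualified import works because it is resolved from the project root, which is on sys.path when the app is started from there. That holds no matter which folder predict.py itself sits in. The sketch below rebuilds the relevant folders in a temporary directory, using a stub DiabetesModel class. It assumes the layout models/Diabetes_Readmission_Prediction/ seen in the thread. The empty __init__.py files are added only to keep the demo deterministic.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for the repository root that holds app.py
    pkg = root / "models" / "Diabetes_Readmission_Prediction"
    pkg.mkdir(parents=True)
    (root / "models" / "__init__.py").touch()
    (pkg / "__init__.py").touch()
    (pkg / "model.py").write_text("class DiabetesModel:\n    pass\n")  # stub model
    (pkg / "predict.py").write_text(
        "from models.Diabetes_Readmission_Prediction.model import DiabetesModel\n"
    )

    sys.path.insert(0, str(root))  # only the root is on the path, as for the app
    try:
        predict = importlib.import_module("models.Diabetes_Readmission_Prediction.predict")
        absolute_import_works = hasattr(predict, "DiabetesModel")
        try:
            importlib.import_module("model")  # the bare `from model import ...` form
            bare_import_works = True
        except ModuleNotFoundError:
            bare_import_works = False
    finally:
        sys.path.remove(str(root))

print(absolute_import_works, bare_import_works)  # True False
```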

TheUsefulNerd commented 1 month ago

Wow! I tried the same thing a few hours ago and it gave me an error, but now it works!! I pushed the changes too!!

Thank you so much!!

https://github.com/TheUsefulNerd/predictive-calc.git What is the next step?

yashasvini121 commented 1 month ago

You're welcome. Create a model_details function as well, either in the notebook or in model.py. After that, create the PR for review.

TheUsefulNerd commented 1 month ago

:) I got another error now:

TheUsefulNerd commented 1 month ago

Hoorah!! Fixed it. No more errors! Submitting the PR for verification. Thanks a lot @yashasvini121

TheUsefulNerd commented 3 weeks ago

Hi @yashasvini121, I have been trying to improve my model for the last 5 days, and I was able to improve it. Now the problem is the model file size (.pkl): even with compress=9 it comes out at 169.3 MB, which exceeds GitHub's 100 MB limit. I asked ChatGPT and got this answer as an alternative:

Store the Model Externally (Preferred): One common practice is to store large files like models externally (e.g., on a cloud storage service) and reference them in your repository. This way, your repository stays light, and contributors can download the model if needed.

Google Drive / Dropbox / AWS S3 / Azure Blob Storage: Upload the model to one of these services, then provide a link in your repository’s README or code to download it when necessary.

Would this be feasible?
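If the model is hosted externally, the app could fetch it once on first use. This is a minimal sketch using only the standard library. The hosting URL is a placeholder, and whichever service is used would need to provide a direct-download link. The demo uses a local file:// URL so it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path

def ensure_model_file(path: Path, url: str) -> Path:
    """Download the model to `path` from `url` unless it is already there.

    The data is written to a ".part" file first and renamed only after the
    download finishes, so an interrupted download is not mistaken for a
    complete model file.
    """
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        partial = path.with_name(path.name + ".part")
        urllib.request.urlretrieve(url, str(partial))
        partial.replace(path)
    return path

# Demo: a local file:// URL stands in for the cloud link (placeholder data).
with tempfile.TemporaryDirectory() as tmp:
    hosted = Path(tmp) / "hosted" / "model.pkl.gz"
    hosted.parent.mkdir()
    hosted.write_bytes(b"fake model bytes")
    local = Path(tmp) / "models" / "model.pkl.gz"
    ensure_model_file(local, hosted.as_uri())
    downloaded = local.read_bytes()

print(downloaded)  # b'fake model bytes'
```

A second call with the file already present returns immediately without touching the network.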

yashasvini121/predictive-calc, issue #17