pydatadelhi / projects

Call for proposals
Auto Vaidya #1

Open smaranjitghose opened 4 years ago

smaranjitghose commented 4 years ago

Abstract An open-source project for serving deep learning-based solutions in clinical scenarios. Lately, deep learning and its application in computer vision have proven highly beneficial for medical tasks such as Covid-19 detection, pneumonia identification, brain tumor segmentation, and implant detection. However, taking this work to production as a Minimum Viable Product has been very slow. This not only limits the reach of the work but also reduces the opportunities for validation by practicing medical professionals. Hence, this project is meant to serve as an end-to-end solution: a web app that anyone can use to automate the diagnosis and prognosis of common problems. Initially we plan to start with common image classification problems like Pneumonia Detection and Intracranial Hemorrhage Detection, and then proceed to more complex scenarios.

Author(s) Smaranjit Ghose Anush Bhatia

How can we reach out to the author(s)? Smaranjit Ghose: contact@smaranjitghose.codes, Linkedin Anush Bhatia: anushbhatia1234@gmail.com, Linkedin

Project type? Web App

Technology used or will be used?

What problem are you trying to solve, and why?

Building a service that ensures the benefits of Artificial Intelligence reach medical professionals as well as the common people, to automate diagnosis and prognosis and to reduce costs, time spent, and the chance of human error.

How do you plan to take the project to v1.0(stable)?

ramantehlan commented 4 years ago

Thank you, @smaranjitghose and @anushbhatia for the proposal! :tada:

I do see a need for a service like this. Building and hosting the front end and backend initially should be relatively simple. We can also crowdfund from the PyData Delhi community for more resources like VM to train the model or domain name etc.

However, the tricky part is the RSNA Pneumonia detection task; AFAIK even the top-rated solutions have a score of less than 0.3. Have you figured out an approach to solve that? It's okay if you haven't, we can also discuss it with the community, but having an initial plan will be helpful. As you mentioned, we can even start with other detection tasks.

IMHO, we can also design it as an algorithm engine: on the developer end, we provide just two things.

  1. Input schema: The expected inputs from the user. Using this, it should automatically create a frontend/page for that. Schema of the input will look something like this.

    {
    name: string,
    age: number,
    image: file
    }
  2. Algorithm: It can be in an independent file, used in an endpoint, which takes in the above inputs.

This approach won't require creating a new page for each algorithm manually; instead, it will be automated, which will make it easy and fast to add new algorithms.
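A minimal Python sketch of the "algorithm engine" idea described above, written under stated assumptions rather than as the project's actual design. The names `register`, `REGISTRY`, `validate`, `form_fields`, `call`, and `pneumonia_demo` are all hypothetical, as is the mapping from schema type names to form widgets. Each algorithm declares an input schema like the one above, and the engine derives both the form description and the input validation from it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema type names and the Python types they accept.
FIELD_TYPES: Dict[str, Tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "file": (bytes,),
}

# Hypothetical mapping from schema type names to frontend form widgets.
WIDGETS: Dict[str, str] = {"string": "text", "number": "number", "file": "file"}


@dataclass
class Algorithm:
    name: str
    schema: Dict[str, str]                 # field name -> schema type name
    run: Callable[[Dict[str, Any]], Any]   # the algorithm itself


REGISTRY: Dict[str, Algorithm] = {}


def register(name: str, schema: Dict[str, str]):
    """Decorator that adds an algorithm and its input schema to the registry."""
    def decorator(func):
        REGISTRY[name] = Algorithm(name, schema, func)
        return func
    return decorator


def form_fields(schema: Dict[str, str]) -> List[Dict[str, str]]:
    """Describe the form a frontend could render for this schema."""
    return [{"name": field, "widget": WIDGETS[kind]} for field, kind in schema.items()]


def validate(schema: Dict[str, str], payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the payload (empty if it matches the schema)."""
    errors = []
    for field, kind in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], FIELD_TYPES[kind]):
            errors.append(f"wrong type for field: {field}")
    return errors


def call(name: str, payload: Dict[str, Any]) -> Any:
    """Validate the payload against the algorithm's schema, then run it."""
    algo = REGISTRY[name]
    errors = validate(algo.schema, payload)
    if errors:
        raise ValueError("; ".join(errors))
    return algo.run(payload)


@register("pneumonia_demo", {"name": "string", "age": "number", "image": "file"})
def pneumonia_demo(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder only: a real algorithm would run a trained model on the image.
    return {"patient": inputs["name"], "image_bytes": len(inputs["image"])}
```

With this shape, adding a new algorithm means writing one decorated function; an endpoint could pass the request body to `call`, and the page for that algorithm could be generated from `form_fields`.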


I like the idea and will wait for others to put forward their views and discuss before we can move on to setting up the repos and start the work. Feel free to add more information or resources to your application.

smaranjitghose commented 4 years ago

Thanks a lot for the feedback. Regarding model training: is the score you mentioned above F1? Usually, when we approach such problems, we focus more on specificity and sensitivity; with certain pre-processing techniques and transfer learning we have achieved 99% on those metrics, with a validation accuracy of 98%. For Diabetic Retinopathy, capsule networks seem to work exceptionally well. Moreover, a lot of data cleaning and image quality assessment is required before model training. We also wish to use heat maps like Grad-CAM and Score-CAM for further validation.

I definitely understand that in clinical scenarios there's a huge difference between 98% and 99%, because we are dealing with human lives here, and that 1% can amount to thousands of lives when the sample size is 1M. But if we can at least start with this, then as a team we can definitely bridge the gaps. Something I lack, being in CV research myself, is external validation from doctors and others. This software will also be a gateway to ending the imbalance between clinical and technical validation.
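To make the metrics and the arithmetic above concrete, here is a small Python sketch. The function names and the confusion-matrix counts in the usage note are hypothetical and are not results from this project; they only show how sensitivity, specificity, and accuracy relate, and how a one-point accuracy gap scales with the number of cases.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of actual positives that are caught
        "specificity": tn / (tn + fp),   # share of actual negatives that are cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def expected_errors(accuracy: float, n_cases: int) -> int:
    """Expected number of misclassified cases at a given accuracy."""
    return round((1 - accuracy) * n_cases)
```

For example, `binary_metrics(tp=90, fp=20, tn=880, fn=10)` gives a sensitivity of 0.9 and an accuracy of 0.97 on made-up counts. On 1M cases, `expected_errors(0.98, 1_000_000) - expected_errors(0.99, 1_000_000)` comes to 10,000 additional misclassified cases for that single percentage point.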

A few months back we did this for COVID-19 Detection:

Colab Notebook Web app

I'll share the other works as well

ramantehlan commented 4 years ago

Accuracy of 98% is good! We can go ahead with this.

RFC: @shagunsodhani @Dawny33 @manojpandey

MSanKeys963 commented 4 years ago

Hi @smaranjitghose. Thanks for submitting the proposal. The idea sounds promising and I do believe there's a need for deep learning-based solutions in the medical field. The tech stack looks good to me to get started and for releasing v1.0.

The accuracy scores that you've mentioned are great. Is there an initial POC that you can present to us so that everyone can have a look at the pre-processing techniques that you've used? Moreover, we can build the project on top of the work that you've done so far.

MSanKeys963 commented 4 years ago

Please have a look @shagunsodhani and let us know your thoughts.

shagunsodhani commented 4 years ago

Apologies for the late reply from my end.

I want to highlight some different, non-ML aspects of the project:

Privacy Concerns: People will be sharing their medical profiles while using this application. What kind of privacy safeguards will be built in? Is it even allowed to crowdsource medical data in this way?

Model Interpretability: How will the doctors/practitioners interpret the results from the app? Will they be able to probe the model, or will they just have access to predictions from the model? How can they "trust" the predictions of the model?

Model Bias: The Kaggle contest uses datasets from the Radiological Society of North America. Would a model trained on those datasets be useful for users based in India? (I am assuming the users of the service are people based in India.) What are the plans for measuring/quantifying this bias, and how do we work around it?

It is important to highlight these non-ML aspects because the project emphasizes building a service for people to use, not just solving a specific instance of an ML problem with a specific dataset and a specific validation mechanism.

smaranjitghose commented 4 years ago

For privacy concerns, as I have already mentioned, we need to frame an agreement policy, and we can make it optional for users to let us collect their data.

Also, there are certain things that can be permitted easily and certain things that can't. We don't need to store whom the X-ray came from or keep their complete health profile as metadata.
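As a rough sketch of what "optional collection without identity" could look like in code, assuming a hypothetical upload dictionary and a hypothetical allow-list of metadata fields (`ALLOWED_METADATA` and `prepare_for_storage` are not part of any existing codebase): nothing is stored without consent, and only allow-listed fields are kept alongside the image under a random identifier.

```python
import uuid
from typing import Any, Dict, Optional

# Hypothetical allow-list: metadata fields that do not identify the patient.
ALLOWED_METADATA = {"modality", "body_part"}


def prepare_for_storage(upload: Dict[str, Any], consent: bool) -> Optional[Dict[str, Any]]:
    """Return a record that is safe to keep, or None if the user did not opt in."""
    if not consent:
        return None  # collection is optional: store nothing without consent
    metadata = upload.get("metadata", {})
    return {
        # Random identifier, not derived from anything about the patient.
        "record_id": uuid.uuid4().hex,
        "image": upload["image"],
        # Drop every metadata field that is not explicitly allow-listed.
        "metadata": {k: v for k, v in metadata.items() if k in ALLOWED_METADATA},
    }
```

This only drops fields from the upload. It does not address identifiers embedded in the image file itself, such as text burned into the pixels or file headers, nor the legal question of whether collecting the data is permitted; those would need separate handling.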

For the "trust" part, I have mentioned Class Activation Maps (like Grad-CAM and Score-CAM). And we'll definitely have a feedback system to collect comments from medical practitioners.

A major purpose of making this open source is to deal with the bias. If the platform is open for clinical people from all over the nation to use, then we can get more varied data. Hence we are also including that optional feature to collect it.

For now, the project is intended for use by medical professionals, but as more applications are added it will become suitable for use by the general public. And definitely, all your concerns are important and lie at the very heart of modern machine learning model interpretability and validation.