rubyforgood / Flaredown

Flaredown web app and API
http://www.flaredown.com
GNU General Public License v3.0

Chronic illness oracle #211

Closed: lmerriam closed this issue 3 years ago

lmerriam commented 7 years ago

We want to use the "diagnosing" algorithm Graydyn created, and continue to train it. We also want to get the word out about Flaredown. The "chronic illness oracle" is a shareable experiment that analyzes users' symptoms and returns a list of possible conditions that could be causing them. It is meant to be used without a Flaredown account, so it must be accessible to unauthenticated users.

@Graydyn is working on a quick API that accepts the user's symptoms and other data, and returns the output of his diagnosing algorithm.

1. Static welcome screen

[Mockup image: screenshot 2017-03-05 23 31 46]

2. Gather necessary data

[Mockup image: screenshot 2017-03-05 23 47 01]

3. "Analyzing" screen

[Mockup image: screenshot 2017-03-05 23 35 15]

4. Results and sharing

[Mockup image: iPhone 7]

mpugach commented 7 years ago

What is the API? Should we share the results or only the base link?

lmerriam commented 7 years ago

@Graydyn can you post API here when ready?

@mpugach sharing the results would be ideal. I was worried that might be complicated and require us to save static result pages, but could it be as simple as a URL parameter?

If sharing results is feasible, I will update mockups with a button to "start over" so that users that follow share links can try the oracle for themselves.

Graydyn commented 7 years ago

Hi Guys,

You don't really need to credit me directly on the page. Thanks for the thought though. I can't really use height and weight, as we don't have that info for all the users I trained on. Could be nice for a future release. Yup, I'll include a confidence interval.

I should have some Swagger docs ready in a couple of days.

mpugach commented 7 years ago

@lmerriam it could be URL parameters.

It seems logical to me to run it as a separate service, like a frontend for @Graydyn's API. Do you want it to be built into FlaredownEmber-2? We could start another project for this if you like.

Do you plan another domain?
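A rough sketch of the URL-parameter idea floated here (later dropped once feedback had to be stored server-side): encode the oracle inputs into the query string and decode them when someone opens a shared link. Python is used only for illustration; the base URL and the parameter names (age, sex, symptom) are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical base URL and parameter names, for illustration only.
BASE_URL = "https://flaredown.com/oracle"


def build_share_url(age: int, sex: str, symptoms: list[str]) -> str:
    """Encode the oracle inputs as query parameters; the symptom list becomes repeated keys."""
    query = urlencode({"age": age, "sex": sex, "symptom": symptoms}, doseq=True)
    return f"{BASE_URL}?{query}"


def parse_share_url(url: str) -> dict:
    """Recover the inputs from a shared link."""
    params = parse_qs(urlparse(url).query)
    return {
        "age": int(params["age"][0]),
        "sex": params["sex"][0],
        "symptoms": params.get("symptom", []),
    }
```

The drawback, as noted later in the thread, is that a pure URL scheme has nowhere to keep user feedback, so results ended up stored in the database instead.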

lmerriam commented 7 years ago

It is definitely a separate experiment, which would make sense in its own domain or a subdomain of flaredown.com. But I also was thinking that this flow uses some of the components from the main Flaredown app like the power select with autocomplete and adding symptoms, so if it's easier to make it part of FlaredownEmber-2 then we should do that. If not then a separate ember app is fine.

This is intended to be as quick and hacky as possible -- I don't want it to take too much of your time away from the main app. Feel free to leave all the CSS to me if that will save time.

mpugach commented 7 years ago

I'm building it into the same project to reuse general styles, some models, and mixins.

lmerriam commented 7 years ago

@mpugach another thought: how much effort would it be to pass the information the user has already entered (age, sex, country, symptoms and the conditions list after they have corrected it) to the signup flow? Because they have already entered a lot of information, this might be a super easy way to sign users up. We could provide a button that would take the user into account creation with all this data pre-filled.

Although even if we decide this is feasible, we should wait to see whether the oracle is popular before we implement pushing people to signup. Not worth it if no one even likes it.

mpugach commented 7 years ago

That is a great idea! It is feasible and should take several hours.

mpugach commented 7 years ago

Hi @Graydyn, are there any API docs yet?

Graydyn commented 7 years ago

Sorry for the delay Maksym. I'll try and have the docs up tomorrow morning.

lmerriam commented 7 years ago

@mpugach also could we do an export of the latest data for Graydyn so he has the most up to date dataset?

lmerriam commented 7 years ago

Updated mockup of the results screen:

[Mockup image: fd_chronic_illness_results]

How will sharing work from the mobile app? We should test to make sure it opens the correct app or website.

Graydyn commented 7 years ago

Here is the API/doc: http://34.207.197.147:5000/#!/default/post_generic

It's on my Lightsail instance. It should be fine to use in prod as long as we don't exceed something on the order of 20,000 requests per month. Let me know if you start having performance issues.

If you have any requests for changes to the contract or json let me know. Algorithm has some bugs that I'm working on still, so don't worry if you see it returning some weird results. How long do I have before you guys are thinking of releasing?

mpugach commented 7 years ago

Thank you.

I need to look at the API to understand how to proceed with the frontend. Then Logan will have to help with styles.

I think it could take about a day if no questions come up. It's hard to say until I dive in.

For some reason I get "This site can’t be reached". Can you check?

Graydyn commented 7 years ago

Oops, sorry about that. Looks OK now, give it a try. Let me know if that happens again; I haven't been using Lightsail for long, so I tend to break things when I make changes.

mpugach commented 7 years ago

Ok, I can see it, thank you

mpugach commented 7 years ago

Is there an endpoint for submitting user feedback (for learning)?

Graydyn commented 7 years ago

Nope, you're going to need to store that feedback somewhere. The API is driven by occasional CSV dumps, so its parameters are only updated when a new dump gets sent out.

mpugach commented 7 years ago

So I need to store user input and feedback. Should I store API response?

Graydyn commented 7 years ago

I don't see why you would. Unless you're allowing the user to save the results or something.

mpugach commented 7 years ago

I want to save only the data needed for your further analysis, and I'm trying to understand what that is.

Graydyn commented 7 years ago

I see. No, I don't need the results; the user's input and feedback are enough.

mpugach commented 7 years ago

@Graydyn do we need the user's country? It's in @lmerriam's mockups, but I don't see it in the API docs.

mpugach commented 7 years ago

@Graydyn could you please set the Access-Control-Allow-Origin: * header on the OPTIONS response? I can't make API requests from the browser without it.
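For reference, a browser sends an OPTIONS preflight before a cross-origin POST with a JSON body, and it also checks Access-Control-Allow-Origin on the actual POST response, so both responses need the header. The sketch below assumes a plain WSGI app purely for illustration; the thread doesn't say which framework serves the API, and the empty result list is a placeholder.

```python
import json

# Headers the browser needs before it lets a cross-origin page call the API.
CORS_HEADERS = [
    ("Access-Control-Allow-Origin", "*"),
    ("Access-Control-Allow-Methods", "POST, OPTIONS"),
    ("Access-Control-Allow-Headers", "Content-Type"),
]


def app(environ, start_response):
    """Minimal WSGI app: answers both the preflight and the real POST with CORS headers."""
    if environ["REQUEST_METHOD"] == "OPTIONS":
        # Preflight: no body, only the permission headers.
        start_response("204 No Content", CORS_HEADERS)
        return []
    body = json.dumps([]).encode("utf-8")  # placeholder for the prediction result
    start_response("200 OK", [("Content-Type", "application/json")] + CORS_HEADERS)
    return [body]
```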

Graydyn commented 7 years ago

No need to pass me country, it didn't help in the predictions so I'm no longer using it. I just added that header and pushed latest. Let me know how it goes.

mpugach commented 7 years ago

Thank you, it works now.

But the response body contains a JSON-encoded string instead of a JSON array: "[{\"confidence\": 74.620718162982527, \"name\": \"Asthma\"}, {\"confidence\": 95.745563106361203, \"name\": \"Crohn's disease\"}, {\"confidence\": 99.842699734806388, \"name\": \"Menorrhagia\"}]"

It should be:

[{"confidence": 74.620718162982527, "name": "Asthma"}, {"confidence": 95.745563106361203, "name": "Crohn's disease"}, {"confidence": 99.842699734806388, "name": "Menorrhagia"}]
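The quoted output is a typical double-encoding symptom: the payload was serialized to a JSON string and then serialized again (a common cause is passing an already-dumped string to a framework that serializes its return value). Graydyn fixed it on the server; as a defensive measure, a client could also unwrap one extra layer. A minimal Python sketch, where decode_predictions is a hypothetical helper name:

```python
import json


def decode_predictions(body: str) -> list:
    """Parse the API body, unwrapping one extra layer of JSON encoding if present."""
    data = json.loads(body)
    if isinstance(data, str):
        # The body was a JSON string that itself contains JSON.
        data = json.loads(data)
    return data
```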

mpugach commented 7 years ago

@lmerriam should we remove country?

lmerriam commented 7 years ago

Yep, no need to ask for it if it's not necessary for the API. We can wait until the user decides they want to sign up.

Very interesting that that is the case! I was wondering whether certain countries tended toward certain conditions or reported differently. Sounds like even if there are differences, they must not be dramatic.

Graydyn commented 7 years ago

Ya, I expected the country of origin to have more of an impact, and it still might in the future. I think maybe since it's mostly USA, Canada, Australia, Great Britain, maybe these countries have similar sorts of conditions. This just got me thinking about how chronic illness in general seems to be associated with first world countries. I wonder if there is something to that.

Graydyn commented 7 years ago

Oh, I just fixed the json formatting.

lmerriam commented 7 years ago

"This just got me thinking about how chronic illness in general seems to be associated with first world countries. I wonder if there is something to that."

Dude, definitely. In fact it's a known thing, we just don't necessarily know why. There's diet, caloric intake, cleanliness, activity level, sleep schedule, etc. Would be really cool if we could scrutinize this stuff, but tough to do since everyone on our platform is thoroughly first-world.

mpugach commented 7 years ago

@Graydyn thank you

mpugach commented 7 years ago

@lmerriam my previous statement about URL parameters was wrong. Since the API can only predict conditions and can't store feedback, we need to store the results in our DB.

Besides other things (the unchecked boxes in the issue), we need to implement the following:

mpugach commented 7 years ago

@lmerriam check "/oracle" on staging.

Graydyn commented 7 years ago

I've got the API working in a way that I'm happy enough with it to call it in beta. Feel free to start using it in the app whenever you like, but warn me before going live and please try to avoid going live during the month of April as I'll be out of town.

If it doesn't recognize the symptoms listed, it returns an empty array, so make sure you handle that. I limited it to five conditions: it returns the 5 most likely, unless fewer than 5 have a non-zero confidence. 5 was chosen arbitrarily and can be changed if you like.
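The selection rule described above (top 5 by confidence, dropping zero-confidence conditions, with an empty list as a valid answer) can be sketched as follows. This illustrates the rule, not Graydyn's code; the input shape (a dict from condition name to confidence) is assumed, and limit is the knob that would change the result count.

```python
def top_conditions(confidences: dict[str, float], limit: int = 5) -> list[dict]:
    """Return up to `limit` conditions with non-zero confidence, highest first."""
    nonzero = [(name, c) for name, c in confidences.items() if c > 0]
    nonzero.sort(key=lambda item: item[1], reverse=True)
    return [{"name": name, "confidence": c} for name, c in nonzero[:limit]]
```

On the client, an empty return value should map to a "no matching conditions" message rather than an empty results screen.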

I feel like I've gotten it to a good balance between stating the obvious and giving results that on the surface seem unrelated to the symptoms. For example listing a few Crohn's symptoms will result in predicting Crohn's, but is also somewhat likely to list something like depression with a lower confidence. This is because people listing Crohn's also have a tendency towards listing depression and anxiety. I'm hoping the effect will be less "Why did it return something unrelated to my symptoms?" and more "Wow, how did it know I have depression?"
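To illustrate why co-occurring conditions can surface, here is a simple conditional co-occurrence estimate, P(B reported | A reported), computed from per-user condition lists. This is not the model Graydyn trained; it only shows the kind of signal (users reporting Crohn's also reporting depression) that a multi-label model can pick up. The data format is assumed.

```python
from collections import Counter
from itertools import permutations


def co_occurrence_rates(users: list[set[str]]) -> dict[tuple[str, str], float]:
    """Estimate P(B reported | A reported) for every ordered pair (A, B) seen together."""
    single = Counter()  # users reporting each condition
    pair = Counter()    # users reporting both conditions of an ordered pair
    for conditions in users:
        single.update(conditions)
        pair.update(permutations(conditions, 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}
```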

I'm recording a few potential future improvements here:

Synonymous symptoms caused a lot of data to get tossed and confuse the algo. Probably the biggest thing we could do to improve accuracy would be to make similar annotations for symptoms to what Peer made with conditions. I don't think I have the bio-med knowledge to do this myself.

Some sacrifices had to be made in terms of accuracy in order to support being in a lower memory environment. There is the potential to get higher accuracy by switching to a Random Forest but the memory usage is around 4 gigs which I found unacceptable for prod.

Since RF works really well, Gradient Boosting would probably work even better and be more memory efficient. But the scikit-learn implementation doesn't support multi-label classification. It's open source so we could fix it, but if I do take this on, I estimate it would take ~7 months.

lmerriam commented 7 years ago

@mpugach so cool to see this in action! I will work on styles before deploying. One thing I noticed is that it doesn't seem to offer the full list of symptoms in the database -- I THINK it's only showing the ones we preloaded and none of the user-generated symptoms. Intentional?

Out of curiosity how do you plan to implement the ability to share without letting other users edit the results? Is it based on a cookie or do you give them a different URL?

@Graydyn this is amazing, seriously. So cool to see. Do you happen to have an idea of how many datapoints you'd need to be confident in the output? I can use that as a goal for how much traffic to drive to this.

I'd love to crank it up to 10 results, I think we'll have a better chance to hit something useful to the user. Note: I managed to get more than 5 results before, and sometimes I only get 1 in cases where it doesn't seem likely (for instance if you put in just "Headache" and "Fatigue" you get CFS at 97% and nothing else) so I wonder if there's something funky going on with how many results it decides to output?

lmerriam commented 7 years ago

@mpugach on mobile the "sex" select input has the same problem as "country" did in onboarding recently. We could also turn it into radio buttons if that's easier.

Graydyn commented 7 years ago

The number of samples needed for reliable output depends on the variability of the features, so in our case it depends on how consistent a disease's symptoms are. For example, Sjögren's is really easy because everybody who has it reports dry mouth and dry eyes, while Crohn's or celiac are trickier because the symptoms vary wildly between patients. The cutoff I've set is, I think, around 20: if fewer users than that report a condition, I don't try to predict it. That still leaves us with around 150 conditions, I think. I'm not near my machine right now, but I can check later if you need exact figures.
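The ~20-user cutoff amounts to a frequency filter applied before training. A sketch, assuming one set of reported conditions per user; the function name is hypothetical and 20 is the approximate figure quoted above.

```python
from collections import Counter

MIN_USERS = 20  # approximate cutoff quoted in the thread


def predictable_conditions(user_conditions: list[set[str]], min_users: int = MIN_USERS) -> set[str]:
    """Keep only conditions reported by at least `min_users` users (one set per user)."""
    counts = Counter(c for conditions in user_conditions for c in conditions)
    return {name for name, n in counts.items() if n >= min_users}
```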

Were those results from the latest version? I only uploaded it last night. I usually expect too many results from a small symptom list rather than too few. I can take a look later; I may have broken something.

lmerriam commented 7 years ago

@Graydyn I believe I've seen the same result both before and after last night. Right now if you go to staging it only gives two options for symptoms: Fatigue and Headache. If you use them they should return a single result, which seems unlikely because those are two super common symptoms. Also @mpugach is it expected that there are only two symptoms available to select on the staging version right now? I see more in my local setup.

mpugach commented 7 years ago

Sorry, misclick.

@lmerriam I used the same endpoint for finding symptoms that is used during check-in. It shows only preloaded or popular symptoms. There are 31 symptoms on staging and none of them are preloaded, but Fatigue and Headache are popular. I guess it was designed this way to keep users from being spammed with invalid data, but I don't know the history.

The "Sex" dropdown issue should be fixed automatically after merging into master.

I plan to write a token into a cookie and match it against the saved record, so the URL will be the same for everyone.
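A minimal sketch of the cookie-token approach: store each result with a random owner token, hand that token to the creator's browser as a cookie, and allow edits only when the cookie matches, while anyone with the URL can still view. The in-memory dict stands in for the database table, and all names are hypothetical.

```python
import secrets
from typing import Optional

# Hypothetical in-memory stand-in for the results table.
RESULTS: dict[int, dict] = {}


def save_result(result_id: int, payload: dict) -> str:
    """Store a result with a random owner token; the caller sets it as the creator's cookie."""
    token = secrets.token_urlsafe(16)
    RESULTS[result_id] = {"payload": payload, "owner_token": token}
    return token


def can_edit(result_id: int, cookie_token: Optional[str]) -> bool:
    """Only the browser holding the matching token may edit the shared result."""
    record = RESULTS.get(result_id)
    if record is None or cookie_token is None:
        return False
    # Constant-time comparison of the encoded tokens.
    return secrets.compare_digest(record["owner_token"].encode(), cookie_token.encode())
```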

lmerriam commented 7 years ago

@mpugach yep that system is intended to keep users from seeing weird results that only 1 or 2 other people have ever used. It's supposed to only show items that have been independently added by at least 3 people. But I actually see different symptoms in check-in than I do in the oracle--a greater number are available in the check-in.

The oracle functions OK so I wouldn't want this to turn into a multi-hour fix. But if it's a quick fix it would be nice to see all the symptom options.

mpugach commented 7 years ago

My previous message was submitted early while I was switching to the code to verify it, so see the updated one.

Are you logged in while using the oracle? I suspect most of the symptoms you're expecting are owned by your user.

lmerriam commented 7 years ago

@mpugach correct, it looks like the ones I was seeing on my check-in are the symptoms I manually added. Sounds like it's working correctly.
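The visibility rule described in this exchange (preloaded symptoms, symptoms independently added by at least 3 users, plus the signed-in user's own symptoms) can be sketched as follows. The real logic lives in the app's symptom search endpoint, which isn't shown here; the Symptom fields and function name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symptom:
    name: str
    preloaded: bool = False
    added_by: set[int] = field(default_factory=set)  # ids of users who added it (assumed)


def visible_symptoms(symptoms: list[Symptom], current_user_id: Optional[int] = None, min_users: int = 3) -> list[str]:
    """Names a given (possibly anonymous) user would see in the symptom search."""
    return sorted(
        s.name
        for s in symptoms
        if s.preloaded
        or len(s.added_by) >= min_users
        or (current_user_id is not None and current_user_id in s.added_by)
    )
```

This explains the discrepancy above: an anonymous oracle visitor gets only the first two branches, while a signed-in user also sees their own additions.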

lmerriam commented 7 years ago

@Graydyn given the current structure of this, do you think we'll be able to easily feed the user responses we get back into the algorithm to improve it? Is there anything we should do upfront to make sure that process is easy to do in the future?

Graydyn commented 7 years ago

Don't put too much work into it for now. As long as you can export the data into a CSV like you currently do, I can handle whatever formatting is necessary when the time comes.
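Since Graydyn only needs the user's input and feedback, not the API response, an export could be as simple as the sketch below. The column names and the ";" separator for multi-valued fields are assumptions, not the existing export format.

```python
import csv
import io


def export_feedback_csv(rows: list[dict]) -> str:
    """Serialize oracle submissions (input plus corrected conditions) to CSV text."""
    fields = ["age", "sex", "symptoms", "confirmed_conditions"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "age": row["age"],
            "sex": row["sex"],
            # Multi-valued fields are joined with ";" (an arbitrary choice).
            "symptoms": ";".join(row["symptoms"]),
            "confirmed_conditions": ";".join(row["confirmed_conditions"]),
        })
    return buffer.getvalue()
```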

lmerriam commented 7 years ago

👍

mpugach commented 7 years ago

deployed to staging