som-shahlab / ehr_ml

Code for doing machine learning with various EHRs

MIT License

Observing Poor Performance on Our Independent Models #27

Closed: chethanjjj closed this issue 2 years ago

chethanjjj commented 2 years ago

Hi Ethan,

My name is Chethan. I've created a version of your CLMBR pipeline, but I am observing poor performance on my independent models (i.e., logistic regression and random forest). I've spent time trying to figure out why, including 1) re-checking my code, the paper, and the repo to make sure the logic is sound, 2) exploring different hyperparameter values for CLMBR, and 3) tuning the hyperparameters of the independent models. None of these three approaches has improved performance. Would you be available for a one-hour meeting this week or next? I'd like to present my version and get your insights. Let me know your availability.

EthanSteinberg commented 2 years ago

Hi Chethan,

Sorry to hear that you have been getting poor performance! I'd be glad to talk to you to work out what is going on. Sorry for the late reply; I accidentally had email notifications for GitHub turned off.

I'll be free from 9 AM - 6 PM on Wednesday through Friday next week.

What dataset are you working with? Could it be an extraction issue?

chethanjjj commented 2 years ago

Not a problem, I understand. How about this Friday (12/17) from 11am-12pm? I can send a Google meeting invite to your Stanford email address. I'm using claims data that contains CCS, CPT, and HCPCS codes. I'd like to walk you through my implementation to get your thoughts. I agree the issue is most likely how the data was extracted, and possibly even the duration of an encounter.

EthanSteinberg commented 2 years ago

11am - 12pm works for me. What timezone? (Sorry, should have been explicit from the start that I am on Pacific Time.)

> I'm using claims data that contains CCS, CPT, and HCPCS codes.

That would probably explain things. My code doesn't support CCS codes so you would be losing a ton of your signal from the start.

chethanjjj commented 2 years ago

1) I'm in Pacific Time as well, so no issue. I'll send an invite over.

2) I should have mentioned this from the beginning: the nature of our data prevented us from using your extraction process, so we wrote our own code to process the data before feeding it into our model. We believe a possible issue is how we represent an encounter. We treat each day as one encounter per patient, but from speaking with your professor we learned that the notion of a "day", or more generally the encounter duration you use, matters a lot.
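
For readers following along, the day-as-encounter representation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout `(patient_id, service_date, code)`, the function name `group_by_day`, and the sample codes are all hypothetical, and this is not the code from either pipeline discussed in this thread.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim records: (patient_id, service_date, code).
# The codes below are illustrative only.
claims = [
    ("p1", date(2021, 3, 1), "CPT/99213"),
    ("p1", date(2021, 3, 1), "HCPCS/J1100"),
    ("p1", date(2021, 3, 4), "CPT/99214"),
    ("p2", date(2021, 3, 2), "CPT/93000"),
]


def group_by_day(records):
    """Group codes into one encounter per (patient, calendar day)."""
    encounters = defaultdict(list)
    for patient_id, service_date, code in records:
        encounters[(patient_id, service_date)].append(code)

    # Build a per-patient timeline: a date-ordered list of (day, codes).
    timelines = defaultdict(list)
    for (patient_id, day), codes in sorted(encounters.items()):
        timelines[patient_id].append((day, codes))
    return dict(timelines)


timelines = group_by_day(claims)
for patient_id, timeline in timelines.items():
    print(patient_id, timeline)
```

Changing the grouping key (for example, to a visit identifier or a multi-day window instead of the calendar day) changes how many codes land in each encounter, which is one way the choice of encounter duration could affect what the model sees.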

EthanSteinberg commented 2 years ago

Why don't you send me a copy of your code before we chat? That way I can take a look and be better prepared to ask the right questions when we talk.

chethanjjj commented 2 years ago

Sure, sounds good. I'll email you our data preparation and modeling code. The former will require a bit of cleaning and commenting, so I'll send over the modeling code in the meantime.