chethanjjj closed 2 years ago
Hi Chethan,
Sorry to hear that you have been getting poor performance! I'd be glad to talk with you to work out what is going on. Sorry for the late reply; I accidentally had email notifications for GitHub turned off.
I'll be free from 9 AM - 6 PM on Wednesday through Friday next week.
What dataset are you working with? Could it be an extraction issue?
Not a problem, I understand. How about this Friday (12/17) from 11am-12pm? I can send a Google meeting invite to your Stanford email address. I'm using claims data that contains CCS, CPT, and HCPCS codes. I'd like to walk you through my implementation to get your thoughts. I agree that the issue is most likely how the data was extracted, and possibly even the duration of an encounter.
11am - 12pm works for me. What timezone? (Sorry, should have been explicit from the start that I am on Pacific Time.)
> I'm using claims data that contains CCS, CPT, and HCPCS codes.
That would probably explain things. My code doesn't support CCS codes, so you would be losing a ton of your signal from the start.
1) I'm on Pacific Time too, so no issue. I'll send an invite over.
2) I should have mentioned this from the beginning: the nature of our data prevented us from using your extraction process, so we wrote our own code to process the data before feeding it into our model. We believe a possible issue is how we represent an encounter. We treat each day as one encounter per patient, but from speaking with your professor we learned that the notion of a "day", or the encounter duration that you use, matters a lot.
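For illustration only, the "one day per patient is one encounter" grouping described above might look roughly like the sketch below. It is not the code from either side of this thread: the function name `group_into_daily_encounters`, the `(patient_id, timestamp, code)` record layout, and the sample code strings are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def group_into_daily_encounters(claims):
    """Group claim records into per-patient, per-calendar-day encounters.

    `claims` is an iterable of (patient_id, timestamp, code) tuples
    (a hypothetical layout). Returns a dict mapping each patient_id to a
    list of (date, codes) pairs ordered by date, where `codes` is the
    sorted, de-duplicated list of codes seen on that day.
    """
    by_patient_day = defaultdict(lambda: defaultdict(list))
    for patient_id, timestamp, code in claims:
        # Everything that falls on the same calendar day is one encounter.
        by_patient_day[patient_id][timestamp.date()].append(code)

    return {
        patient_id: [(day, sorted(set(codes))) for day, codes in sorted(days.items())]
        for patient_id, days in by_patient_day.items()
    }


# Made-up sample records; the code strings are placeholders, not real data.
claims = [
    ("p1", datetime(2021, 3, 1, 9, 30), "CPT/99213"),
    ("p1", datetime(2021, 3, 1, 14, 0), "HCPCS/J1100"),
    ("p1", datetime(2021, 3, 4, 8, 15), "CPT/99214"),
    ("p2", datetime(2021, 3, 2, 11, 0), "CCS/122"),
]
encounters = group_into_daily_encounters(claims)
```

Changing the key from `timestamp.date()` to a coarser or finer time bucket is one way to see how sensitive downstream results are to the encounter definition.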
Why don't you send me a copy of your code before we chat? That way I can take a look and be better prepared to ask the right questions when we talk.
Sure, sounds good. I'll email you our data preparation and modeling code. The former will require a bit of cleaning and commenting, so I'll send over the modeling code in the meantime.
Hi Ethan,
My name is Chethan. I've created a version of your CLMBR pipeline, but I am observing poor performance on my independent models (i.e. logistic regression and random forest). I've spent time trying to figure out why this is happening, including: 1) re-checking my code, the paper, and the repo to make sure the logic is sound, 2) exploring different hyper-parameter values for CLMBR, and 3) tuning the hyper-parameters of the independent models. Performance is still poor after all three. Would you be available for a 1-hour meeting this week or next? I'd like to present my version and get your insights. Let me know your availability.