onefact / network

Apache License 2.0
2 stars 0 forks source link

Programmatically Identify health system website #2

Open dr00b opened 4 months ago

dr00b commented 4 months ago

Starting from the enrollment data set, write logic to attribute individual hospitals to a public facing health system web domain.

Deliberately not saying "LLM", as we should start simple and add complexity as necessary.

dr00b commented 3 months ago

@jaanli corresponded with AHD, they maintain this mapping (along with any other vendor that's doing price transparency data loads). We should acquire one of those as a validation set.

Seems like a fairly tractable problem where we could maintain a definitive set and let the researchers focus on the actual content of the websites. Start at hospitals. If it works, take a whack at HHA, nursing homes, health plans (increasing complexity and LLC obfuscation).

"root domain" becomes an entity type in the network database.

There's lots of deets in there, sharing some hackmd.io stuff.