usdigitalresponse / univaf

An API hosted by USDR for recording and querying vaccine appointment availability.
https://getmyvax.org/
Apache License 2.0
7 stars 2 forks source link

Address duplicates #158

Closed janovergoor closed 3 years ago

janovergoor commented 3 years ago

Things are looking much better after the external_id deduplication effort, but there are still a bunch of locations with the same address:

duplicates.csv

Some thoughts from a cursory visual inspection:

Mr0grog commented 3 years ago

some are different rows from the same location providing different vaccines (4169bc59-c8f3-492f-9d24-e88691e2cb01, a19aa4c9-cfcb-4ff5-96fc-042891623da6)

Ahhhhh these particular ones are bad data from #30. Thought I got those all in the de-duping. Will clean these out.

That said, there might be more here that are legit. I know a lot of PrepMod clinics are set up this way. Need to think through how to best handle these — if they are different in the source system, they may require different booking links, so figuring out a good way to merge them could be tough. This may just be a fundamental thing we have to deal with in the data. :\

Kroger accounts for a lot of the duplicates (little clinic seems to create multiple rows)

Looks like Kroger’s API that VaccineSpotter is hitting is actually returning these as separate places. Do you want to check in with Nick about that? I’m also working on getting access to Kroger’s new SMART SL API, and we can explore more directly there or see if it’s any different.

some might actually be different locations at the same address (Wallmart and Sam's club)

Looks like there might be a few different kinds of things going on with these. In some cases, it looks like we aren’t able to properly match the same Walmart from NJVSS vs. from VaccineSpotter. But some others look like a different issue. Will have to dig into this more.

Mr0grog commented 3 years ago

Ahhhhh these particular ones are bad data from #30.

Alright, I just cleaned these up. Still need to look into everything else. This removed 82 locations, which never had any availability, so I haven’t tried to merge them with anything. They’re just gone:

00315a18-6071-4be2-8647-be96e83d24a8
05e03d05-17e3-4d53-9666-ac375110593b
10d607a7-65b7-47c7-a268-6b6bc0c0c0fc
12bb767c-a1db-4ae2-b708-645688ff3048
138637d0-1b2d-4f79-af6e-e12d5761f9a9
15610117-4d7e-4ce9-8f54-aca7b2426955
15aac946-52a8-44ba-bdc1-4e77f4e27ee7
18f90988-2c30-4ca8-97ec-b6d39fa56f36
20ef861b-91d3-4593-8bfb-f03b537f3533
22484785-1f28-4c8c-8973-358babc4f0b3
23d23a08-5642-45aa-a7b6-e42baf100a12
24634bc5-f68c-4679-b9c7-e771f3ab3709
276be3c2-fabb-412d-96de-32d98b421167
287e636d-25f2-4b27-a601-df5e8719ce34
29168a88-ca2e-463e-8696-0a8660450e66
30405644-fb92-4dff-bf65-4019d8daf6d7
31711ea1-b731-4b6d-9eb3-ad46f384247c
350f6470-9228-4455-8794-2e4bae674edb
3723026a-2d78-440c-b2aa-28699cddf62b
382c39a9-aa95-43c9-99f7-aa27a8d787d7
3fc2d1dc-80b3-4c95-82d0-86abaacb9515
4169bc59-c8f3-492f-9d24-e88691e2cb01
416a34e7-3b20-44ef-8c88-b784b5b7bd01
4193d8b1-6599-45fc-916b-54326fbc83d0
42f2a71e-869c-494d-bc0c-284c761e01da
44ceb27d-d50a-4c5e-a4bc-98d0ebc20b5c
46485eae-4e2c-48fd-b49f-a6a954d3f1c2
46deb852-df8f-4679-9fd3-82540ab02115
4cc3d251-00e5-4ce3-a4c4-0465a4852ad6
4d398b3f-f5c6-4d8f-abec-dbc3af50d573
5098811b-3b4a-4376-9766-17644ae21896
5a8492cf-3673-4077-b895-1c65429fde0b
5d3c1d01-3b0a-46d3-9fe5-7315cc4bf655
5e2c8604-a4d3-4289-bcb8-0448ec0bf2f6
66b6b054-8d96-4b9f-ae6f-68b1ab86ceb3
6892b91d-94d6-400e-99b3-b33d848755c5
693de1a3-6613-4456-a7f8-1ef75f9f4574
6b64c112-2ea1-461d-bc4b-2ca6e88f319b
6cf9d95a-7f04-4d9a-b31d-07caabdf0812
6ea637d2-902a-440c-a509-867fd2611e16
71f7127a-b796-4e11-a742-f843763055d8
761b08ad-3925-4c7c-a077-f354015d5714
76a3c439-ecf6-4291-91af-b05f7a0cdc50
7914263e-6d78-41a3-b5b7-a9e8bd42520f
79225d23-3a3b-4a9e-8f05-b4e64f863784
7d30af21-270f-4f6e-8167-d4aad1af3975
7e640034-3b19-4a84-bb74-021f3bd3b2b0
840fea4d-9a83-49dc-ab63-c5e525f421e8
8613d564-ff34-4bc6-a62c-e7cc8c07e8f0
8b17c770-a057-4b7d-8521-8054a23685e0
8cc0a7d2-71c4-4f17-896b-ea3b22415a57
8d0d6569-0f9e-4697-8afd-ab1edfdc06c8
8ffe512f-13ea-4577-90eb-a84139e85e4b
94847a3b-244d-4cf1-a594-e2145c087d0b
9d2fae96-0849-470a-b6b5-4efcdde454e9
a1062a8e-3a2d-4a3b-b3fd-554c145ccd0e
a66255a8-3fbe-4f08-8453-0cc82bfaea8e
a6b0a348-a4e6-45ae-aa79-3beaee1cc200
a92bcc73-df15-412d-aff3-50ee1e53960d
a97fa353-85d7-4f1c-a6fe-c6332177aa76
ac8fa2d9-8ded-4c91-8634-3856f6bbe758
ad01db41-0de5-498c-850b-0ff5e4454b4a
aebfec35-1612-4d28-843f-ef44ef5a41c6
b0cf2ecf-389e-4904-8904-b4b5ea69009d
b164756f-8402-4eb9-9f18-7689a8d82107
be74def6-2f92-4a58-b5f1-f365f187a5f9
c074ea00-404f-423a-b4f0-91027a20e623
c0ccf0dc-0696-4981-9f81-025596dfce28
cb8f9a38-390a-4800-a51c-099afed8a5af
cef231a8-5cf0-484a-9ee1-627cca92fe35
cfd3ad57-2a47-418b-a04e-8a37abf19e33
d742f664-c71b-43e8-ac58-06168d3995d4
e191f749-ed77-407e-9a32-d541cfcc495b
e24724c4-96c5-47e1-a15f-974c0a7743a6
ec288586-dd88-4a5a-b2e0-0b80eee7b4c7
ee9aacab-3284-48e0-81af-43358dc0810e
f3192a2a-dfbe-4d0b-acb5-21a446b5117e
f42d052a-9134-4ad0-bf5f-5d3bb86c40f2
f8b5e519-6b7a-4037-9046-909e245e0af4
f8b64204-4c8a-4863-be64-bcb74a946bb8
fa7f54ed-21ba-4fbe-b6ca-d65e2bb2849a
fa93842e-f468-40f0-a72c-26538753ed41
Mr0grog commented 3 years ago

Kroger accounts for a lot of the duplicates (little clinic seems to create multiple rows)

Looks like Kroger’s API that VaccineSpotter is hitting is actually returning these as separate places. Do you want to check in with Nick about that? I’m also working on getting access to Kroger’s new SMART SL API, and we can explore more directly there or see if it’s any different.

OK, I now have access to Kroger’s SMART SL API, and it exhibits the same issue. I’ve asked Ryan Owens (I think he’s Kroger’s lead on the API) about it on FHIR chat: https://chat.fhir.org/#narrow/stream/281612-smart.2Fscheduling-links/topic/Publisher.3A.20Kroger/near/240625547

Mr0grog commented 3 years ago

Update from the Kroger rep: The Little Clinic and the pharmacies of bigger stores TLC might be embedded in are separate, and maintain separate inventory and booking. It’s correct that their availability is different even when at the same location.

janovergoor commented 3 years ago

thanks for the background!

On Tue, Jun 1, 2021 at 8:46 AM Rob Brackett @.***> wrote:

Update from the Kroger rep: The Little Clinic and the pharmacies of bigger stores TLC might be embedded in are separate, and maintain separate inventory and booking. It’s correct that their availability is different even when at the same location.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/usdigitalresponse/appointment-availability-infra/issues/158#issuecomment-852232776, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGTHVNE6BRSZIWLRKNPHSDTQT6GXANCNFSM45TAQAXA .

Mr0grog commented 3 years ago

Another update here: a lot of the Walmarts and Sam’s Clubs at the same location are in fact correct. These are a lot like the King Soopers stores that contain The Little Clinic stores — in this case, a Walmart and a Sam’s Club are in the same physical facility, but they still have separate pharmacies with separate inventory and appointment availability. Booking at one is not the same as booking at the other.

janovergoor commented 3 years ago

We're getting close!

uuid                                 name                                provider                  address               city       state
bfb9dd90-6f24-476a-9914-eeb7908cd4c2 Carrollton Pharmacy                 walmart                   1025 W Trinity Mills  Carrollton TX   
fe032a64-99a7-4b32-b6e1-67ec92c0cdd5 Carrollton Pharmacy                 walmart                   1025 W Trinity Mills  Carrollton TX   
4598f50e-c684-43d6-91c5-c5a4f6002fba Dysart Community Center - El Mirage kroger_covid              14414 N El Mirage Rd  El Mirage  AZ   
b211b393-cab4-48e8-bf1a-62f7b5e0987e Dysart Community Center - El Mirage kroger_covid              14414 N El Mirage Rd  El Mirage  AZ   
46aa1d06-65a7-4060-9c48-cc45b4e6fe03 Park Ave Pharmacy                   njvss                     107 Park Ave          Paterson   NJ   
f1ddad77-2a1c-4203-ad2d-b6842e67d9d8 Park Ave Pharmacy                   health_mart               107 Park Ave          Paterson   NJ   
cc21bfb1-ae4b-4d37-90fe-2f094de9f918 Parmer And Mcneil H-E-B             heb                       6001 West Parmer Lane Austin     TX   
2c4f0841-baac-4fc0-a4d6-c285da4d3275 Parmer And Mcneil H-E-B             heb                       6001 West Parmer Lane Austin     TX   
cfd0e7d7-0e0d-415f-8b22-0d1ef3de68c7 Price Chopper Pharmacy 016          health_mart_price_chopper 1717 Black River Road Rome       NY   
c1fdce77-302e-442d-aabf-2314af0fa4a9 Price Chopper Pharmacy 016          price_chopper             1717 Black River Road Rome       NY 
Mr0grog commented 3 years ago

Whoa, that's short!

The Carrollton ones are so weird. They are different store numbers, and both show up as stores if you search for them on Walmart.com, but Google street view shows a big building marked as "WMS" that doesn't really look much like a retail storefront. Wonder if it's a warehouse or something. There's a definitely-real supercenter store just across the major road (different address & store number).

Mr0grog commented 3 years ago

So:

Mr0grog commented 3 years ago

Re: Carrollton, Lisabette F. at VtS ran this down and it appears to be where Walmart’s by mail pharmacy services are coordinated (see https://help.walmart.com/ci/fattach/get/7279100/0/filename/Prescription+Order+Form.pdf and https://www.walmart.com/cp/medication-therapy-management/4851429).

I’m going to mark it as private in the DB with a note so it doesn’t show up in the API.

janovergoor commented 3 years ago

Impressive sleuthing!

On Fri, Jun 11, 2021, 09:09 Rob Brackett @.***> wrote:

Re: Carrollton, Lisabette F. at VtS ran this down and it appears to be where Walmart’s by mail pharmacy services are coordinated (see https://help.walmart.com/ci/fattach/get/7279100/0/filename/Prescription+Order+Form.pdf and https://www.walmart.com/cp/medication-therapy-management/4851429).

I’m going to mark it as private in the DB with a note so it doesn’t show up in the API.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/usdigitalresponse/appointment-availability-infra/issues/158#issuecomment-859686576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGTHVJ43TV2H3YQDDBJJW3TSIYKHANCNFSM45TAQAXA .

Mr0grog commented 3 years ago

Other updates:

Mr0grog commented 3 years ago

@janovergoor these should all be fixed in one form or another.

astonm commented 3 years ago

@janovergoor can we close this ticket?

janovergoor commented 3 years ago

sure