Closed rlmead closed 1 year ago
Notes on cleanup, remaining problems, and future considerations here
Queries used to find bad data (probably not comprehensive):
select name, address from venues where state != 'Kentucky';
select id, name, g_map_link from venues where g_map_link not like 'https://goo%' and g_map_link not like 'https://www.google.com%' and g_map_link not like 'https://g.page%';
select name, address from venues where city is null;
select name, address from venues where state is null;
Duplicate venue cleanup
Get list of duplicate venues and the number of events that links to each:
infinite-api=# select count(*) as num_events, venues.id, venues.name, venues.address from events
full outer join venues on events.venue_id = venues.id
where venues.id in ('7ba33ec9-ca8e-4700-97a5-d202acea7b0e', 'eb3b9bbc-e804-463d-84cb-1752f0dce43c', '2c227750-4871-4b5f-937e-8f848302f894', '030f6741-5e1c-4651-ad45-e0f1ac4f5e4b', '81bdf6b0-8511-11ea-813a-37e204d97a42', '4f914d60-b922-11e9-b43e-456ed9f9b823', 'e48b9d30-d6ae-11ea-b6c2-ad89bfbe302e', 'e592ac95-b63a-4439-a3ae-a282532ff448', '597ba75c-4832-41ea-9106-01f6cfd5b88c', '57045000-3200-11e9-b43e-456ed9f9b823', '2d74ad30-3475-11e9-b43e-456ed9f9b823', '5623b730-3475-11e9-b43e-456ed9f9b823', '8f5c7260-047a-11e8-908e-07b6d0fe598a', '550afaa5-7998-4b6c-a74e-005ac84ea74e')
group by venues.id order by name;
num_events | id | name | address
------------+--------------------------------------+-------------------------------------------------------------------+-------------------------------------------------
1 | 7ba33ec9-ca8e-4700-97a5-d202acea7b0e | Country Boy Brewing | 436 Chair Avenue, Lexington, Kentucky 40508
0 | eb3b9bbc-e804-463d-84cb-1752f0dce43c | Country Boy Brewing | 436 Chair Avenue, Lexington, 40508
2 | 57045000-3200-11e9-b43e-456ed9f9b823 | Lexington Public Library (Central Library) | 140 East Main Street, Lexington, Kentucky 40507
10 | 597ba75c-4832-41ea-9106-01f6cfd5b88c | Lexington Public Library (Central) | 140 East Main Street, Lexington, Kentucky 40507
1 | 2d74ad30-3475-11e9-b43e-456ed9f9b823 | Lexington Public Library - Tates Creek Branch | 3628 Walden Drive, Lexington, Kentucky 40517
1 | 5623b730-3475-11e9-b43e-456ed9f9b823 | Lexington Public Library - Tates Creek Branch | 3628 Walden Drive, Lexington, 40517,
1 | 2c227750-4871-4b5f-937e-8f848302f894 | Margaret I. King Library | 179 Funkhouser Drive, Lexington, Kentucky 40506
1 | 550afaa5-7998-4b6c-a74e-005ac84ea74e | Student Center Cinema | 160, Ave of Champions, Lexington 40508
11 | 8f5c7260-047a-11e8-908e-07b6d0fe598a | The Gatton Student Center Cinema | 160 Avenue of Champions, Lexington, Kentucky 40526
1 | 030f6741-5e1c-4651-ad45-e0f1ac4f5e4b | UK Special Collections Research Center - Margaret I. King Library | 179 Funkhouser Drive, Lexington, Kentucky 40506
1 | 81bdf6b0-8511-11ea-813a-37e204d97a42 | WRFL | Lexington, Lexington, Kentucky
2 | 4f914d60-b922-11e9-b43e-456ed9f9b823 | WRFL-LIVE | Lexington, Kentucky
4 | e48b9d30-d6ae-11ea-b6c2-ad89bfbe302e | WUKY | https://www.wuky.org/
1 | e592ac95-b63a-4439-a3ae-a282532ff448 | WUKY Studios | 2640 Spurr Road, Lexington, Kentucky 40511
(8 rows)
Checks & updates for soft-deleting duplicate venues:
// Country Boy (might have no events, but run just in case)
update events set venue_id = '7ba33ec9-ca8e-4700-97a5-d202acea7b0e' where venue_id = 'eb3b9bbc-e804-463d-84cb-1752f0dce43c';
// Central Library
update events set venue_id = '597ba75c-4832-41ea-9106-01f6cfd5b88c' where venue_id = '57045000-3200-11e9-b43e-456ed9f9b823';
// Tates Creek Library (might have no events)
update events set venue_id = '2d74ad30-3475-11e9-b43e-456ed9f9b823' where venue_id = '5623b730-3475-11e9-b43e-456ed9f9b823';
// Gatton Student Center Cinema (might have no events)
update events set venue_id = '8f5c7260-047a-11e8-908e-07b6d0fe598a' where venue_id = '550afaa5-7998-4b6c-a74e-005ac84ea74e';
// Margaret I. King Library
update events set venue_id = '2c227750-4871-4b5f-937e-8f848302f894' where venue_id = '030f6741-5e1c-4651-ad45-e0f1ac4f5e4b';
// WRFL
update events set venue_id = '81bdf6b0-8511-11ea-813a-37e204d97a42' where venue_id = '4f914d60-b922-11e9-b43e-456ed9f9b823';
// WUKY
update events set venue_id = 'e48b9d30-d6ae-11ea-b6c2-ad89bfbe302e' where venue_id = 'e592ac95-b63a-4439-a3ae-a282532ff448';
// double-check events before soft-deleting venues - should be 0
select count(*) from events where venue_id in ('eb3b9bbc-e804-463d-84cb-1752f0dce43c', '030f6741-5e1c-4651-ad45-e0f1ac4f5e4b', '4f914d60-b922-11e9-b43e-456ed9f9b823', 'e592ac95-b63a-4439-a3ae-a282532ff448', '57045000-3200-11e9-b43e-456ed9f9b823', '5623b730-3475-11e9-b43e-456ed9f9b823', '550afaa5-7998-4b6c-a74e-005ac84ea74e');
// soft-delete duplicate venues
update venues set is_soft_deleted = 't' where id in ('eb3b9bbc-e804-463d-84cb-1752f0dce43c', '030f6741-5e1c-4651-ad45-e0f1ac4f5e4b', '4f914d60-b922-11e9-b43e-456ed9f9b823', 'e592ac95-b63a-4439-a3ae-a282532ff448', '57045000-3200-11e9-b43e-456ed9f9b823', '5623b730-3475-11e9-b43e-456ed9f9b823', '550afaa5-7998-4b6c-a74e-005ac84ea74e');
// also soft-delete old North Lime Donuts location
update venues set is_soft_deleted = 't' where id = '31691000-c83d-11e9-b43e-456ed9f9b823';
Remaining tasks related to this work as of 3/19/23:
mode:online
tag (and change the wording in the submission interface to account for on-air events)After we have cleaned up and reworked venue location data, we can manually review location information in the database and/or at infinite.industries/admin-venue-edit. Based on what we know of the data, we might find problems like:
The following information is outdated but may still be helpful. There is also related information about bad Google Maps URLs in the archived issue #316.
You can find a list of venues that need corrected addresses in the
incomplete_addresses
sheet of this spreadsheet.The list in that spreadsheet corresponds to the output from this query as of 12/4/22:
The list returned here may not be comprehensive - could be worth doing some other data validation checks, too. We'll probably find more buggy data as we work through #318.