simonw / scotrail-datasette

ScotRail announcements in Datasette
https://scotrail.datasette.io
6 stars 0 forks source link

Fetch CSV from Google Sheets #1

Closed simonw closed 1 year ago

simonw commented 1 year ago

This repo needs a CSV file full of data that can then be built into a SQLite database.

The data is from a Google Sheet, crowdsourced from this Tweet: https://twitter.com/MattEason/status/1560653413783744512

If anyone needs this split into 2,440 individual mp3s - because why wouldn't you - I've put them here: https://drive.google.com/drive/folders/172W6sXnvlr7UcNLipO8BTw417_KRz9c5?usp=sharing

And if anyone wants to help transcribe all the files, here's a shared sheet, which has a good chance of descending into chaos:

https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/edit#gid=2073317291

simonw commented 1 year ago

I'm going to use a GitHub Action workflow, manually triggered, that exports the data using:

https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/export?format=csv

The document ID is:

1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM
simonw commented 1 year ago

Problem: that export includes a few blank columns at the end.

It looked like I would be able to get the range I want with:

https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/gviz/tq?tqx=out:csv&range=A:G

But that doesn't work:

~ % curl -s 'https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/gviz/tq?tqx=out:csv&range=A:G' | head -n 10
"File 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 0060 0061 0062 0063 0064 0065 0066 0067 0068 0069 0070 0071 0072 0073 0074 0075 0076 0077 0078 0079 0080 0081 0082 0083 0084 0085 0086 0087 0088 0089 0090 0091 0092 0093 0094 0095 0096 0097 0098 0099 0100 0101 0102 0103 0104 0105 0106 0107 0108 0109 0110 0111 0112 0113 0114 0115 0116 0117 0118 0119 0120 0121 0122 0123 0124 0125 0126 0127 0128 0129 0130 0131 0132 0133 0134 0135 0136 0137 0138 0139 0140 0141 0142 0143 0144 0145 0146 0147 0148 0149 0150 0151 0152 0153 0154 0155 0156 0157 0158 0159 0160 0161 0162 0163 0164 0165 0166 0167 0168 0169 0170 0171 0172 0173 0174 0175 0176 0177 0178 0179 0180 0181 0182 0183 0184 0185 0186 0187 0188 0189 0190 0191 0192 0193 0194 0195 0196 0197 0198 0199 0200 0201 0202 0203 0204 0205 0206 0207 0208 0209 0210 0211 0212 0213 0214 0215 0216 0217 0218 0219 0220 0221 0222 0223 0224 0225 0226 0227 0228 0229 0230 0231 0232 0233 0234 0235 0236 0237 0238 0239 0240 0241 0242 0243 0244 0245 0246 0247 0248 0249 0250 0251 0252 0253 0254 0255 0256 0257 0258 0259 0260 0261 0262 0263 0264 0265 0266 0267 0268 0269 0270 0271 0272 0273 0274 0275 0276 0277 0278 0279 0280 0281 0282 0283 0284 0285 0286 0287 0288 0289 0290 0291 0292 0293 0294 0295 0296","NRE ID WBQ BTH HEX PNR SOU STA SCS TRO GRH COR STP WML SOA BYL BEL BSU DFL DFE EXG GGJ GSC GRL HNW HKH HST LHA NEW NTN GLQ RAN KGE STF GLC GLQ NTN ADS ASB ADN BMP KRK KVD MEY LRH CTE ALO SFI EKB NEG GBG GAL TWB ","Transcription Platform  A trolley service Warrington Bank Quay South West Trains service to South West Trains service to Bath Spa Midland Main Line Midland Main Line service to  Midland Main Line service from [white noise] The next train at platform The next train at platform 1 is the The next train at platform 2 is the The next train at platform 3 is the The next train at platform 4 is the The next train at platform 5 is the The next train at platform 6 is the The next train at platform 7 is the The next train at platform 8 is the The next train at platform 1 is the The next train at platform 2 is the The next train at platform 3 is the The next train at platform 4 is the The next train at platform 5 is the The next train at platform 6 is the The next train at platform 7 is the The next train at platform 8 is the This train will call additionally at  will be in will be at I am sorry to announce that the ScotRail ScotRail service to ScotRail service from platform 5 for the platform 6 for the Strathclyde metro service from  platform 1 for the platform 2 for the platform 3 for the platform 4 for the platform 5 for the platform 6 for the platform 7 for the platform 8 for the The public address at this station is currently under test Please ignore the following announcements Thank you The display information system at this station is currently under test Please ignore any information currently displayed The delayed Midland Main Line will now depart from platform 1 will now depart from platform 2 will now depart from platform 3 will now depart from platform 4 will now depart from platform 5 will now depart from platform 6 will now depart from platform 7 will now depart from platform 8 will now depart from This train is formed of 11 coaches This train is formed of 12 coaches This train is formed of 1 coach only This train is formed of 1 coach delayed The train now approaching platform 1 The train now approaching platform 2 The train now approaching platform 3 The train now approaching platform 4 The train now approaching platform 5 The train now approaching platform 6 The train now approaching platform 7 The train now approaching platform 8 The train now approaching platform 2 is the The train now approaching platform 3 is the The train now approaching platform 4 is the The train now approaching platform 5 is the The train now approaching platform 6 is the The train now approaching platform 7 is the The train now approaching platform 8 is the Hexham of drinks and light refreshments We are sorry to announce that the platform 1 for the delayed platform 2 for the delayed platform 3 for the delayed platform 4 for the delayed platform 5 for the delayed platform 6 for the delayed platform 7 for the delayed platform 8 for the delayed The next service is to ScotRail services to ScotRail service from Will be in Will be in 5 minutes Will be in 10 minutes Will be in 15 minutes Will be in 20 minutes Will be in 25 minutes Will be in 30 minutes Will be in 35 minutes Will be in 40 minutes Will be in 45 minutes Will be in 50 minutes Will be in 55 minutes Will be in 1 hour Is delayed by approximately 5 minutes Is delayed by approximately 10 minutes Is delayed by approximately 15 minutes Is delayed by approximately 20 minutes Is delayed by approximately 25 minutes Is delayed by approximately 30 minutes Is delayed by approximately 35 minutes Is delayed by approximately 40 minutes Is delayed by approximately 45 minutes Is delayed by approximately 50 minutes Is delayed by approximately 55 minutes Is delayed by approximately 1 hour Strathclyde metro service to Strathclyde metro service from  Should join the rear 9 coaches Should join the rear 10 coaches Should join the rear 11 coaches Should join the rear 12 coaches This service will be in approximately 5 minutes This service will be in approximately 10 minutes This service will arrive in approximately 7 minutes This service will arrive in approximately 12 minutes This service will be in approximately 5 minutes 10 minutes 7 minutes 12 minutes Will arrive in approximately 1 minute Will arrive in approximately 2 minutes Will arrive in approximately 3 minutes Will arrive in approximately 4 minutes Will arrive in approximately 5 minutes Will arrive in approximately 10 minutes Will arrive in approximately 15 minutes Will arrive in approximately 20 minutes Will arrive in approximately 25 minutes Will arrive in approximately 30 minutes Will arrive in approximately 35 minutes Will arrive in approximately 40 minutes Will arrive in approximately 45 minutes Will arrive in approximately 50 minutes Will arrive in approximately 55 minutes Poor rail conditions Poor rail conditions Penrith Attention please The public address system at this station Is at present being tested by our engineers Please disregard any public address announcements While this testing is carried out Thank you Attention please, here is a special announcement, penalty fares are in operation in this area, all passengers must be in possession of a valid ticket or authority to travel before travelling on services from this station. If you are asked to produce a ticket, and are unable to do so, you will be liable to pay a penalty fare, or the full standard single fare, whichever is the greater Will be in approximately 1 minute Will be in approximately 2 minutes Will be in approximately 3 minutes Will be in approximately 4 minutes Will be in approximately 5 minutes Will be in approximately 10 minutes Will be in approximately 15 minutes Will be in approximately 20 minutes Will be in approximately 25 minutes Will be in approximately 30 minutes Will be in approximately 35 minutes Will be in approximately 40 minutes Will be in approximately 45 minutes Will be in approximately 50 minutes Will be in approximately 55 minutes This train is formed of 1 coach only This train is formed of 2 coaches This train is formed of 3 coaches This train is formed of 4 coaches This train is formed of 5 coaches This train is formed of 6 coaches This train is formed of 7 coaches This train is formed of 8 coaches This train is formed of 9 coaches This train is formed of 10 coaches This train is formed of 11 coaches This train is formed of 12 coaches hundred hours Stand well away from the edge of platform 1 Stand well away from the edge of platform 2 Stand well away from the edge of platform 3 Stand well away from the edge of platform 4 Stand well away from the edge of platform 5 Stand well away from the edge of platform 6 Stand well away from the edge of platform 7 Stand well away from the edge of platform 8 Southern Southern service to Southern service from Southeastern Trains Southeastern Trains Southeastern Trains service to Southeastern Trains service from Southeastern Southeastern service to Southeastern service from Attention please! Please leave the station immediately. Attention please! Please leave the station immediately. Attention please! Please leave the station immediately. Attention please! Please leave the station immediately. Southampton Central Southern Southeastern First Great Western First Great Western service to First Great Western service from Stafford Starcross is available Trowbridge Gartcosh Bodmin Corby St Pancras International Wilmslow Should change here Southampton Airport Parkway This is a platform alteration Midnight 01 02 03 04 05 06 07 08 09 10 Barry Links Beauly Brunstane Central Dunfermline Queen Margaret  Dunfermline Town  Dunoon 11 Exhibition Centre Georgemas Gilshochill Greenfaulds Hamilton West Hawkhead High Street Loch Awe Newcraighall Newton 12 Queen Street Rannoch Kingsknowe Stromeferry Glasgow Central Low Level Glasgow Queen Street Low Level Newton (Lanarkshire) Ardrossan Harbour Ardrossan South Beach Ardrossan Town 13 Brampton (Cumbria) Cumbrae Slip Kirkconnel 14 15 16 17 18 East Midlands Airport Kelvindale Merryton Larkhall Chatelherault Alloa 19 Shawfair Eskbank Newtongrange Gorebridge Galashiels 20 Tweedbank Edinburgh Airport Edinburgh International Airport 21 Abbey Road","Category Platform ID Passenger information Destination Train operating company Train operating company Destination Train operating company Train operating company Train operating company Non-vocal Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Passenger information Conjoining Conjoining Apology Train operating company Train operating company  Train operating company Platform information Platform information Strathclyde metro Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Operational Operational Conjoining Operational Operational Operational Conjoining Passenger information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Passenger information Passenger information Passenger information Passenger information Passenger information Conjoining Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Destination Passenger information Apology Platform information Platform information Platform information Platform information Platform information Platform information Platform information Platform information Passenger information Train operating company Train operating company Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Strathclyde metro Strathclyde metro Passenger information Passenger information Passenger information Passenger information Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Reason Reason Destination Conjoining Operational Operational Operational Operational Conjoining Fare information Time Time Time Time Time Time Time Time Time Time Time Time Time Time Time Train formation Train formation Train formation Train formation Train formation Train formation Train formation Train formation Train formation Train formation Train formation Train formation Time Safety Safety Safety Safety Safety Safety Safety Safety Train operating company Train operating company Train operating company Train operating company Train operating company Train operating company Train operating company Train operating company Train operating company Train operating company Safety Safety Safety Safety Destination Train operating company Train operating company Train operating company Train operating company Train operating company Destination Destination Conjoining Destination Destination Destination Destination Destination Destination Conjoining Destination Platform information Time Time Time Time Time Time Time Time Time Time Time Destination Destination Destination Destination Destination Destination Destination Time Destination Destination Destination Destination Destination Destination Destination Destination Destination Destination Time Destination Destination Destination Destination Destination Destination Destination Destination Destination Destination Time Destination Destination Destination Time Time Time Time Time Destination Destination Destination Destination Destination Destination Time Destination Destination Destination Destination Destination Time Destination Destination Destination Time Destination","Notes Bit of a detour Stopped existing in 2007! non-vocal ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ðə/) ""the"" (/ðə/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ðə/) ""the"" (/ðə/) Low-pitch ending Nice ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) ""the"" (/ði/) Most frequently used file really long if you hear this the train is very lost :( Low-pitch ending Male voice (quieter) Male voice Male voice Male voice (louder) Low-pitch ending Low-pitch ending oh-one oh-two oh-three oh-four oh-five oh-six oh-seven oh-eight oh-nine Ferry Glasgow mid-sentence Ferry Non-rail Non-rail Non-rail DLR","Timestamp 0:00:00 0:00:02 0:00:05 0:00:07 0:00:10 0:00:14 0:00:16 0:00:19 0:00:22 0:00:26 0:00:32 ","mp3 file 0001.mp3 0002.mp3 0003.mp3 0004.mp3 0005.mp3 0006.mp3 0007.mp3 0008.mp3 0009.mp3 0010.mp3 0011.mp3 0012.mp3 0013.mp3 0014.mp3 0015.mp3 0016.mp3 0017.mp3 0018.mp3 0019.mp3 0020.mp3 0021.mp3 0022.mp3 0023.mp3 0024.mp3 0025.mp3 0026.mp3 0027.mp3 0028.mp3 0029.mp3 0030.mp3 0031.mp3 0032.mp3 0033.mp3 0034.mp3 0035.mp3 0036.mp3 0037.mp3 0038.mp3 0039.mp3 0040.mp3 0041.mp3 0042.mp3 0043.mp3 0044.mp3 0045.mp3 0046.mp3 0047.mp3 0048.mp3 0049.mp3 0050.mp3 0051.mp3 0052.mp3 0053.mp3 0054.mp3 0055.mp3 0056.mp3 0057.mp3 0058.mp3 0059.mp3 0060.mp3 0061.mp3 0062.mp3 0063.mp3 0064.mp3 0065.mp3 0066.mp3 0067.mp3 0068.mp3 0069.mp3 0070.mp3 0071.mp3 0072.mp3 0073.mp3 0074.mp3 0075.mp3 0076.mp3 0077.mp3 0078.mp3 0079.mp3 0080.mp3 0081.mp3 0082.mp3 0083.mp3 0084.mp3 0085.mp3 0086.mp3 0087.mp3 0088.mp3 0089.mp3 0090.mp3 0091.mp3 0092.mp3 0093.mp3 0094.mp3 0095.mp3 0096.mp3 0097.mp3 0098.mp3 0099.mp3 0100.mp3 0101.mp3 0102.mp3 0103.mp3 0104.mp3 0105.mp3 0106.mp3 0107.mp3 0108.mp3 0109.mp3 0110.mp3 0111.mp3 0112.mp3 0113.mp3 0114.mp3 0115.mp3 0116.mp3 0117.mp3 0118.mp3 0119.mp3 0120.mp3 0121.mp3 0122.mp3 0123.mp3 0124.mp3 0125.mp3 0126.mp3 0127.mp3 0128.mp3 0129.mp3 0130.mp3 0131.mp3 0132.mp3 0133.mp3 0134.mp3 0135.mp3 0136.mp3 0137.mp3 0138.mp3 0139.mp3 0140.mp3 0141.mp3 0142.mp3 0143.mp3 0144.mp3 0145.mp3 0146.mp3 0147.mp3 0148.mp3 0149.mp3 0150.mp3 0151.mp3 0152.mp3 0153.mp3 0154.mp3 0155.mp3 0156.mp3 0157.mp3 0158.mp3 0159.mp3 0160.mp3 0161.mp3 0162.mp3 0163.mp3 0164.mp3 0165.mp3 0166.mp3 0167.mp3 0168.mp3 0169.mp3 0170.mp3 0171.mp3 0172.mp3 0173.mp3 0174.mp3 0175.mp3 0176.mp3 0177.mp3 0178.mp3 0179.mp3 0180.mp3 0181.mp3 0182.mp3 0183.mp3 0184.mp3 0185.mp3 0186.mp3 0187.mp3 0188.mp3 0189.mp3 0190.mp3 0191.mp3 0192.mp3 0193.mp3 0194.mp3 0195.mp3 0196.mp3 0197.mp3 0198.mp3 0199.mp3 0200.mp3 0201.mp3 0202.mp3 0203.mp3 0204.mp3 0205.mp3 0206.mp3 0207.mp3 0208.mp3 0209.mp3 0210.mp3 0211.mp3 0212.mp3 0213.mp3 0214.mp3 0215.mp3 0216.mp3 0217.mp3 0218.mp3 0219.mp3 0220.mp3 0221.mp3 0222.mp3 0223.mp3 0224.mp3 0225.mp3 0226.mp3 0227.mp3 0228.mp3 0229.mp3 0230.mp3 0231.mp3 0232.mp3 0233.mp3 0234.mp3 0235.mp3 0236.mp3 0237.mp3 0238.mp3 0239.mp3 0240.mp3 0241.mp3 0242.mp3 0243.mp3 0244.mp3 0245.mp3 0246.mp3 0247.mp3 0248.mp3 0249.mp3 0250.mp3 0251.mp3 0252.mp3 0253.mp3 0254.mp3 0255.mp3 0256.mp3 0257.mp3 0258.mp3 0259.mp3 0260.mp3 0261.mp3 0262.mp3 0263.mp3 0264.mp3 0265.mp3 0266.mp3 0267.mp3 0268.mp3 0269.mp3 0270.mp3 0271.mp3 0272.mp3 0273.mp3 0274.mp3 0275.mp3 0276.mp3 0277.mp3 0278.mp3 0279.mp3 0280.mp3 0281.mp3 0282.mp3 0283.mp3 0284.mp3 0285.mp3 0286.mp3 0287.mp3 0288.mp3 0289.mp3 0290.mp3 0291.mp3 0292.mp3 0293.mp3 0294.mp3 0295.mp3 0296.mp3"
"0297","","22","Time","","","0297.mp3"
"0298","","23","Time","","","0298.mp3"
"0299","","clock","Time","","","0299.mp3"
"0300","","01","Time","oh-one","","0300.mp3"
"0301","","02","Time","oh-two","","0301.mp3"
"0302","","03","Time","oh-three","","0302.mp3"
"0303","","Edinburgh International","Destination","Proposed","","0303.mp3"
"0304","LPY","Liverpool South Parkway","Destination","","","0304.mp3"
"0305","","Liverpool South","Destination","","","0305.mp3" 
simonw commented 1 year ago

I'm just going to hit https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/export?format=csv and then clean it up using sqlite-utils.

simonw commented 1 year ago
/tmp % curl -s -L 'https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/export?format=csv' | sqlite-utils memory stdin:csv --schema 
CREATE TABLE "stdin" (
   [File] INTEGER,
   [NRE ID] TEXT,
   [Transcription] TEXT,
   [Category] TEXT,
   [Notes] TEXT,
   [Timestamp] TEXT,
   [mp3 file] TEXT,
   [] TEXT
);
CREATE VIEW t1 AS select * from [stdin];
CREATE VIEW t AS select * from [stdin];

So this does what I want:

curl -s -L 'https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/export?format=csv' | \
  sqlite-utils memory stdin:csv \
  'select [File], [NRE ID], Transcription, Category, Notes, Timestamp, [mp3 file] from stdin' \
  --csv > output.csv
simonw commented 1 year ago

... except it truncates a File of 0001 to 1 which I don't want.

Adding --no-detect-types fixes that.

curl -s -L 'https://docs.google.com/spreadsheets/d/1jAtNLBXLYwTraaC_IGAAs53jJWWEQUtFrocS5jW31JM/export?format=csv' | \
  sqlite-utils memory stdin:csv \
  'select [File], [NRE ID], Transcription, Category, Notes, Timestamp, [mp3 file] from stdin' \
  --no-detect-types \
  --csv > announcements.csv