simonprickett / nctx-stop-api

Nottingham City Transport Bus Stop Departures mini API with Cloudflare Workers
https://simonprickett.dev
MIT License

Nottingham City Transport Bus Departures API

Nottingham City Transport Bus at Forest Recreation Ground

Overview

My local bus company Nottingham City Transport (NCTX) doesn't have an API for real time bus departures, and I couldn't find any other source of this data so I decided to make my own using Cloudflare Workers and a screen scraping approach.

If you build a front end or interface to this, I'd love to see it. You can get hold of me here. I also wrote about this project on my website.

Running Locally

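To run this locally, you'll need Git, Node.js with npm, and a Cloudflare account (Wrangler asks you to log in the first time you run it).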

First, get the code:

$ git clone https://github.com/simonprickett/nctx-stop-api.git
$ cd nctx-stop-api

Next, install Wrangler globally:

$ npm install -g wrangler

Install the project dependencies:

$ npm install

Now, you're ready to start a local copy of the worker:

$ wrangler dev
 ⛅️ wrangler 2.13.0
--------------------
⬣ Listening at http://0.0.0.0:8787
- http://127.0.0.1:8787
- http://192.168.4.22:8787

The first time that you do this, you'll be prompted to log in to Cloudflare and authorise Wrangler. Follow the on-screen instructions and prompts.

Test the worker locally by visiting:

http://localhost:8787/?stopId=3390FO07

Deploying to Cloudflare Workers

When you're ready to publish the worker to the world and give it a public URL that's part of your Cloudflare account, use Wrangler:

$ wrangler publish
 ⛅️ wrangler 2.13.0
--------------------
Total Upload: 8.08 KiB / gzip: 2.13 KiB
Uploaded nctx (2.56 sec)
Published nctx (1.63 sec)
  https://nctx.<your cloudflare workers domain>.workers.dev

Once deployed, your worker will be accessible on the internet at the URL that Wrangler outputs at the end of the publishing process. Note that this is an HTTPS URL: Cloudflare takes care of SSL for you.

Usage

Obtaining a Bus Stop ID

This API works at the bus stop level; there are no endpoints to get a list of routes or stops. To make it work you'll need a bus stop ID. You can get one of these from the Nottingham City Transport website like so:

Requesting Departure Data for a Bus Stop

The following examples all use stop ID 3390FO07 ("Forest Recreation Ground"), and route numbers and line colours that pass through that stop.

All examples are GET requests, so you can just use a browser to try them out. You could also use Postman. These examples assume you're running the worker code locally, just swap the URL to your production one if you've deployed it and want to run it in production.

Get All Departures

To get all the departures for a given stop ID go to the following URL:

http://localhost:8787/?stopId=3390FO07

This returns a JSON response that looks like this:

{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "70",
      "destination": "City, Victoria Centre T3",
      "expected": "2 mins",
      "expectedMins": 2,
      "isRealTime": true
    },
    {
      "lineColour": "#935E3A",
      "line": "brown",
      "routeNumber": "16",
      "destination": "City, Victoria Centre T2",
      "expected": "3 mins",
      "expectedMins": 3,
      "isRealTime": true
    },
    {
      "lineColour": "#522398",
      "line": "purple",
      "routeNumber": "88",
      "destination": "City, Parliament St P4",
      "expected": "5 mins",
      "expectedMins": 5,
      "isRealTime": true
    }
  ]
}

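The stopId field contains the ID of the stop that you provided. stopName contains the full name for that stop. The remainder of the response is contained in the departures array. Each departure has the following data fields: lineColour (the HTML colour code for the line), line (the name of the line, derived from its colour), routeNumber, destination, expected (the time text as shown by NCTX, for example "2 mins"), expectedMins (the same value as a number of minutes) and isRealTime (true when the bus has live tracking, false for a timetable estimate).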
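As an illustration of consuming this response, here's a minimal sketch that turns each departure into a line of display text. summariseDepartures is a hypothetical helper name and is not part of the worker; the sample object is cut down from the response shown above.

```javascript
// Hypothetical helper (not part of the worker): turns a parsed API
// response into one line of display text per departure.
function summariseDepartures(data) {
  return data.departures.map(d => {
    // isRealTime is true for live tracked buses, false for timetable times.
    const source = d.isRealTime ? 'live' : 'timetable'
    return `${d.routeNumber} (${d.line}) to ${d.destination}: ${d.expected} [${source}]`
  })
}

// Sample data copied from the first departure in the response above.
const sample = {
  stopId: '3390FO07',
  stopName: 'Forest Recreation Ground',
  departures: [
    {
      lineColour: '#FED100',
      line: 'yellow',
      routeNumber: '70',
      destination: 'City, Victoria Centre T3',
      expected: '2 mins',
      expectedMins: 2,
      isRealTime: true,
    },
  ],
}

console.log(summariseDepartures(sample).join('\n'))
```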

Filtering / Limiting Data Returned

There are various ways in which you can filter and limit the data returned. These are all specified using extra parameters on the request, and can be combined together in a single request.

Use the filters by adding additional request parameters, for example line (a line colour), routeNumber, maxWaitTime (in minutes) and maxResults.

Example showing how to combine these... let's get up to 4 yellow line departures in the next 60 mins:

http://localhost:8787/?stopId=3390FO07&line=yellow&maxWaitTime=60&maxResults=4

The order of the arguments doesn't matter.
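If you're calling the API from code rather than a browser, a small helper can assemble these URLs for you. This is a sketch only: buildDeparturesUrl is a hypothetical name, and the base URL assumes the worker is running locally as described earlier.

```javascript
// Hypothetical helper: builds a request URL for the worker.
// baseUrl is wherever your worker is running (local or deployed).
function buildDeparturesUrl(baseUrl, stopId, options = {}) {
  const url = new URL(baseUrl)
  url.searchParams.set('stopId', stopId)

  for (const [name, value] of Object.entries(options)) {
    // Skip options that weren't supplied.
    if (value !== undefined && value !== null) {
      url.searchParams.set(name, String(value))
    }
  }

  return url.toString()
}

const exampleUrl = buildDeparturesUrl('http://localhost:8787/', '3390FO07', {
  line: 'yellow',
  maxWaitTime: 60,
  maxResults: 4,
})

console.log(exampleUrl)
```

Using URL and URLSearchParams means that values are encoded for you, and the worker decodes them again when it reads them with url.searchParams.get.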

Specifying the Format for Data Returned

The worker can return data in two different formats...

JSON Responses

JSON is the default response format, which is described earlier in this document. There's no need to do this but you can set the format request parameter to json if you like:

http://localhost:8787/?stopId=3390FO07&maxResults=3&format=json

The response looks like this:

{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "69",
      "destination": "City, Victoria Centre T4",
      "expected": "1 min",
      "expectedMins": 1,
      "isRealTime": true
    },
    {
      "lineColour": "#935E3A",
      "line": "brown",
      "routeNumber": "15",
      "destination": "City, Victoria Centre T2",
      "expected": "2 mins",
      "expectedMins": 2,
      "isRealTime": true
    },
    {
      "lineColour": "#FED100",
      "line": "yellow",
      "routeNumber": "68",
      "destination": "City, Victoria Centre T4",
      "expected": "4 mins",
      "expectedMins": 4,
      "isRealTime": true
    }
  ]
}

If you opt to use the fields request parameter, only the fields you ask for will be returned:

http://localhost:8787/?stopId=3390FO07&format=json&maxResults=3&fields=line,routeNumber,expected

returns:

{
  "stopId": "3390FO07",
  "stopName": "Forest Recreation Ground",
  "departures": [
    {
      "line": "yellow",
      "routeNumber": "69",
      "expected": "1 min"
    },
    {
      "line": "brown",
      "routeNumber": "15",
      "expected": "2 mins"
    },
    {
      "line": "yellow",
      "routeNumber": "68",
      "expected": "5 mins"
    }
  ]
}

Delimited String Responses

The worker can also return delimited string responses. You might want to use these when processing the response on a device with limited capabilities, where a JSON parser might not be viable. To get a string response set the format request parameter to string:

http://localhost:8787/?stopId=3390FO07&format=string&maxResults=3

The response format looks like this:

3390FO07|Forest Recreation Ground|#FED100^yellow^68^City, Victoria Centre T4^1 min^1^true|^#92D400^lime^56^City, Parliament St P2^4 mins^4^true|^#522398^purple^89^City, Parliament St P5^5 mins^5^true

The following fields are returned, separated by | characters: the stop ID, the stop name, then one entry for each departure.

Within each departure, fields are separated by ^ characters. If you choose to filter which fields are returned using the fields request parameter, any fields that you didn't ask for are left out entirely rather than returned as blank values. For example:

http://localhost:8787/?stopId=3390FO07&format=string&maxResults=3&fields=line,routeNumber,expected

Returns:

3390FO07|Forest Recreation Ground|yellow^69^1 min|^brown^15^2 mins|^yellow^68^4 mins
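To show how a client might read this format, here's a minimal parsing sketch. parseDelimitedResponse is a hypothetical helper; the caller passes in the field names it expects, in the order they were requested. Note that in the sample output above, every departure after the first begins with a ^ character, so the sketch strips it before splitting.

```javascript
// Hypothetical helper: parses the worker's delimited string format.
// fieldNames is the list of fields you expect per departure, in order.
function parseDelimitedResponse(text, fieldNames) {
  const [stopId, stopName, ...rawDepartures] = text.split('|')

  const departures = rawDepartures
    .filter(raw => raw.length > 0) // no departures leaves an empty entry
    .map(raw => {
      // Later departures start with a '^' in the sample output above.
      const values = (raw.startsWith('^') ? raw.substring(1) : raw).split('^')
      const departure = {}
      fieldNames.forEach((name, i) => {
        departure[name] = values[i]
      })
      return departure
    })

  return { stopId, stopName, departures }
}

const parsed = parseDelimitedResponse(
  '3390FO07|Forest Recreation Ground|yellow^69^1 min|^brown^15^2 mins|^yellow^68^4 mins',
  ['line', 'routeNumber', 'expected'],
)

console.log(parsed.departures)
```

All values come back as strings, so fields such as expectedMins or isRealTime would need converting by the caller.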

How Does It Work?

Overview

This project is implemented as a Cloudflare Worker, code that runs and scales in a serverless execution environment across the Cloudflare network. Workers can be written in a few different languages; I chose JavaScript. All of the code lives in a single file, index.js.

Workers generally consist of an event listener and an event handler (see docs). The event listener listens for fetch events (such an event occurs when someone requests the URL that the worker is deployed at). It then calls the event handler whose job is to take the Request object for this call (see docs) and build an appropriate Response object (docs here) then return it to the client.

All of the code to query the NCTX website, gather the bus departure data, filter and return it in the requested format happens in the handleRequest function.

Getting the Page from the NCTX Website

The first thing that the code has to do is check that a stop ID was provided. It does this by looking for a URL parameter named stopId and responding with a bad request error if one isn't provided, or if the request method is anything other than GET:

const url = new URL(request.url)
const stopId = url.searchParams.get('stopId')

if (request.method !== 'GET' || !stopId) {
  return new Response(BAD_REQUEST_TEXT, {
    status: BAD_REQUEST_CODE,
    headers: CORS_HEADERS,
  })
}

If a stop ID was provided, we'll get the source HTML for that stop's page from NCTX:

const stopUrl = `https://nctx.co.uk/stops/${stopId}`
const stopPage = await fetch(stopUrl)

You can check out what a stop page looks like here, which is the page for stop "3390FO07" (Forest Recreation Ground).

Parsing Data from the Page Source and Storing It

The response for the stop page has been fetched into a variable called stopPage. What we need to do now is parse through its HTML and find the data for each departure from the stop. Cloudflare provides an HTML Rewriter as part of the Workers API. It parses the HTML for us, firing listener functions whenever the selector expressions that we are looking for are found.

From inspecting the HTML page source from NCTX, we can determine which selectors will match for each element containing a data item that we're interested in. For example, here let's find where a bus that's due to pass by the stop is headed, which is contained in a paragraph with a CSS class named single-visit__description:

const htmlRewriter = await new HTMLRewriter()
  .on('p.single-visit__description', {
    text(text) {
      if (text.text.length > 0) {
        currentDeparture.destination = text.text.trim()
      }
    },
  })
  // functions for other matches...
  .transform(stopPage) // run the rewriter
  .text()

When a match for such a paragraph tag is found, we provide a handler for text chunks and store the text found, trimming any whitespace from it.

The code contains several functions that fire when different selectors are found. These each get a single piece of data about a bus departure and store it in an object named currentDeparture.

The last data item found for each departure is either the real time estimate of when the bus will arrive at the stop, or a timetable estimate for buses that don't have real time tracking, or which haven't started on the journey yet. When one of these items is found, the code pushes the currentDeparture object into an array named departures, and starts again with the next departure. In this way, we build up an array of objects describing upcoming departures from the stop.
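The accumulation pattern described above can be sketched without the HTML Rewriter. In this simplified stand-in, each "event" represents one piece of data found in the page, in document order. The event shape and the collectDepartures name are assumptions made for illustration; the real code does this work inside the rewriter's handler functions.

```javascript
// Simplified stand-in for the rewriter callbacks: each event is one piece
// of data found in the page, in document order. (Hypothetical event shape.)
function collectDepartures(events) {
  const departures = []
  let currentDeparture = {}

  for (const { field, value } of events) {
    currentDeparture[field] = value

    // The expected time is the last item found for each departure, so it
    // completes the current object and starts a fresh one.
    if (field === 'expected') {
      departures.push(currentDeparture)
      currentDeparture = {}
    }
  }

  return departures
}

const found = collectDepartures([
  { field: 'routeNumber', value: '70' },
  { field: 'destination', value: 'City, Victoria Centre T3' },
  { field: 'expected', value: '2 mins' },
  { field: 'routeNumber', value: '16' },
  { field: 'destination', value: 'City, Victoria Centre T2' },
  { field: 'expected', value: '3 mins' },
])

console.log(found)
```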

Data Cleanup / Formatting

Each of the functions that run when a selector match is found has to do some level of cleanup or formatting on the data to make it more useful in an API response. The most common change is to trim whitespace off the start and end of strings, which is generally done like this:

const trimmedText = text.text.trim()

Where text is a text chunk returned by the HTML rewriter and text.text is the string value found.

Some data is checked against lookup objects to get the value that goes into the API response. For example, there's no line name in the HTML, but we can work it out based on an HTML colour code in the source:

// Maps line colour codes to line names.
const LINE_NAME_LOOKUP = {
  '#935E3A': 'brown',
  '#007A4D': 'green',
  '#CD202C': 'red',
  '#DA487E': 'pink',
  '#3FCFD5': 'turquoise',
  '#E37222': 'orange',
  '#6AADE4': 'skyblue',
  '#C1AFE5': 'lilac',
  '#FED100': 'yellow',
  '#522398': 'purple',
  '#002663': 'navy',
  '#B5B6B3': 'grey',
  '#00A1DE': 'blue',
  '#92D400': 'lime',
}
...

.on('div.single-visit__highlight', {
  element(elem) {
    // Pull this out of the style attribute whose value looks like: background-color:#92D400;
    const styleAttr = elem.getAttribute('style')
    const routeColour = styleAttr.substring(
      'background-color:'.length,
      styleAttr.length - 1,
    )
    currentDeparture.lineColour = routeColour
    currentDeparture.line = LINE_NAME_LOOKUP[routeColour]
  },
})
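The style parsing above relies on the attribute always looking exactly like background-color:#92D400;. As a sketch of a slightly more defensive approach, the hypothetical parseLineColour helper below returns null when the attribute is missing or has a different shape, and tolerates a missing trailing semicolon. It is an illustration only, not the code used in index.js.

```javascript
// Hypothetical, slightly more defensive variant of the style parsing.
// Returns null when the attribute is missing or has an unexpected shape.
function parseLineColour(styleAttr) {
  const prefix = 'background-color:'

  if (!styleAttr || !styleAttr.startsWith(prefix)) {
    return null
  }

  let colour = styleAttr.substring(prefix.length).trim()

  // Drop the trailing semicolon if there is one.
  if (colour.endsWith(';')) {
    colour = colour.substring(0, colour.length - 1)
  }

  // Upper-case so the value matches keys written like '#92D400'.
  return colour.toUpperCase()
}

console.log(parseLineColour('background-color:#92D400;'))
```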

Another data item that requires noteworthy formatting is the number of minutes until the bus is due to arrive at the stop. In the source HTML, this can have a number of formats. For buses with live tracking, the value is either "Due" or a number of minutes such as "2 mins".

These scenarios are handled here:

.on('div.single-visit__time--expected', {
  // Bus has live tracking, value will be "Due" or a number of minutes e.g. "2 mins".
  text(text) {
    if (text.text.length > 0) {
      const trimmedText = text.text.trim()
      currentDeparture.expected = trimmedText

      // When due, the bus is expected in 0 minutes.
      if (trimmedText.toLowerCase() === 'due') {
        currentDeparture.expectedMins = 0
      } else {
        // Parse out the number of minutes.
        currentDeparture.expectedMins = parseInt(
          trimmedText.split(' ')[0],
          10,
        )
      }

      currentDeparture.isRealTime = true

      departures.push(currentDeparture)
      currentDeparture = {}
    }
  },
})

For buses without live tracking, we also have to deal with times in 24hr format:

// Used when getting the current UK time... see
// https://stackoverflow.com/questions/25050034/get-iso-8601-using-intl-datetimeformat
const INTL_DATE_TIME_FORMAT_OPTIONS = {
  timeZone: 'Europe/London',
  year: 'numeric',
  month: '2-digit',
  day: '2-digit',
  hour: '2-digit',
  minute: '2-digit',
  second: '2-digit',
  hour12: false,
  timeZoneName: 'short',
}

// Use a locale that has adopted ISO 8601 as there is no locale for
// that directly so using Sweden here...
const INTL_DATE_TIME_FORMAT_LOCALE = 'sv-SE'

.on('div.single-visit__time--aimed', {
  // Bus does not have live tracking, value will be "Due" or a clock time e.g. "22:30"
  // Sometimes though it's a number of minutes e.g. "59 mins".
  text(text) {
    if (text.text.length > 0) {
      const trimmedText = text.text.trim()
      currentDeparture.expected = trimmedText

      // When due, the bus is expected in 0 minutes.
      if (trimmedText.toLowerCase() === 'due') {
        currentDeparture.expectedMins = 0
      } else {
        // Calculate number of minutes in the future that the value of trimmedText
        // represents (value is a clock time e.g. 22:30) and store in expectedMins.
        // careful too as 00:10 could be today or tomorrow...

        if (trimmedText.indexOf(':') !== -1) {
          // This time is in the "hh:mm" 24hr format.
          const ukNow = new Date(new Intl.DateTimeFormat(INTL_DATE_TIME_FORMAT_LOCALE, INTL_DATE_TIME_FORMAT_OPTIONS).format(new Date()))
          const departureDate = new Date(new Intl.DateTimeFormat(INTL_DATE_TIME_FORMAT_LOCALE, INTL_DATE_TIME_FORMAT_OPTIONS).format(new Date()))

          // Zero these out for better comparisons at the minute level.
          ukNow.setSeconds(0)
          ukNow.setMilliseconds(0)
          departureDate.setSeconds(0)
          departureDate.setMilliseconds(0)

          const [ departureHours, departureMins ] = trimmedText.split(':')
          const departureHoursInt = parseInt(departureHours, 10)
          const departureMinsInt = parseInt(departureMins, 10)

          departureDate.setHours(departureHoursInt)
          departureDate.setMinutes(departureMinsInt)

          if (ukNow.getHours() > departureHoursInt) {
            // The departure is tomorrow e.g. it's now 23:00 and the departure is 00:20.
            departureDate.setDate(departureDate.getDate() + 1)
          }

          const millis = departureDate - ukNow
          const minsToDeparture = (millis/1000)/60

          currentDeparture.expectedMins = minsToDeparture
        } else {
          // This time is in the "59 mins" format.
          currentDeparture.expectedMins = parseInt(
            trimmedText.split(' ')[0],
            10
          )
        }
      }

      currentDeparture.isRealTime = false
      departures.push(currentDeparture)
      currentDeparture = {}
    }
  },
})
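The three cases handled above ("Due", a clock time such as "22:30", and a value such as "59 mins") can be pulled out into a pure function, which makes them easier to test. This is a sketch under assumptions: minutesUntil is a hypothetical helper, and the caller passes in the current UK clock time rather than the function working it out with Intl.DateTimeFormat.

```javascript
// Hypothetical pure helper mirroring the three cases handled above.
// nowHours / nowMins are the current UK clock time, passed in by the caller.
function minutesUntil(text, nowHours, nowMins) {
  const trimmed = text.trim()

  // When due, the bus is expected in 0 minutes.
  if (trimmed.toLowerCase() === 'due') {
    return 0
  }

  if (trimmed.includes(':')) {
    // Clock time in "hh:mm" 24hr format.
    const [hours, mins] = trimmed.split(':').map(part => parseInt(part, 10))
    let diff = hours * 60 + mins - (nowHours * 60 + nowMins)

    // Same rule as the worker code: an earlier hour means tomorrow.
    if (nowHours > hours) {
      diff += 24 * 60
    }

    return diff
  }

  // "59 mins" style value.
  return parseInt(trimmed.split(' ')[0], 10)
}

console.log(minutesUntil('00:20', 23, 0))
```

Because this works on clock arithmetic alone, it takes no account of the clocks changing between now and the departure.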

Filtering the Response

As we saw earlier in this document, there are several ways that the response can be filtered using request parameters. For example, we may only need to return buses operating on a given line colour, or only the first 5 results. These filter parameters can be combined, so we need to be sure to apply each one that was specified on the request before returning our response.

These filters are implemented as a series of code blocks, each of which checks for the presence of a request parameter then removes departure objects from the departures array that don't match the filter criteria.

The route number filter is an interesting example, as some routes have different variants that are still the same route number, but may not travel the entire length of the route or stop at all of the stops. These variants end in a letter - X for example often indicating "express". I decided that, for example, filtering for route 69 should also return route 69A, 69C, 69X so had to implement some logic for that as follows:

const NUMBER_CHARS_LOOKUP = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

// Filter by route if needed... route 69 includes 69A, 69X etc but not 169 or 690.
const routeToFilter = url.searchParams.get('routeNumber')
if (routeToFilter) {
  results.departures = results.departures.filter(departure => {
    const lastChar = departure.routeNumber.substring(
      departure.routeNumber.length - 1,
    )

    // Route number either needs to match exactly, or start with the provided route number and
    // not end in a number... so if we're looking for route 58 this should return route 58,
    // 58A, 58X but not 590.  This also allows us to be more specific and look for 58X.
    // You could probably use a regular expression here but I find they introduce more issues
    // than they solve, so I avoid them :)
    return (
      departure.routeNumber === routeToFilter ||
      (departure.routeNumber.startsWith(routeToFilter) &&
        !NUMBER_CHARS_LOOKUP.includes(lastChar))
    )
  })
}
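One thing to be aware of with the logic above: it only looks at the last character, so a filter of 6 would also appear to match 69A (it starts with 6 and doesn't end in a number). If that matters to you, here's a sketch of a stricter check that requires everything after the filter value to be a non-digit. matchesRoute is a hypothetical helper and not part of index.js.

```javascript
const NUMBER_CHARS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

// Hypothetical stricter matcher: the route must match the filter exactly,
// or start with it and be followed only by non-digit characters (the
// variant suffix, e.g. the X in 58X).
function matchesRoute(routeNumber, routeToFilter) {
  if (routeNumber === routeToFilter) {
    return true
  }

  if (!routeNumber.startsWith(routeToFilter)) {
    return false
  }

  const suffix = routeNumber.substring(routeToFilter.length)

  // Every character after the filter value must be a non-digit.
  return [...suffix].every(ch => !NUMBER_CHARS.includes(ch))
}

const routes = ['6', '69', '69A', '69X', '169', '690']
console.log(routes.filter(route => matchesRoute(route, '69')))
```

It could be used inside the filter callback in place of the return expression shown above.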

The other filters follow similar patterns: they use the JavaScript array filter function to run logic against each member of departures to determine whether to keep it or not.

Limiting which Data Fields are Returned

If the fields request parameter was provided on the request, we need to return only a specified subset of the data fields.

fields is expected to be a comma separated list of data field names, so we get those using split, then set the results.departures array to the result of mapping over its current value, returning departure objects that only contain the requested fields:

if (url.searchParams.get('fields')) {
  const fieldsToReturn = url.searchParams.get('fields').split(',')

  if (fieldsToReturn.length > 0) {
    results.departures = results.departures.map(departure => {
      const newDeparture = {}
      for (const fieldName of fieldsToReturn) {
        newDeparture[fieldName] = departure[fieldName]
      }

      return newDeparture
    })
  }
}
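With the code above, a field name that doesn't exist on the departure is copied across as undefined. JSON.stringify drops such properties, but in a template string the value would be rendered as the text "undefined". As a sketch of one way to guard against that, the hypothetical pickFields helper below trims each name and ignores any that the departure doesn't have.

```javascript
// Hypothetical variant of the field-limiting step that ignores unknown names.
// fieldsParam is the raw comma separated value of the fields parameter.
function pickFields(departures, fieldsParam) {
  const fieldsToReturn = fieldsParam
    .split(',')
    .map(name => name.trim())
    .filter(name => name.length > 0)

  return departures.map(departure => {
    const newDeparture = {}

    for (const fieldName of fieldsToReturn) {
      // Only copy fields that actually exist on the departure.
      if (Object.prototype.hasOwnProperty.call(departure, fieldName)) {
        newDeparture[fieldName] = departure[fieldName]
      }
    }

    return newDeparture
  })
}

const limited = pickFields(
  [{ line: 'yellow', routeNumber: '69', expected: '1 min', expectedMins: 1 }],
  'line, routeNumber,bogus',
)

console.log(limited)
```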

Formatting the Response and Returning it to the Caller

The code that returns the response to the caller first determines if a JSON or String response was requested...

For a JSON response (the default), we create a new Response object, returning formatted JSON and setting the content-type header appropriately:

const responseFormat = url.searchParams.get('format')
if (!responseFormat || responseFormat === 'json') {
  return new Response(JSON.stringify(results, null, 2), {
    headers: {
      'content-type': 'application/json;charset=UTF-8',
      ...CORS_HEADERS,
    },
  })
}

For a String response (the value of the request parameter format is set to string), we need to output the stop ID and stop name first, separated by |, then output each departure's data with a ^ separating each field for that departure and | separating each departure. Note there's also some code to make sure we don't leave a trailing delimiter after the last field:

let stringResults = `${results.stopId}|${results.stopName}`
let stringDepartures = ''
for (const departure of results.departures) {
  for (const val of Object.values(departure)) {
    stringDepartures = `${stringDepartures}${
      stringDepartures.length > 0 ? '^' : ''
    }${val}`
  }

  stringDepartures = `${stringDepartures}|`
}

stringResults = `${stringResults}|${
  stringDepartures.length > 0
    ? stringDepartures.substring(0, stringDepartures.length - 1)
    : ''
}`
return new Response(stringResults, { headers: CORS_HEADERS })

Cross Origin Resource Sharing (CORS)

I wanted the API to be callable from anywhere, including JavaScript embedded in web pages. In order to allow that, I had to enable Cross Origin Resource Sharing or CORS. As we're only handling "simple" GET requests here, we don't need to worry about the CORS pre-flight OPTIONS request scenario. This means that enabling CORS is as simple as ensuring that the correct extra headers are returned with each response.

Here are the headers I'm sending back, as I want to allow the API to be called from anywhere:

const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET',
}

And here's an example of how to add them to the response that the Cloudflare Worker sends back to the client:

return new Response(JSON.stringify(results, null, 2), {
  headers: {
    'content-type': 'application/json;charset=UTF-8',
    ...CORS_HEADERS,
  },
})
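Since every response needs these headers, one option is to wrap the pattern in a small helper. This is a sketch only: jsonResponse is a hypothetical function and not part of index.js. It assumes a runtime where Response is available globally, which is the case in Cloudflare Workers and in recent versions of Node.js.

```javascript
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET',
}

// Hypothetical helper: builds a JSON Response with the CORS headers attached.
function jsonResponse(data, status = 200) {
  return new Response(JSON.stringify(data, null, 2), {
    status,
    headers: {
      'content-type': 'application/json;charset=UTF-8',
      ...CORS_HEADERS,
    },
  })
}

const res = jsonResponse({ stopId: '3390FO07', departures: [] })
console.log(res.headers.get('Access-Control-Allow-Origin'))
```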