openactive / open-booking-api

Repository for the Open Booking API specification

Payload size reduction during booking flow #234

Open nickevansuk opened 2 years ago

nickevansuk commented 2 years ago

For systems that accept a very high volume of bookings, it may be advantageous to reduce the volume of data in the Open Booking API payloads for the C1, C2 and B responses.

By way of an initial proposal, two suggested ways that this could be achieved are included below.

The idea is that both of these optimisations would be OPTIONAL from a Booking System's perspective, as they likely only apply to systems that receive a very high volume of bookings.

The two proposals below, taken together, reduce the size of the C1, C2 and B payloads by around 57%.

It is suggested that the development of this proposal be taken forward in conjunction with one or more scaled implementations, to confirm the performance analysis and the expected benefits.

1) Reduce duplicate data within the payload

Optimisation

This simple optimisation can be applied to existing implementations by updating the logic that renders the response.

The first reference to each acceptedOffer or orderedItem contains the full data for the opportunity, and subsequent references include only its @id.

The Broker receiving this response can expand it as required by iterating over the OrderItems, storing each fully expanded object in a hash map keyed by its @id, and then using this hash map to expand any @id references encountered later in the JSON object.
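By way of illustration, a minimal sketch (in TypeScript; the function and type names are illustrative and not part of the specification) of how a Broker might perform this expansion for the acceptedOffer and orderedItem properties:

type JsonObject = { [key: string]: unknown };

// Minimal sketch of Broker-side expansion: collect fully expanded objects by
// their @id, then substitute any bare @id string references with those objects.
function expandOrderItems(orderedItem: JsonObject[]): JsonObject[] {
  const byId = new Map<string, JsonObject>();

  const remember = (value: unknown): void => {
    if (value !== null && typeof value === "object" && "@id" in (value as JsonObject)) {
      byId.set((value as JsonObject)["@id"] as string, value as JsonObject);
    }
  };

  const expand = (value: unknown): unknown =>
    typeof value === "string" && byId.has(value) ? byId.get(value) : value;

  // First pass: store every expanded acceptedOffer / orderedItem by @id.
  for (const item of orderedItem) {
    remember(item["acceptedOffer"]);
    remember(item["orderedItem"]);
  }

  // Second pass: replace @id-only references with the stored objects.
  return orderedItem.map((item) => ({
    ...item,
    acceptedOffer: expand(item["acceptedOffer"]),
    orderedItem: expand(item["orderedItem"]),
  }));
}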

Performance analysis

It is worth noting that a gzipped response would be the same size with and without this optimisation, as the gzip algorithm is incredibly efficient at compressing duplicate data. Hence the primary benefit of this optimisation is not in reduced bandwidth, but rather in reducing processing overhead of serialising and deserialising the duplicate data.

This would likely save less than 10ms of processing overhead on each response, and so this optimisation is likely only worth considering for systems where a high number of bookings are processed each second.

Example

For example, in a C1, C2 or B response:

"orderedItem": [
    {
      "@type": "OrderItem",
      "position": 0,
      "acceptedOffer": {
        "@type": "Offer",
        "@id": "https://example.com/events/452#/offers/878",
        "name": "Adult",
        "price": 10,
        "priceCurrency": "GBP",
        "validFromBeforeStartDate": "P6D",
        "allowCustomerCancellationFullRefund": true,
        "latestCancellationBeforeStartDate": "P1D"
      },
      "orderedItem": {
        "@type": "ScheduledSession",
        "@id": "https://example.com/events/452/subEvents/132",
        "identifier": 123,
        "eventStatus": "https://schema.org/EventScheduled",
        "maximumAttendeeCapacity": 30,
        "remainingAttendeeCapacity": 20,
        "startDate": "2018-10-30T11:00:00Z",
        "endDate": "2018-10-30T12:00:00Z",
        "duration": "PT1H",
        "superEvent": {
          "@type": "SessionSeries",
          "@id": "https://api.example.com/events/452",
          "name": "Bodypump",
          "activity": [ ... ],
          "duration": "PT1H",
          "url": "https://example.com/events/452",
          "location": { ... }
        }
      }
    },
    {
      "@type": "OrderItem",
      "position": 1,
      "acceptedOffer": "https://example.com/events/452#/offers/878",
      "orderedItem": "https://example.com/events/452/subEvents/132"
    }
  ],

2) Remove opportunity data from the payload

Optimisation

For very high volumes of bookings, it may be more efficient to offload retrieval of live opportunity data to separate endpoints that can be easily cached outside of C1, C2 and B.

The @id of each orderedItem can be used for this purpose, as it can resolve to an API endpoint that returns the opportunity data that is expected to be returned in C1, C2 and B.

The Broker receiving a response containing unexpanded @id references can expand them as required by resolving the @id URL of any reference whose data is not already included in the response (as per (1) above).

It would be expected that Brokers would make use of Conditional GETs (If-Modified-Since or If-None-Match) when retrieving @id references across C1, C2 and B calls, which reduces the load on the Booking System server while ensuring that the opportunity data is up-to-date.

The opportunity data is expected to be stored by the Broker as part of the Order.

The same can be applied to the seller references.
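By way of illustration, a minimal sketch (in TypeScript, assuming a fetch-capable runtime; the cache shape and names are illustrative and not part of the specification) of a Broker resolving an @id reference using a Conditional GET with If-None-Match:

interface CachedResource {
  etag: string | null;
  body: unknown;
}

// Simple in-memory cache of previously retrieved opportunity/seller data, keyed by @id.
const cache = new Map<string, CachedResource>();

async function resolveReference(id: string): Promise<unknown> {
  const cached = cache.get(id);
  const headers: Record<string, string> = {
    Accept: "application/vnd.openactive.booking+json; version=1",
  };
  if (cached?.etag) headers["If-None-Match"] = cached.etag;

  const response = await fetch(id, { headers });

  // 304 Not Modified: the previously stored data is still current.
  if (response.status === 304 && cached) return cached.body;

  const body = await response.json();
  cache.set(id, { etag: response.headers.get("ETag"), body });
  return body;
}

A Broker would call something like resolveReference for each bare @id encountered in a C1, C2 or B response (including the seller reference), and store the resolved data against the Order as described above.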

Performance analysis

This optimisation would likely only be effective for scaled systems where such requests can be distributed between microservices, as a single C1, C2 or B response would generate several subsequent GET requests back to the Booking System. These extra GET requests would create additional load on single-instance monolithic systems, and hence this optimisation would likely have a negative effect on the performance of small monolithic systems.

It should also be noted that a similar approach could be used under-the-hood when rendering standard C1, C2 and B responses within a scaled Booking System. The advantage of this under-the-hood approach is that it does not rely on any particular behaviour of the Broker. By contrast, with the Broker-facing approach, if the Broker does not use Conditional GETs, or misuses the provided GET endpoints to render "search results" pages instead of using the data from the open feeds, the optimisation could generate additional load on the Booking System and have a negative effect on performance.

If the opportunity endpoints sit behind an HTTP caching layer such as Varnish or a CDN, this should protect against some types of Broker misuse (e.g. failure to use Conditional GETs) - however it still leaves these endpoints open to misuse on "search results" pages (traffic to which does not necessarily result in a booking). This type of "misuse" becomes less of an issue when the overall volume of requests for booking opportunity data within a given booking system is high: once the request volume means the response cache is populated most of the time, concerns around endpoint misuse become moot.

It's also worth noting that the additional latency of multiple requests will likely result in a more sluggish experience for the end-user overall.

In summary: this optimisation likely only becomes useful when the following conditions are met:

- the Booking System processes a very high volume of bookings, such that the opportunity endpoints' response cache is populated most of the time;
- the Booking System's architecture allows the additional GET requests to be distributed across microservices or absorbed by an HTTP caching layer; and
- Brokers make use of Conditional GETs and do not misuse the opportunity endpoints (e.g. to render "search results" pages).

Example

Example extract from responses from C1, C2 and B:

"seller": "https://example.com/api/organisations/123",
"orderedItem": [
    {
      "@type": "OrderItem",
      "position": 0,
      "acceptedOffer": {
        "@type": "Offer",
        "@id": "https://example.com/events/452#/offers/878",
        "name": "Adult",
        "price": 10,
        "priceCurrency": "GBP",
        "validFromBeforeStartDate": "P6D",
        "allowCustomerCancellationFullRefund": true,
        "latestCancellationBeforeStartDate": "P1D"
      },
      "orderedItem": "https://example.com/events/452/subEvents/132"
    },
    {
      "@type": "OrderItem",
      "position": 1,
      "acceptedOffer": "https://example.com/events/452#/offers/878",
      "orderedItem": "https://example.com/events/452/subEvents/132"
    }
  ],

Example responses from the opportunity and seller endpoints:

GET /events/452/subEvents/132 HTTP/1.1
Host: example.com
Date: Mon, 8 Oct 2018 20:52:35 GMT
Accept: application/vnd.openactive.booking+json; version=1

{
  "@type": "ScheduledSession",
  "@id": "https://example.com/events/452/subEvents/132",
  "identifier": 123,
  "eventStatus": "https://schema.org/EventScheduled",
  "maximumAttendeeCapacity": 30,
  "remainingAttendeeCapacity": 20,
  "startDate": "2018-10-30T11:00:00Z",
  "endDate": "2018-10-30T12:00:00Z",
  "duration": "PT1H",
  "superEvent": {
    "@type": "SessionSeries",
    "@id": "https://api.example.com/events/452",
    "name": "Bodypump",
    "activity": [ ... ],
    "duration": "PT1H",
    "url": "https://example.com/events/452",
    "location": { ... }
  }
}

GET /api/organisations/123 HTTP/1.1
Host: example.com
Date: Mon, 8 Oct 2018 20:52:35 GMT
Accept: application/vnd.openactive.booking+json; version=1

{
  "@type": "Organization",
  "@id": "https://example.com/api/organisations/123",
  "identifier": "CRUOZWJ1",
  "name": "Better",
  "taxMode": "https://openactive.io/TaxGross",
  "legalName": "Greenwich Leisure Limited",
  "description": "A charitable social enterprise for all the community",
  "url": "https://www.better.org.uk",
  "logo": {
    "@type": "ImageObject",
    "url": "http://data.better.org.uk/images/logo.png"
  },
  "telephone": "020 3457 8700",
  "email": "customerservices@gll.org",
  "vatID": "GB 789 1234 56",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Alan Peacock Way",
    "addressLocality": "Village East",
    "addressRegion": "Middlesbrough",
    "postalCode": "TS4 3AE",
    "addressCountry": "GB"
  },
  "termsOfService": [
    {
      "@type": "PrivacyPolicy",
      "name": "Privacy Policy",
      "url": "https://example.com/privacy-policy",
      "requiresExplicitConsent": false
    },
    {
      "@type": "TermsOfUse",
      "name": "Terms and Conditions",
      "url": "https://example.com/terms-and-conditions",
      "dateModified": "2019-04-16T20:31:13Z",
      "requiresExplicitConsent": true
    }
  ]
}

nathansalter commented 2 years ago

This would likely save less than 10ms of processing overhead on each response, and so this optimisation is likely only worth considering for systems where a high number of bookings are processed each second.

I'm struggling to see a use-case for this. If you're at a point where you're handling 100 bookings per second, there are going to be much more efficient places to target optimisation techniques than the structure of the booking JSON. As you mention, the GZIP size will be basically identical, so the only place that you'll be increasing speed is during encoding of the JSON and sending it through the port. I think assuming it will save 10ms per request is very optimistic, as it's probably more in the sub-1ms range.

After doing some (brief) tests in PHP, I found that decreasing the size from 100KB to 50KB saves 2ms, and reducing it to 1KB only saves a further 1ms.
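For comparison, a rough equivalent of that check could be scripted in Node.js/TypeScript along these lines (a sketch only, not the original PHP test; the payload shape is illustrative and timings will vary by runtime):

// Rough measurement of JSON serialisation cost at different payload sizes.
function makePayload(targetBytes: number): object {
  const items: object[] = [];
  while (JSON.stringify({ orderedItem: items }).length < targetBytes) {
    items.push({
      "@type": "OrderItem",
      position: items.length,
      acceptedOffer: "https://example.com/events/452#/offers/878",
      orderedItem: "https://example.com/events/452/subEvents/132",
    });
  }
  return { orderedItem: items };
}

for (const targetBytes of [100_000, 50_000, 1_000]) {
  const payload = makePayload(targetBytes);
  const iterations = 1_000;
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) JSON.stringify(payload);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${targetBytes} bytes: ~${(elapsedMs / iterations).toFixed(3)} ms per encode`);
}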

drinkynet commented 2 years ago

Any memory footprint benefit in those tests?

nathansalter commented 2 years ago

Just the size of the actual object +10%. So reducing 50KB to 1KB would save about 50KB of memory