guardian / dotcom-rendering

The Guardian web rendering service (aka DCR/DCAR)
https://www.theguardian.com
Apache License 2.0
255 stars 30 forks source link

RFC: Migrating 'show more' #6181

Closed bryophyta closed 1 year ago

bryophyta commented 2 years ago

RFC: Migrating Show More

This is to discuss approaches to closing #4604.

Requirements (unknowns)

Current Solution

  1. User presses Show More on a Fronts container.
  2. API Call is made to e.g. /uk/show-more/headlines.json (/*path/show-more/*containerId.json)
  3. Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
  4. Frontend uses ContainerLayout to filter out all cards that are already visible on the client.
  5. Frontend renders the filtered cards using twist/twirl/thingy
  6. Rendered HTML is returned by the API and inserted client side
  7. Duplicated cards are removed from “Show More” section using DOM manipulation

Schematic description of the main DCR options

We can think of the current solution as one way of achieving the following tasks:

(I) and (III) are naturally done in Frontend. (IV) is naturally done in DCR. (VI) can only really be done on the client side. (V) is easy, but the choice of where to do the rendering (DCR server, or client) is consequential. (II) is probably the most complex problem, given the current implementation,[^1] and the choice of whether to render the HTML on the client or on the server doesn’t really affect this problem. However, it’s worth noting that (VI) and (II) could essentially be treated as a single problem (that of ‘deduplicating’ the new cards against the existing cards in the collection) if they are both performed on the client side. Again, this could be done whether the initial markup is rendered on the server or the client (although it’ll be simpler to implement with client-side rendering).

image

(diagram source here)

DCR Solution 1

(Left side of diagram)

Proposed flow:

  1. User presses Show More
  2. API Call is made to API Call is made to /uk/show-more/headlines.json?dcr (/*path/show-more/*containerId.json?dcr)
  3. Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
  4. Frontend removes Paid cards if the request isAdFree
  5. Remaining cards are marshalled into JSON and returned by the API
  6. Duplicated cards are removed Client side
    1. This also allows us to remove already visible cards without knowing much about the Container layout.
  7. Remaining cards are rendered using React and inserted into the DOM, containerPalette and existing story IDs can be derived from Island props.

Pros:

Cons:

DCR Solution 2

(Right side of diagram)

Proposed flow:

  1. User presses Show More
  2. API Call is made to API Call is made to /uk/show-more/headlines.json?dcr (/*path/show-more/*containerId.json?dcr)
  3. Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
  4. Frontend removes Paid cards if the request isAdFree
  5. Remaining cards are marshalled into JSON sent to DCR
  6. DCR renders Show More cards Server Side
  7. DCR return HTML to Frontend, and Frontend to the Client
  8. HTML is parsed by Client and duplicate cards are removed
    1. This also allows us to remove already visible cards without knowing much about the Container layout.
  9. Deduped HTML is inserted into the DOM

Pros:

Cons:

Proposed Solution for DCR

We propose going with Solution #1, without calculating which cards are visible above the fold:

Previous work

Brief History

Reflections

Past PRs

Related but out of scope

Something that was revealed by the initial work on this issue is that there are a number of quirks in the way that Fronts collections are handled in facia-press, with consequences for the rendering of Fronts and show-more.

When a Front is pressed, facia-press creates a ‘lite’ and a ‘full’ version of the Front. (Plus adFree versions.) One of the main differences is that each collection in the ‘lite’ version contains only the number of cards that facia-press thinks should be contained in the initial rendering of the container on the page. Any remaining cards will (depending on other conditions) be displayed only when the ‘show more’ button is clicked for the container.

There is something slightly odd about the APIs that this encourages, because it’s not clear that the full type is used for anything other than storing full versions of collections. But fetching an entire front in order to fetch a single full collection is quite indirect. (Probably not a pressing issue, though.)

More importantly, if DCR wants to render a Front container which has more cards than facia-press thinks it needs, we will need to adjust the logic in Frontend which determines how many cards should go in a container given its config. This is a kind of coupling which seems quite troubling, especially in relation to dynamic containers, and also the possibility of adding new container types in the future.

So there are good arguments for refactoring at least part of facia-press, in a way that would probably have implications for the best way to implement show-more. However, the issues described in this section don’t block a reasonable solution to show-more, and the solutions proposed above shouldn’t have the effect of deepening the coupling. So the facia-press issues should be considered out of scope for the current issue.

[^1]: Why is this a complex problem?

jamesgorrie commented 2 years ago

This is a very well explained RFC - and offers a clear, logical way forward.

Extra data would have to be sent down the wire to the client as we’d have to send all cards in the payload, not just the ones we calculated to be invisible above the fold.

Do we know how much data this is - and as this is something that I presume (please check me on this) is a lightly used feature, could we only load this once the feature is interacted with (@OllysCoding I know we did expiditeLoading, and spoke about having a deferUntil: 'interaction' or similar)?

I know we're working on measuring this, but it would be good to know how much is "a relatively small amount" would be. Assuming this gets tracked in the island tracking @mxdvl created, and would be surfaced in the compressed-size-action to help us understand the implication of this?

This is not blocking, but just something I feel we should be aware of when delivering this feature.


Container item count

Sorry if I haven't managed to grasp this - but how do we know how many items are needed on the initial container render? Do different container types have different sizes they need or can handle on the initial render?


Thought on server HTML rendered vs client JSON rendered

It feels a little as though we're being more sceptical of the pattern of creating an endpoint on DCR to send HTML back to the client, maybe we should make this clear when this should be used, and what the considerations are, same for rendering JSON - making sure people are made aware of the considerations are to readers.

AshCorr commented 2 years ago

Do we know how much data this is - and as this is something that I presume (please check me on this) is a lightly used feature, could we only load this once the feature is interacted with (@OllysCoding I know we did https://github.com/guardian/dotcom-rendering/pull/4899, and spoke about having a deferUntil: 'interaction' or similar)?

Theres 2 things to consider here, the island size, and the JSON/HTML payload, both are fairly difficult to estimate:

The island size would be bigger in our proposed solution as we'd need to include the JSX templates in the Island for rendering show more.

Luckily I think that we already include React/Preact in another bundle, so assuming this is correct and that Webpack does its job properly we don't need to include React in the "Show More" island bundle. Saving a lot of those valuable KBs! I need to double check this though.

The JSON payload would also be bigger on account that we'd like to drop the above the fold calculations in Frontend of which items are visible or not.

Lets take an example of a fixed/small/slow-IV container that has 10 show more cards.

In the original Frontend implementation we'd get 14 cards from FAPI and would use its own Scala representation of how many cards each container can hold to decide what cards are visible or not, in this case we're talking about slow-IV, which is 4 cards. So we throw away the first 4 cards and then use the remaining 10 to render the HTML.

In the proposed implementation we're saying that we don't want to do that calculation of how many cards are visible in Frontend and instead want to leave that to the client. This would mean simply sending down all 14 cards without trying to calculate which ones are already visible.

Again this is hard to estimate how much extra data we'd be sending, I've seen cases of a 10 card container with only a single show more card and I've seen cases of 4 card containers with 10 show more cards. It varies a lot.

Even though both the bundle will be larger, if I were to take a complete guess, 5-20KB larger, this is mitigated by our proposed solution to only download the bundle and the JSON once the "Show More" button is clicked and not at page load.

Sorry if I haven't managed to grasp this - but how do we know how many items are needed on the initial container render? Do different container types have different sizes they need or can handle on the initial render?

Currently Frontend has a Scala representation of all container models which it can use to work out which cards are visible on inital load and which aren't, the ones that aren't are considered the "show more" items.

This is one of the reasons we wanted to go with the proposed solution as it lets us decouple Frontend slightly from some of the rendering logic.

Thought on server HTML rendered vs client JSON rendered

I think thats possibly somewhat outside the scope of this RFC and could probably have an RFC all by itself. For this particular RFC where we need to deal with client side deduping regardless of the data type it seemed to make sense to try and keep all of the rendering & deduping logic as close as possible, which entails returning JSON from the API.

bryophyta commented 1 year ago

This issue was closed by #6262, which implemented the pattern described in the RFC.