bryophyta commented 2 years ago

RFC: Migrating Show More

This is to discuss approaches to closing #4604.

Requirements (unknowns)

Do we need to preserve the current convention for data-link-name?
- Probably not, as per discussion with Guy in Data + DCR.
Is adFree in scope?
- Assuming yes, if it’s straightforward.
Do we need format information to be passed to DCR?
- Yes (well; containerPalette)
How much deduping do we need to do?
- Current situation: Cards are deduped on the server side and the client side. The latter catches some edge cases which can’t be caught on the server.
- There’s been some discussion as to how much we should care about those edge cases, but no clear decision was reached.

Current Solution

User presses Show More on a Fronts container.
API Call is made to e.g. /uk/show-more/headlines.json (/*path/show-more/*containerId.json)
Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
Frontend uses ContainerLayout to filter out all cards that are already visible on the client.
Frontend renders the filtered cards using twist/twirl/thingy
Rendered HTML is returned by the API and inserted client side
Duplicated cards are removed from “Show More” section using DOM manipulation

Schematic description of the main DCR options

We can think of the current solution as one way of achieving the following tasks:

I. Locate and retrieve the pressed collection data for the relevant container.
II. Remove cards that are already visible on the Front.
III. Remove paid cards if the request isAdFree.
IV. Determine the correct palette for the container.
V. Render the collection JSON as HTML.
VI. Remove cards which have been duplicated due to the collection changing between the original page load and the /show-more/ endpoint call.

(I) and (III) are naturally done in Frontend. (IV) is naturally done in DCR. (VI) can only really be done on the client side. (V) is easy, but the choice of where to do the rendering (DCR server, or client) is consequential. (II) is probably the most complex problem, given the current implementation,[^1] and the choice of whether to render the HTML on the client or on the server doesn’t really affect this problem. However, it’s worth noting that (VI) and (II) could essentially be treated as a single problem (that of ‘deduplicating’ the new cards against the existing cards in the collection) if they are both performed on the client side. Again, this could be done whether the initial markup is rendered on the server or the client (although it’ll be simpler to implement with client-side rendering).

(diagram source here)

DCR Solution 1

(Left side of diagram)

Proposed flow:

User presses Show More
API Call is made to API Call is made to /uk/show-more/headlines.json?dcr (/*path/show-more/*containerId.json?dcr)
Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
Frontend removes Paid cards if the request isAdFree
Remaining cards are marshalled into JSON and returned by the API
Duplicated cards are removed Client side
1. This also allows us to remove already visible cards without knowing much about the Container layout.
Remaining cards are rendered using React and inserted into the DOM, containerPalette and existing story IDs can be derived from Island props.

Pros:

Reduced backend complexity as we don’t need to calculate visibility above the fold, this means that we don’t need to rely on container layout in facia. (Although it is still needed in facia-press)
It’s easier to dedup already visible cards in JSON versus parsing a HTML representation and deduping it there.

Cons:

Extra data would have to be sent down the wire to the client as we’d have to send all cards in the payload, not just the ones we calculated to be invisible above the fold.
We’d need to render the HTML client side instead of server side, which can be quite resource intensive.

DCR Solution 2

(Right side of diagram)

Proposed flow:

User presses Show More
API Call is made to API Call is made to /uk/show-more/headlines.json?dcr (/*path/show-more/*containerId.json?dcr)
Frontend fetches full PressedPage from S3 and extracts the collection with the requested ID.
Frontend removes Paid cards if the request isAdFree
Remaining cards are marshalled into JSON sent to DCR
DCR renders Show More cards Server Side
DCR return HTML to Frontend, and Frontend to the Client
HTML is parsed by Client and duplicate cards are removed
1. This also allows us to remove already visible cards without knowing much about the Container layout.
Deduped HTML is inserted into the DOM

Pros:

Client does not need to use React to render the Cards locally.
We may be able to send less data down the wire to the client by calculating which cards are visible in DCR when doing SSR.

Cons:

2 API calls need to be made by Frontend, one to S3, one to DCR.

Proposed Solution for DCR

We propose going with Solution #1, without calculating which cards are visible above the fold:

We can remove some of the dependency and complexity of having container layouts in Frontend. We still rely on them in facia-press, though.
Although extra data would have to be sent down the wire it would still be a relatively small amount of data. Worst case scenario would be that we need to send the data to render 10 extra cards down the wire.
Either way we need to do deduping on the Client, it's easier to dedup JSON than it is HTML and it also solves our visibility problem.
We already have precedent for client side rendering additional parts of the page (Onwards). In addition, there is scope for deferring the hydration of the show-more island until the user has interacted with the button, which would mean we only need to load the code for users who wish to make use of the feature.

Previous work

Brief History

Work was done on migrating show-more functionality in DCR earlier this year. This work stalled during PR review.
This previous solution was guided by the aim of reducing the amount of data being sent to the client. This motivated two decisions:
- The extra cards should be rendered on the server.
- Markup for as few cards as possible should be sent to the client.
These aims are consistent with the existing Frontend approach, and have general support among the team, but there were disagreements about how much weight to place on them.
Removing unneeded cards from the show-more response requires knowing how many cards should be visible ‘above the fold’ in the relevant container (II, above). This data was not easily available anywhere, and the calculation in the existing Frontend ‘show-more/’ endpoint is coupled with the rendering logic. So it was proposed to remove this logic from the endpoint, and persist the number of ‘visible’ cards which is calculated in facia-press by adding it to a new version of the pressed front json stored in S3.
A number of Dotcom team members expressed reservations about adding a new pressed version, and its introduction of a new form of coupling with facia-press.
These two main questions (server- vs. client-side rendering, and how to persist/calculate which cards to render) were never resolved as such. The work was put on pause for some time, and given intervening work (e.g. on Onwards) it seems to make sense to revisit show-more afresh.

Reflections

Questions about when/how to filter out ‘above the fold’ cards (II), and other deduping (VI) are important implementation details, but they don’t look obviously different whether we choose to render the cards on the client or on the server. (And similarly for deciding the containerPalette and filtering out isPaid cards.)
Therefore the primary question seems to be whether we render on the client or the server; the other implementation details don’t seem likely to tip the balance on that one way or another.
While there are important concerns about the presence of rendering logic in facia-press, and its downstream impact on initial Front rendering and on show-more, these concerns feel like they are out of scope for this issue. (See notes below.)

Past PRs

https://github.com/guardian/dotcom-rendering/pull/5476 - Adding a ShowMore/ endpoint for DCR SSR Show More
https://github.com/guardian/frontend/pull/25086 - Adding support for aforementioned endpoint in Frontend

Related but out of scope

Something that was revealed by the initial work on this issue is that there are a number of quirks in the way that Fronts collections are handled in facia-press, with consequences for the rendering of Fronts and show-more.

When a Front is pressed, facia-press creates a ‘lite’ and a ‘full’ version of the Front. (Plus adFree versions.) One of the main differences is that each collection in the ‘lite’ version contains only the number of cards that facia-press thinks should be contained in the initial rendering of the container on the page. Any remaining cards will (depending on other conditions) be displayed only when the ‘show more’ button is clicked for the container.

There is something slightly odd about the APIs that this encourages, because it’s not clear that the full type is used for anything other than storing full versions of collections. But fetching an entire front in order to fetch a single full collection is quite indirect. (Probably not a pressing issue, though.)

More importantly, if DCR wants to render a Front container which has more cards than facia-press thinks it needs, we will need to adjust the logic in Frontend which determines how many cards should go in a container given its config. This is a kind of coupling which seems quite troubling, especially in relation to dynamic containers, and also the possibility of adding new container types in the future.

So there are good arguments for refactoring at least part of facia-press, in a way that would probably have implications for the best way to implement show-more. However, the issues described in this section don’t block a reasonable solution to show-more, and the solutions proposed above shouldn’t have the effect of deepening the coupling. So the facia-press issues should be considered out of scope for the current issue.

[^1]: Why is this a complex problem?

Knowing which cards are visible 'above the fold' in a given container on a given front requires knowing quite a lot about the rendering logic for the container-type in question.
Currently, this value is calculated in facia-press, but the value is not persisted. So it is re-calculated in the /show-more endpoint logic. Ideally Frontend wouldn't need to know about this rendering logic at all.
An added complication comes from the fact that the pressed Front can change in between the original front page being loaded, and the user's clicking 'show more'. This is what (currently) necessitates client-side deduping. (Although it has been questioned how important this is as an edge case.)

jamesgorrie commented 2 years ago

This is a very well explained RFC - and offers a clear, logical way forward.

Extra data would have to be sent down the wire to the client as we’d have to send all cards in the payload, not just the ones we calculated to be invisible above the fold.

Do we know how much data this is - and as this is something that I presume (please check me on this) is a lightly used feature, could we only load this once the feature is interacted with (@OllysCoding I know we did expiditeLoading, and spoke about having a deferUntil: 'interaction' or similar)?

I know we're working on measuring this, but it would be good to know how much is "a relatively small amount" would be. Assuming this gets tracked in the island tracking @mxdvl created, and would be surfaced in the compressed-size-action to help us understand the implication of this?

This is not blocking, but just something I feel we should be aware of when delivering this feature.

Container item count

Sorry if I haven't managed to grasp this - but how do we know how many items are needed on the initial container render? Do different container types have different sizes they need or can handle on the initial render?

Thought on server HTML rendered vs client JSON rendered

It feels a little as though we're being more sceptical of the pattern of creating an endpoint on DCR to send HTML back to the client, maybe we should make this clear when this should be used, and what the considerations are, same for rendering JSON - making sure people are made aware of the considerations are to readers.

AshCorr commented 2 years ago

Do we know how much data this is - and as this is something that I presume (please check me on this) is a lightly used feature, could we only load this once the feature is interacted with (@OllysCoding I know we did https://github.com/guardian/dotcom-rendering/pull/4899, and spoke about having a deferUntil: 'interaction' or similar)?

Theres 2 things to consider here, the island size, and the JSON/HTML payload, both are fairly difficult to estimate:

The island size would be bigger in our proposed solution as we'd need to include the JSX templates in the Island for rendering show more.

Luckily I think that we already include React/Preact in another bundle, so assuming this is correct and that Webpack does its job properly we don't need to include React in the "Show More" island bundle. Saving a lot of those valuable KBs! I need to double check this though.

The JSON payload would also be bigger on account that we'd like to drop the above the fold calculations in Frontend of which items are visible or not.

Lets take an example of a fixed/small/slow-IV container that has 10 show more cards.

In the original Frontend implementation we'd get 14 cards from FAPI and would use its own Scala representation of how many cards each container can hold to decide what cards are visible or not, in this case we're talking about slow-IV, which is 4 cards. So we throw away the first 4 cards and then use the remaining 10 to render the HTML.

In the proposed implementation we're saying that we don't want to do that calculation of how many cards are visible in Frontend and instead want to leave that to the client. This would mean simply sending down all 14 cards without trying to calculate which ones are already visible.

Again this is hard to estimate how much extra data we'd be sending, I've seen cases of a 10 card container with only a single show more card and I've seen cases of 4 card containers with 10 show more cards. It varies a lot.

Even though both the bundle will be larger, if I were to take a complete guess, 5-20KB larger, this is mitigated by our proposed solution to only download the bundle and the JSON once the "Show More" button is clicked and not at page load.

Sorry if I haven't managed to grasp this - but how do we know how many items are needed on the initial container render? Do different container types have different sizes they need or can handle on the initial render?

Currently Frontend has a Scala representation of all container models which it can use to work out which cards are visible on inital load and which aren't, the ones that aren't are considered the "show more" items.

This is one of the reasons we wanted to go with the proposed solution as it lets us decouple Frontend slightly from some of the rendering logic.

Thought on server HTML rendered vs client JSON rendered

I think thats possibly somewhat outside the scope of this RFC and could probably have an RFC all by itself. For this particular RFC where we need to deal with client side deduping regardless of the data type it seemed to make sense to try and keep all of the rendering & deduping logic as close as possible, which entails returning JSON from the API.

bryophyta commented 1 year ago

This issue was closed by #6262, which implemented the pattern described in the RFC.

guardian / dotcom-rendering