opengovsg / GoGovSG

The official link shortener of the Singapore Government
https://go.gov.sg
Other
92 stars 36 forks source link

Add description meta tag to transition page #546

Closed LoneRifle closed 3 years ago

LoneRifle commented 4 years ago

In a bid to improve our visibility on search engines, we are now looking into providing hints to crawlers so that they can better find and process our shortlinks.

If the user has specified a description to a shortlink, inject this as a meta tag. This gives crawlers hints as to the nature of the page, and the crawlers will display the description in search results.

xming13 commented 4 years ago

Hi @LoneRifle, I have implemented a naive way of injecting description meta tag to the transition page as a proof of concept. https://github.com/xming13/GoGovSG/commit/e54f6e0698a62a9185d1ce2c148cde0c4b1675b4

A few points to note:

  1. In this implementation, when a new transition page is requested, there are two database calls: (a) get the long url of the short url (b) get the description of the short url. These two database calls should be combined as one call.

  2. Currently the long url is cached in redis using short url as the cache key. With the introduction of description meta tag, long url and description should both be cached, ideally together with the same cache key(?).

    One way is to store the key value in the following format: cache key: shortUrl cache value: JSON string of { longUrl: longUrl, description: description }

    Implications of the format change (a) Are there other services that will be impacted? (b) slight/negligible(?) performance drop as json string need to be parsed. I think redis handles it well. (c) Existing cached value need to be purged during deployment

What do you think?

liangyuanruo commented 4 years ago

Hi @xming13 ,

For performance reasons we must go with option 2. Only the first redirect request should require a read from the database. If there is any update to the URL record, the cache should be purged.

The implementation could be such that the feature is optional when reading from cache. This would make it backwards compatible with the existing cache, and not run the risk of request spikes on the database.

xming13 commented 4 years ago

Hi @liangyuanruo,

Hmm the two points I have written are not two options to choose from, but two separate issues that need to be taken care of.

Point 1

In the naive implementation, for the first redirect request (ie. not cached), there are two reads from the database: one read to get long url and one read to get description. These two reads should be combined into one, ie. one read from database to get both long url and description.

Point 2

In the current system, cache is implemented as follows cache key: short url cache value: long url

With the introduction of description, we have a few options:

Option A: One cache key

cache key: short url cache value: JSON string of { longUrl: longUrl, description: description }

Pros: only one read from redis is needed to retrieve both long url and description. Cons: format change, json parsing overhead

Option B: Multiple cache keys

cache key: short url cache value: long url

cache key: description: + short url cache value: description

Pros: backward compatibility, easier to implement as optional feature. Cons: Two reads from redis are required. One read from redis for long url and one read from redis for description.

I think by option 2, you are referring to what's described in "Option B: Multiple cache keys"?

liangyuanruo commented 4 years ago

Noted, thanks. We agree with Point 1, and Option A. To clarify, the implementation needs to be backwards compatible with the format change.