rooteco / tweetscape

the supercharged twitter feed
https://prototype.tweetscape.co
GNU Affero General Public License v3.0

Normalize article links in PostgreSQL #360

Open nicholaschiang opened 2 years ago

nicholaschiang commented 2 years ago

Imported from @nicholaschiang's original Linear issue TS-45.

On the homepage for NFTs:

[Screenshot of the NFTs homepage]

The "Yuga Labs x Animoca Brands" link appears twice because one copy has an expanded_url of http://somethingisbrewing.xyz/ and the other has https://somethingisbrewing.xyz/. Twitter's shortened URL apparently treats a difference in protocol as a different link.

It may be a wise decision to leave this as is and simply rely on Twitter's shortened URL to decide whether two links are the same. However, it might be better for end-user UX to normalize links (ignoring differences such as protocol and domain-name capitalization) before inserting them into the links PostgreSQL table and checking for uniqueness.
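A minimal sketch of what that normalization could look like, assuming links are deduplicated on a normalized URL rather than on Twitter's shortened URL. normalizeUrl, dedupeLinks, and the Link shape are hypothetical names, not existing Tweetscape code. The in-memory Set stands in for what would presumably be a UNIQUE constraint on a normalized-URL column in the links table. Only the two rules mentioned above are applied: http is mapped to https, and the host is lowercased (the WHATWG URL parser already does this). The path and query are left alone because they can be case-sensitive.

```typescript
// Hypothetical helper: returns a URL string under which links that differ only
// in protocol (http vs https) or host capitalization compare equal.
// Throws a TypeError if `raw` is not a valid absolute URL.
function normalizeUrl(raw: string): string {
  // The WHATWG URL parser lowercases the scheme and hostname and drops
  // default ports (80 for http, 443 for https).
  const url = new URL(raw);
  // Treat http and https as the same link by always storing https.
  if (url.protocol === "http:") url.protocol = "https:";
  // Path, query, and fragment are kept as-is (paths may be case-sensitive).
  return url.href;
}

// Hypothetical link shape; the real links table may have different columns.
interface Link {
  expandedUrl: string;
  title: string;
}

// Stand-in for a uniqueness check on the normalized URL:
// keeps the first link seen for each normalized URL.
function dedupeLinks(links: Link[]): Link[] {
  const seen = new Set<string>();
  const result: Link[] = [];
  for (const link of links) {
    const key = normalizeUrl(link.expandedUrl);
    if (seen.has(key)) continue;
    seen.add(key);
    result.push(link);
  }
  return result;
}

// The two expanded_url values from the NFTs homepage collapse into one link.
const links: Link[] = [
  { expandedUrl: "http://somethingisbrewing.xyz/", title: "Yuga Labs x Animoca Brands" },
  { expandedUrl: "https://somethingisbrewing.xyz/", title: "Yuga Labs x Animoca Brands" },
];
console.log(dedupeLinks(links).length); // → 1
```

Storing the normalized URL alongside the original expanded_url would keep the exact link Twitter returned while still enforcing uniqueness on the normalized form.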