TanyaaCJain commented 5 months ago

Implement Enhanced Link Previews with Image and Title Extraction

Description

As discussed with @inhwaS in a weekly meeting, we want to enhance the link previews in our application so they resemble the previews shown on social media platforms. The enhancement extracts the main image and the title from each linked page and displays them as a preview. This gives users a glimpse of the content before they decide to click through, which should improve engagement and make content more accessible.

Objectives

Image Preview: Extract the primary image from the URL to display in the preview.
Title Extraction: Retrieve the title of the web page for a concise representation in the preview.

Implementation Considerations:

Use the og:image and og:title meta tags in the page's HTML to extract the necessary details.
Cache the fetched data to improve performance and avoid redundant requests.
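
To make the two considerations above concrete, here is a minimal sketch of reading og:title and og:image from raw HTML and caching the result per URL. All names (LinkPreview, extractPreview, getPreview, HtmlFetcher) are hypothetical and not taken from the project. The regex-based tag scan is an assumption made to keep the sketch dependency-free: it does not decode HTML entities, and a real HTML parser would be more robust. How the HTML is fetched is left to the caller.

```typescript
// Hypothetical shape for the preview data; not taken from the project.
interface LinkPreview {
  title: string | null;
  image: string | null;
}

// Read one attribute value (double- or single-quoted) from a single tag.
function readAttribute(tag: string, name: string): string | null {
  const pattern = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i");
  const match = pattern.exec(tag);
  if (match === null) return null;
  return match[1] ?? match[2] ?? null;
}

// Find <meta property="..."> and return its content attribute, if any.
function readMetaContent(html: string, property: string): string | null {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    const prop = readAttribute(tag, "property");
    if (prop !== null && prop.toLowerCase() === property) {
      return readAttribute(tag, "content");
    }
  }
  return null;
}

function extractPreview(html: string): LinkPreview {
  return {
    title: readMetaContent(html, "og:title"),
    image: readMetaContent(html, "og:image"),
  };
}

// The fetcher is injected so the sketch makes no assumption about where
// the HTML comes from (backend endpoint, proxy, test stub, ...).
type HtmlFetcher = (url: string) => Promise<string>;

// In-memory cache keyed by URL. Storing the promise means concurrent
// requests for the same URL share one fetch.
const previewCache = new Map<string, Promise<LinkPreview>>();

function getPreview(url: string, fetchHtml: HtmlFetcher): Promise<LinkPreview> {
  const cached = previewCache.get(url);
  if (cached !== undefined) return cached;
  const pending = fetchHtml(url).then(extractPreview);
  // Drop failed lookups so a later call can retry.
  pending.catch(() => previewCache.delete(url));
  previewCache.set(url, pending);
  return pending;
}
```

This cache never expires entries; a time-to-live or size limit would be a reasonable addition if previews can go stale.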

TanyaaCJain commented 5 months ago

After careful research, here are the updates:

Challenges on Client-Side Implementation

Implementing this feature on the client-side presents several challenges:

CORS Policy: Browsers enforce the same-origin policy, which stops a page's scripts from reading responses fetched from another origin unless that origin opts in through CORS headers. Fetching the HTML of an arbitrary external page from the client will therefore fail for most sites.
Efficiency and Security: Parsing HTML on the client is inefficient and raises security concerns. Without a backend service, the client has to download and parse potentially large documents, which can hurt performance. It also exposes the application to XSS (Cross-Site Scripting) attacks if the extracted content is rendered without sanitization.
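
As an illustration of the XSS concern, the sketch below shows one common precaution: escaping an extracted title before it is placed into markup, and accepting only http(s) image URLs. The helper names escapeHtml and isHttpUrl are hypothetical, and this is only one possible mitigation, not necessarily what the project does. A UI framework that escapes text by default may make the first helper unnecessary.

```typescript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Accept only absolute http(s) URLs, rejecting schemes such as javascript:.
function isHttpUrl(value: string): boolean {
  try {
    const protocol = new URL(value).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not a parseable absolute URL
  }
}
```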

Proposed Alternative Solution

Given the challenges above, we can use a server-side solution to fetch and parse the HTML content. Server-to-server requests are not subject to the browser's CORS restrictions, and the external content can be validated before it reaches the client.

As a fallback, we can implement a client-side URL-to-title heuristic using regex to generate a provisional title. Here’s an example approach:

URL Regex Parsing: We extract the pathname from the URL and use a regex pattern to replace hyphens, underscores, and other special characters to generate a human-friendly title.
Title Capitalization: We capitalize the first letter of each word in the generated title to make it look more like a title.
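
The two steps above can be sketched roughly as follows. This is not the code from the PR: the function name titleFromUrl is hypothetical, and the choices to use only the last path segment, drop a trailing file extension, treat every non-ASCII-alphanumeric character as a separator, and fall back to the hostname are all assumptions made for the example.

```typescript
// Derive a provisional, human-friendly title from the URL alone.
function titleFromUrl(rawUrl: string): string {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return rawUrl; // not a parseable absolute URL; show it unchanged
  }

  const segments = url.pathname.split("/").filter((part) => part.length > 0);
  if (segments.length === 0) return url.hostname;

  let slug = segments[segments.length - 1];
  try {
    slug = decodeURIComponent(slug);
  } catch {
    // Keep the raw segment if the percent-encoding is malformed.
  }

  const words = slug
    .replace(/\.[a-z0-9]{1,5}$/i, "") // drop a trailing file extension
    .replace(/[^A-Za-z0-9]+/g, " ") // hyphens, underscores, etc. become spaces
    .trim()
    .split(" ")
    .filter((word) => word.length > 0);
  if (words.length === 0) return url.hostname;

  // Capitalize the first letter of each word.
  return words
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

// Example: "https://example.com/blog/my-first_post.html" gives "My First Post".
```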

This approach provides a simple preview while avoiding the complexities and potential issues of fetching and parsing HTML on the client side.

Future scope or alternative approach

Instead of relying on client-side parsing, consider using a server-side method to fetch and parse the HTML. This avoids CORS issues and allows more controlled and secure handling of external content. For environments where server-side handling is not feasible, I have implemented the URL regex-based method described above, which extracts a basic title directly from the URL structure. It is less accurate, but it works as a fallback without any network request. The PR can be reviewed at #50.

TanyaaCJain commented 4 months ago

Closing the issue since it is fixed in #50.

mlim-usfca / PersonalKnowledge

[UI/UX] Enhance Link Previews with Image and Title Extraction #53