Closed GuoFan1996 closed 5 months ago
Regarding the "Visit Browserless.io to generate an API key" step: did you choose a one-week free trial? If so, we would have to keep registering new accounts to keep it free.
Confirmed: without a Browserless.io key, it still works without any error. The Browserless.io key is only needed for YouTube links; other links work even if you don't set the key.
Yes, using the Browserless free trial is feasible. I just tried it, fixed a small bug, and committed the fix. You can refer to my last commit.
Integrate Puppeteer for Dynamic Content Extraction
Description:
This PR integrates Puppeteer into our content extraction feature, enhancing our ability to handle dynamic content from platforms like YouTube, which was previously inaccessible through static HTML parsing. This upgrade significantly broadens our data retrieval capabilities to include dynamically loaded content.
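As a rough illustration of the split described here, the sketch below routes a URL either to static HTML parsing or to a Puppeteer session on Browserless. It is not the code from this PR: `isYoutubeUrl`, `planExtraction`, the host list, the `unavailable` fallback, and the `wss://chrome.browserless.io?token=...` endpoint format are all assumptions made for the example. Only the `PUPPETEER_BROWSERLESS_IO_KEY` variable and the "key is only needed for YouTube links" behavior come from this thread.

```typescript
// Sketch only: the host list, function names, fallback behavior, and the
// Browserless endpoint format are assumptions, not code from this PR.

type ExtractionPlan =
  | { kind: "static" } // plain HTML fetch and parse, no key required
  | { kind: "puppeteer"; browserWSEndpoint: string } // remote browser session
  | { kind: "unavailable"; reason: string }; // dynamic page but no key set

// Hosts treated as YouTube for routing purposes (assumed list).
const YOUTUBE_HOSTS = new Set([
  "youtube.com",
  "www.youtube.com",
  "m.youtube.com",
  "youtu.be",
]);

function isYoutubeUrl(rawUrl: string): boolean {
  try {
    return YOUTUBE_HOSTS.has(new URL(rawUrl).hostname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

function planExtraction(
  rawUrl: string,
  browserlessKey: string | undefined, // value of PUPPETEER_BROWSERLESS_IO_KEY
): ExtractionPlan {
  // Non-YouTube links keep using static parsing and never need the key.
  if (!isYoutubeUrl(rawUrl)) return { kind: "static" };
  if (!browserlessKey) {
    return {
      kind: "unavailable",
      reason: "PUPPETEER_BROWSERLESS_IO_KEY is not set",
    };
  }
  return {
    kind: "puppeteer",
    // Assumed endpoint format; check the Browserless docs for your account.
    browserWSEndpoint: `wss://chrome.browserless.io?token=${encodeURIComponent(browserlessKey)}`,
  };
}
```

In an Edge Function the key would typically be read with `Deno.env.get("PUPPETEER_BROWSERLESS_IO_KEY")`, and the resulting `browserWSEndpoint` passed to `puppeteer.connect({ browserWSEndpoint })`.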
Major Changes:
- Puppeteer Integration: Incorporating `puppeteer` allows interaction with JavaScript-reliant web pages, opening up a wider array of content extraction possibilities.
- Dynamic Transcript Extraction: A new function, `extractYoutubeTranscript`, uses Puppeteer to fetch and extract YouTube transcripts, overcoming our dynamic content extraction hurdles.
- Configuration and Security: Uses environment variables for configuration, including `PUPPETEER_BROWSERLESS_IO_KEY`, enhancing security and deployment flexibility.

Before Serve and Test:
- Visit Browserless.io to generate an API key.
- Add the key to the `.env` file as follows: `PUPPETEER_BROWSERLESS_IO_KEY=your-browserless-api-key`. This step ensures secure access to Browserless services for Puppeteer operations.

Serve and Test the Function:
Serving the Edge Function Locally:
To serve the Edge Function locally for testing and development, use the Supabase CLI.
Testing with `curl`:
In the request, replace `https://example.com` with the URL you want to extract content from.

This PR resolves #24.
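For anyone who prefers to script the local test rather than use `curl`, here is a minimal sketch of an equivalent request. The function name `extract-content` and the `{ url }` JSON body are placeholders, since neither is shown in this description. Port `54321` and the `/functions/v1/` path are the Supabase CLI defaults for local development.

```typescript
// Sketch only: the function name and the { url } body shape are placeholders;
// adjust both to match the actual Edge Function.

interface LocalTestRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildLocalTestRequest(
  targetUrl: string,
  functionName = "extract-content", // hypothetical function name
  anonKey?: string, // local anon key, if the function verifies JWTs
): LocalTestRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (anonKey) headers["Authorization"] = `Bearer ${anonKey}`;
  return {
    // Default local address used by the Supabase CLI.
    endpoint: `http://localhost:54321/functions/v1/${functionName}`,
    init: { method: "POST", headers, body: JSON.stringify({ url: targetUrl }) },
  };
}

// Usage (requires the function to be served locally):
// const { endpoint, init } = buildLocalTestRequest("https://example.com");
// const response = await fetch(endpoint, init);
// console.log(await response.text());
```

Building the request separately from sending it makes it easy to swap in a YouTube link and confirm that the Browserless path is exercised.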