@rhyswynn good catch! #516 fixes some of the evaluations. I'm gonna add a commit with the ones that are missing.
hey @rhyswynn I just committed all validations for USE_DB_AUTHENTICATION. Could you please check if #516 resolves this issue?
scraper/WebScraper/single_url.ts also needs to be updated as the different scraper selection methods are evaluating the environment variable, not the new boolean variable.
@rhyswynn @rafaelsideguide Some of those scraping variables are also impacted by #531. For example, the ScrapingBee scraper will attempt to run in Docker, because its value is actually set to the end-of-line comment instead of being blank.
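To illustrate the comment-as-value problem: a minimal sketch of a guard that treats a blank value, or a value consisting only of a trailing comment, as unset. The helper name cleanEnvValue and the sample values are hypothetical and are not taken from the firecrawl code base or from #531.

```typescript
// Hypothetical helper: normalises a raw env value before it is used as a feature switch.
// Assumption: a misparsed end-of-line comment arrives as a string starting with "#".
function cleanEnvValue(raw: string | undefined): string | undefined {
  if (raw === undefined) return undefined;
  const trimmed = raw.trim();
  // Blank values and comment-only values are both treated as "not configured".
  if (trimmed === "" || trimmed.startsWith("#")) return undefined;
  return trimmed;
}

// Without the guard, a comment-only value is a non-empty string and therefore truthy.
const rawKey = "# set if you would like to use ScrapingBee"; // made-up sample value
const naiveEnabled = Boolean(rawKey);
const guardedEnabled = cleanEnvValue(rawKey) !== undefined;
console.log(naiveEnabled, guardedEnabled);
```

One trade-off of this sketch is that a legitimate value beginning with "#" would also be discarded, so it only fits variables where that cannot happen.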
I'm adding to the scraping orders the following (PR #516):
export const baseScrapers = [
  useFireEngine ? "fire-engine" : undefined,
  useFireEngine ? "fire-engine;chrome-cdp" : undefined,
  useScrapingBee ? "scrapingBee" : undefined,
  useDatabaseAuth ? undefined : "playwright",
  useScrapingBee ? "scrapingBeeLoad" : undefined,
  "fetch",
].filter(Boolean);

let defaultOrder = [
  useFireEngine ? "fire-engine" : undefined,
  useFireEngine ? "fire-engine;chrome-cdp" : undefined,
  useScrapingBee ? "scrapingBee" : undefined,
  useScrapingBee ? "scrapingBeeLoad" : undefined,
  useDatabaseAuth ? undefined : "playwright",
  "fetch",
].filter(Boolean);
@kevinswiber @rhyswynn let me know if this solves the issue
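For anyone checking the effect on a self-hosted setup, here is a rough sketch of how the defaultOrder list above resolves for a given set of flags. The ScraperFlags interface and the buildDefaultOrder function are hypothetical wrappers added only for illustration; the entries and their order are copied from the snippet above.

```typescript
// Hypothetical flag bundle; the field names mirror the booleans used in the snippet above.
interface ScraperFlags {
  useFireEngine: boolean;
  useScrapingBee: boolean;
  useDatabaseAuth: boolean;
}

// Builds the scraper order from boolean flags instead of raw environment strings.
function buildDefaultOrder(flags: ScraperFlags): string[] {
  const candidates: Array<string | undefined> = [
    flags.useFireEngine ? "fire-engine" : undefined,
    flags.useFireEngine ? "fire-engine;chrome-cdp" : undefined,
    flags.useScrapingBee ? "scrapingBee" : undefined,
    flags.useScrapingBee ? "scrapingBeeLoad" : undefined,
    flags.useDatabaseAuth ? undefined : "playwright",
    "fetch",
  ];
  // Type-guard filter so the result is string[] rather than (string | undefined)[].
  return candidates.filter((name): name is string => name !== undefined);
}

// Self-hosted case: no fire-engine, no ScrapingBee, DB authentication disabled.
const selfHostedOrder = buildDefaultOrder({
  useFireEngine: false,
  useScrapingBee: false,
  useDatabaseAuth: false,
});
console.log(selfHostedOrder); // [ 'playwright', 'fetch' ]
```

With DB authentication disabled, playwright comes first and fetch remains the fallback, which is the behavior the issue asks for.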
Yes, #516 looks like it will take care of everything. Thank you!
The USE_DB_AUTHENTICATION environment variable is evaluated inconsistently. scrape-events.ts treats it as a boolean and expects true, while single_url.ts appears to check only whether it is undefined rather than checking its value. As a result, playwright is never invoked when the value is set to false in the env file. Leaving the variable blank or commenting it out is not a workaround either, because it causes errors in supabase.ts, which looks for the string value false.
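A minimal sketch of one way to make the checks consistent: parse the variable into a boolean once and use that boolean everywhere. The parseEnvBool helper and the plain env record are hypothetical and do not reflect how #516 or the firecrawl code base actually implements this.

```typescript
// Hypothetical helper: only the string "true" (any case, surrounding whitespace ignored) enables the flag.
// Blank, missing, "false", or any other value all resolve to false.
function parseEnvBool(raw: string | undefined): boolean {
  return (raw ?? "").trim().toLowerCase() === "true";
}

// Stand-in for the process environment, so the sketch runs without Node typings.
const env: Record<string, string | undefined> = {
  USE_DB_AUTHENTICATION: "false",
};

// Parse once; every module then checks the same boolean instead of the raw string.
const useDbAuthentication = parseEnvBool(env["USE_DB_AUTHENTICATION"]);
console.log(useDbAuthentication); // false
```

Under this sketch, USE_DB_AUTHENTICATION=false, USE_DB_AUTHENTICATION= and a commented-out line all behave the same way.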
To Reproduce
Steps to reproduce the issue:
Configure the .env file with USE_DB_AUTHENTICATION=
Run docker compose up
Errors will display that Supabase environment variables aren't configured correctly.
Line 12 of supabase.ts is expecting a string value https://github.com/mendableai/firecrawl/blob/5a778f2c22a451f1eead5eb9733bcd462d3cd081/apps/api/src/services/supabase.ts#L12
Line 20 of supabase.ts is returning the error ERROR - Supabase environment variables aren't configured correctly. Supabase client will not be initialized. Fix ENV configuration or disable DB authentication with USE_DB_AUTHENTICATION env variable
Configure the .env file with USE_DB_AUTHENTICATION=false
Run docker compose up
Run a scrape request with pageOptions/waitFor configured with a positive number
Several lines in single_url.ts related to scraper selection check whether the property is undefined, so playwright is never used https://github.com/mendableai/firecrawl/blob/5a778f2c22a451f1eead5eb9733bcd462d3cd081/apps/api/src/scraper/WebScraper/single_url.ts#L91
scrape-events.ts evaluates the property as a boolean and then fails when it tries to use Supabase, logging ERROR - Attempted to access Supabase client when it's not configured. https://github.com/mendableai/firecrawl/blob/5a778f2c22a451f1eead5eb9733bcd462d3cd081/apps/api/src/lib/scrape-events.ts#L39
Expected Behavior
Configure .env with USE_DB_AUTHENTICATION=
Run docker compose up with no errors
Run a scrape request with pageOptions/waitFor configured with a positive number
playwright is invoked with no errors
Logs
I provided the specific lines of code in the files where the errors were being generated.