mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
19.19k stars 1.49k forks source link

[Bug] Type of Crawl Response (requires type assertion) #712

Open oliviermills opened 2 months ago

oliviermills commented 2 months ago

Describe the Bug

Type error when using crawlResponse requiring type assertion

To Reproduce Steps to reproduce the issue:

import FirecrawlApp from '@mendable/firecrawl-js';

import dotenv from 'dotenv';

dotenv.config();

const app = new FirecrawlApp({apiKey: process.env.FIRECRAWL_API_KEY});

const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown', 'html'],
  }
})

if (!crawlResponse.success) {
  throw new Error(`Failed to crawl: ${crawlResponse.error}`) // <----- type error here because of how types are setup
}

console.log(crawlResponse)

Expected Behavior Should improve typing so we don't have to ts-ignore or type cast

Screenshots If applicable, add screenshots or copies of the command line output to help explain the issue.

image

Environment (please complete the following information):

oliviermills commented 2 months ago

Suggested solution

To avoid type assertions or type guards.. could extend like this:

// Define a base interface for common fields
interface BaseCrawlResponse {
  success: boolean;
  error?: string;
}

// Define the successful response fields
interface SuccessfulCrawlFields {
  status: "scraping" | "completed" | "failed" | "cancelled";
  completed: number;
  total: number;
  creditsUsed: number;
  expiresAt: Date;
  next?: string;
  data: FirecrawlDocument<undefined>[];
}

// Create a unified CrawlResponse type using conditional types
type CrawlResponse = BaseCrawlResponse & (
  { success: true } & SuccessfulCrawlFields |
  { success: false }
);

Then replace any Promise<CrawlStatusResponse | ErrorResponse> with Promise<CrawlResponse>

⚠️ This needs further checking due to how often CrawlResponse is used across the sdk

baraich commented 1 month ago

@oliviermills Hi, I am unable to reproduce the following error, and this can be due to 2 reasons.

oliviermills commented 1 month ago

@oliviermills Hi, I am unable to reproduce the following error, and this can be due to 2 reasons.

  • The issue has been fixed.
  • Or, there is issue with your installation in node_modules (though unlikely)

Clarification: This is a dev/linting/typing issue, not a runnable issue.

There is definitely a typing issue with accessing .error on crawlResponse as the return from crawlUrl() this is due to the return type being defined as Promise<CrawlStatusResponse | ErrorResponse>;

A workaround to avoid the lint error is to use this pattern:

if ('error' in crawlResponse) {
  // This is an ErrorResponse
  throw new Error(`Failed to crawl: ${crawlResponse.error}`);
} else {
  // This is a CrawlStatusResponse
  console.log(crawlResponse);
}
baraich commented 1 month ago

@oliviermills, I know this isn't a runnable error, so I haven't executed the code once. Please refer to images below.

The image shows the typeof crawlResponse variable image

This is the ErrorResponse interface that says success is false, and error is string. image

Thus, when we check that there is no success, it is capable of identifying that crawlResponse is type of ErrorResponse image