This issue seems to occur because the elements on the page use relative paths like <a class="reference internal" href="../../tutorials/token_classification.html">Token Classification (text)</a>.
When a page redirects, Firecrawl keeps using the originally requested URL as the base when resolving links to crawl, rather than the final URL after the redirect. We’ll need to adjust the handling for these redirect cases.
When crawling https://docs.cleanlab.ai, some of the URLs in the response are missing the /stable path segment. For example, this URL in the response, https://docs.cleanlab.ai/cleanlab/token_classification/index.html, is indeed a 404 page; it should actually be https://docs.cleanlab.ai/stable/cleanlab/token_classification/index.html.
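To make the failure mode concrete, here is a minimal Python sketch of how the same relative link resolves differently depending on which base URL is used. This is not Firecrawl's code. The redirect target (`/stable/index.html`), the relative `href`, and the helper name `resolve_links` are assumptions for illustration; only the two resulting URLs come from the report above.

```python
from urllib.parse import urljoin

# Assumed values for illustration only.
REQUESTED_URL = "https://docs.cleanlab.ai/"                # URL passed to the crawler
FINAL_URL = "https://docs.cleanlab.ai/stable/index.html"   # assumed redirect target
HREF = "cleanlab/token_classification/index.html"          # hypothetical relative link


def resolve_links(base_url, hrefs):
    """Resolve each href against base_url using standard URL-joining rules."""
    return [urljoin(base_url, href) for href in hrefs]


# Resolving against the pre-redirect URL drops the /stable segment.
wrong = resolve_links(REQUESTED_URL, [HREF])[0]

# Resolving against the post-redirect URL keeps it.
right = resolve_links(FINAL_URL, [HREF])[0]

print(wrong)
print(right)
```

Under these assumptions, the fix amounts to resolving links against the URL the page was finally served from. With the `requests` library, for instance, that URL is available as `response.url` after redirects have been followed.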