kouyakamada opened this issue 1 month ago
@kouyakamada were you able to figure this out? We have published a lot more improvements to our self-hosting guide. Let us know if that helps.
I also ran into an issue where crawling did not work behind a proxy.
I was able to resolve it by modifying the code around `axios.get()` in `apps/api/src/scraper/WebScraper/scrapers/fetch.ts` as shown below.
```typescript
// Add the module import at the top of fetch.ts
import * as tunnel from 'tunnel';

try {
  // Original request, replaced by the proxy-aware version below:
  // const response = await axios.get(url, {
  //   headers: {
  //     "Content-Type": "application/json",
  //   },
  //   timeout: universalTimeout,
  //   transformResponse: [(data) => data], // Prevent axios from parsing JSON automatically
  // });

  // Set proxy agent: pick a tunnel agent based on the target URL's scheme
  let agent;
  const httpProxy = process.env.HTTP_PROXY || null;
  const httpsProxy = process.env.HTTPS_PROXY || null;
  if (url.startsWith('https://') && httpsProxy) {
    const httpsProxyUrl = new URL(httpsProxy);
    // HTTPS target tunnelled through an HTTP proxy (CONNECT)
    agent = tunnel.httpsOverHttp({
      proxy: {
        host: httpsProxyUrl.hostname,
        // Fall back to 80 when the proxy URL has no explicit port (parseInt('') is NaN)
        port: parseInt(httpsProxyUrl.port || '80', 10),
      },
    });
    Logger.info(`Using tunnel agent with HTTPS proxy: ${httpsProxy}`);
  } else if (url.startsWith('http://') && httpProxy) {
    const httpProxyUrl = new URL(httpProxy);
    agent = tunnel.httpOverHttp({
      proxy: {
        host: httpProxyUrl.hostname,
        port: parseInt(httpProxyUrl.port || '80', 10),
      },
    });
    Logger.info(`Using tunnel agent with HTTP proxy: ${httpProxy}`);
  } else {
    Logger.info(`No proxy settings found or not required for the URL: ${url}. Proceeding without a proxy.`);
  }

  const response = await axios.get(url, {
    headers: {
      "Content-Type": "application/json",
    },
    timeout: universalTimeout,
    transformResponse: [(data) => data], // Prevent axios from parsing JSON automatically
    // axios uses httpAgent for http:// URLs and httpsAgent for https:// URLs, so set both;
    // proxy: false stops axios from also applying its own HTTP(S)_PROXY handling
    ...(agent ? { httpAgent: agent, httpsAgent: agent, proxy: false } : {}),
  });
  // ... rest of the original try block and its catch clause are unchanged
```
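The scheme-based branching in the patch above can be pulled out into a pure function so it can be unit-tested without a proxy or a network. This is a minimal sketch, not part of Firecrawl: `resolveProxy`, `ProxyTarget` and `Env` are hypothetical names, and both the lowercase `http_proxy`/`https_proxy` fallback and the default port 80 are assumptions added for illustration.

```typescript
// Hypothetical helper: decide which proxy (if any) a target URL should use,
// mirroring the HTTPS_PROXY / HTTP_PROXY branching in the fetch.ts patch.
interface ProxyTarget {
  kind: "httpsOverHttp" | "httpOverHttp"; // which tunnel agent factory to use
  host: string;
  port: number;
}

type Env = Record<string, string | undefined>;

function resolveProxy(targetUrl: string, env: Env): ProxyTarget | null {
  const protocol = new URL(targetUrl).protocol; // e.g. "https:" or "http:"

  // Assumption: also honour the lowercase variants that some tools set
  const raw =
    protocol === "https:"
      ? env.HTTPS_PROXY ?? env.https_proxy
      : protocol === "http:"
        ? env.HTTP_PROXY ?? env.http_proxy
        : undefined;
  if (!raw) return null; // no proxy configured for this scheme

  const proxyUrl = new URL(raw);
  // Assumption: an http:// proxy, so port 80 when the URL omits the port
  const port = proxyUrl.port ? parseInt(proxyUrl.port, 10) : 80;

  return {
    kind: protocol === "https:" ? "httpsOverHttp" : "httpOverHttp",
    host: proxyUrl.hostname,
    port,
  };
}

// Example: only HTTPS_PROXY is configured
const env: Env = { HTTPS_PROXY: "http://proxy.corp.example:8080" };
console.log(resolveProxy("https://example.com", env)); // httpsOverHttp via proxy.corp.example:8080
console.log(resolveProxy("http://example.com", env)); // null: no HTTP_PROXY / http_proxy set
```

In fetch.ts, a call such as `resolveProxy(url, process.env)` could then decide between `tunnel.httpsOverHttp` and `tunnel.httpOverHttp`, keeping the agent construction itself in one place.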
Add the `tunnel` module to `apps/api/Dockerfile`:
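```dockerfile
RUN pnpm install
# Added: install the tunnel module imported by fetch.ts
# (if the TypeScript build reports missing type declarations for 'tunnel', @types/tunnel may also be needed)
RUN pnpm add tunnel
RUN pnpm run build
```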
Build with Docker Compose and start the containers:

```shell
docker compose build && docker compose up -d
```
I hope this helps.
May I ask: did you register FireCrawl in Dify using an API key? If so, how did you obtain the API key for your locally deployed FireCrawl, and how did you set up Authorization?
@artificialzjy `TEST_API_KEY`
@tak-s Sorry for the late reply. I'm now able to crawl within our internal network and call Firecrawl from Dify! Thank you!
@tak-s Great solution! ~When I get to the `RUN pnpm add tunnel` step, I'm met with the following:~
```
 WARN  deprecated @devil7softwares/pos@1.0.2: This package has been renamed to `fast-tag-pos`
 WARN  3 deprecated subdependencies found: glob@7.2.3, inflight@1.0.6, superagent@8.1.2
Packages: +1
+
Progress: resolved 1050, reused 1037, downloaded 1, added 1, done

dependencies:
+ tunnel 0.0.6

 WARN  Issues with peer dependencies found
.
langchain 0.2.8
  ✕ unmet peer puppeteer@^19.7.2: found 22.12.1
@hyperdx/node-opentelemetry 0.8.1
@opentelemetry/auto-instrumentations-node 0.46.1
@opentelemetry/instrumentation-http 0.51.1
@opentelemetry/core 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/sdk-node 0.51.1
  ✕ unmet peer @opentelemetry/api@">=1.3.0 <1.9.0": found 1.9.0
@opentelemetry/sdk-trace-base 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/resources 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/exporter-trace-otlp-proto 0.51.1
@opentelemetry/otlp-transformer 0.51.1
  ✕ unmet peer @opentelemetry/api@">=1.3.0 <1.9.0": found 1.9.0
@opentelemetry/sdk-logs 0.51.1
  ✕ unmet peer @opentelemetry/api@">=1.4.0 <1.9.0": found 1.9.0
@opentelemetry/sdk-metrics 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.3.0 <1.9.0": found 1.9.0
@opentelemetry/sdk-trace-node 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/context-async-hooks 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/propagator-b3 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
@opentelemetry/propagator-jaeger 1.24.1
  ✕ unmet peer @opentelemetry/api@">=1.0.0 <1.9.0": found 1.9.0
```
~Any tips on how to resolve this? Thank you kindly!~
~EDIT: To clarify, I believe the step that fails is `RUN pnpm run build`, because it can't resolve what `tunnel` is. I just want to be sure it has nothing to do with the warnings.~
EDIT 2: Got it working now. I'm not really sure what happened, but running it a second time did the trick!
I want to call Firecrawl hosted on our company network from a Dify instance hosted on the same network. Firecrawl registers in Dify without any problem, but when I run a crawl, it hangs with no response from Firecrawl. The `docker compose logs` output seems to show an error. What is causing this?

Expected behavior: Firecrawl can be called from a Dify instance hosted on the company network.
Environment: based on the Docker Compose setup in the repository.
`docker compose logs`
Proxy settings added to the Dockerfile
Result of calling the API from the command line