Closed. modk closed this issue 2 years ago.
Interesting, I have never seen those errors before. I think the second error is a follow-up of the first. It seems the connection to ScrapingAnt is a bit shaky; I will have to add better error handling here.
What do you mean when you say "Fredy crashed"? Crashed as in the process stopped?
Yes, the process stopped entirely. The end of the error messages was:
[...]
data: { detail: 'This site can’t be reached' }
},
isAxiosError: true,
toJSON: [Function: toJSON]
}
Node.js v17.0.1
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
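The process exit in the output above is most likely Node.js treating a rejected axios request as an unhandled promise rejection: since Node.js 15, an unhandled rejection terminates the process by default. A minimal sketch of the kind of error handling that keeps the process alive, assuming a hypothetical fetchListings function that stands in for the provider request (it simply rejects, so no network access is needed):

```javascript
// Hypothetical stand-in for the provider request; it rejects the way a
// failed axios call would, without touching the network.
function fetchListings(url) {
  return Promise.reject(new Error(`Request to ${url} failed with status code 404`));
}

// Await the call inside try/catch so a rejection is logged and turned into
// an empty result instead of becoming an unhandled rejection.
async function safeFetch(url) {
  try {
    return await fetchListings(url);
  } catch (err) {
    console.error(`Error while trying to scrape data. Received error: ${err.message}`);
    return [];
  }
}

safeFetch('https://example.com/search').then((listings) => {
  console.log(listings.length); // prints 0; the process keeps running
});
```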
This should fix it. https://github.com/orangecoding/fredy/releases/tag/5.3.1
See https://github.com/orangecoding/fredy/commit/c97b323b357f8e9b0319211115130ec5513172ac
Can you please verify?
I will try it later or tomorrow. Does this also fix the undefined o.link issue?
Yep
It has been running fine for almost 18 hours now, so it appears to be fixed. Thanks a lot.
Great. Closing this now.
Now I just got another error, probably related:
Error while trying to scrape data. Received error: Request failed with status code 404
/usr/home/.../fredy/lib/services/requestDriver.js:25
if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
^
TypeError: Cannot read properties of undefined (reading 'data')
at driver (/usr/home/.../fredy/lib/services/requestDriver.js:25:23)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
Node.js v17.0.1
error Command failed with exit code 1.
OK, looking at the code, it is definitely related. Apparently three retries are not enough, so I increased them to 5 and will test again.
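Raising the retry count may not help with this particular error. As far as I know, axios-retry's default retryCondition retries only network errors and 5xx responses to idempotent requests, so a 404 is handed straight back to the caller no matter how high retries is set. The sketch below is a simplified, hypothetical re-implementation of that rule and of the exponential backoff schedule; the names shouldRetry and backoffDelays and the 100 ms base are assumptions for illustration, and the library's exponentialDelay also adds random jitter on top of these base values.

```javascript
// Simplified version of the default rule described above (assumption):
// retry when there is no response at all (network error) or a 5xx status.
function shouldRetry(status) {
  if (status === undefined) return true; // no response: treat as network error
  return status >= 500 && status <= 599;
}

// Base delay for each retry attempt n as 2^n * baseMs, without jitter.
function backoffDelays(retries, baseMs = 100) {
  const delays = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(2 ** attempt * baseMs);
  }
  return delays;
}

console.log(shouldRetry(404)); // false: a 404 is never retried
console.log(shouldRetry(503)); // true
console.log(backoffDelays(3)); // [ 200, 400, 800 ]
console.log(backoffDelays(5)); // [ 200, 400, 800, 1600, 3200 ]
```

Under these assumptions, going from 3 to 5 retries only lengthens the wait on transient 5xx or network failures; it does nothing for the 404 in the log.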
Still occurring. I had to manually check for undefined and call the callback with an empty result to keep the process from exiting.
For the record, this is what works somewhat reliably now:
diff --git a/lib/provider/immoscout.js b/lib/provider/immoscout.js
index f7a52a4..1edea1f 100644
--- a/lib/provider/immoscout.js
+++ b/lib/provider/immoscout.js
@@ -9,7 +9,7 @@ function nullOrEmpty(val) {
function normalize(o) {
const title = nullOrEmpty(o.title) ? 'NO TITLE FOUND' : o.title.replace('NEU', '');
const address = nullOrEmpty(o.address) ? 'NO ADDRESS FOUND' : (o.address || '').replace(/\(.*\),.*$/, '').trim();
- const link = `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
+ const link = nullOrEmpty(o.link) ? 'NO LINK FOUND' : `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
return Object.assign(o, { title, address, link });
}
diff --git a/lib/services/requestDriver.js b/lib/services/requestDriver.js
index 89ccf44..2fb5491 100644
--- a/lib/services/requestDriver.js
+++ b/lib/services/requestDriver.js
@@ -1,7 +1,7 @@
const axios = require('axios');
const axiosRetry = require('axios-retry');
-axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 3 });
+axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 5 });
function makeDriver(headers = {}) {
let cookies = '';
@@ -22,14 +22,20 @@ function makeDriver(headers = {}) {
callback(null, []);
}
- if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
- //assume we have gotten a response from scrapingAnt
- if (cookies.length === 0) {
- cookies = result.data.cookies;
+ try {
+ if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
+ //assume we have gotten a response from scrapingAnt
+ if (cookies.length === 0) {
+ cookies = result.data.cookies;
+ }
+ callback(null, result.data.content);
+ } else {
+ callback(null, result.data);
}
- callback(null, result.data.content);
- } else {
- callback(null, result.data);
+
+ } catch (exception) {
+ console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
+ callback(null, []);
}
};
}
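The context lines at the top of the requestDriver.js hunk (callback(null, []); followed by a closing brace, right before the result.data check) suggest that the request's catch block reports an empty result but then falls through with result still undefined. That would match the log, where "Error while trying to scrape data" is printed and is immediately followed by the TypeError at requestDriver.js:25. A minimal sketch of an alternative to wrapping everything in try/catch, assuming that structure: return right after the error callback. This is a simplified stand-in, not Fredy's actual driver; the request function is injected (in Fredy it would be axios) so the sketch runs without network access.

```javascript
// Simplified stand-in for the request driver; `request` is injected so the
// sketch runs without axios or a network connection.
function makeDriver(request) {
  let cookies = '';
  return async function driver(url, callback) {
    let result;
    try {
      result = await request(url);
    } catch (exception) {
      console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
      callback(null, []);
      return; // `result` is undefined here, so nothing below may run
    }
    if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
      // Assume the response came from ScrapingAnt: keep its cookies, pass on the page content.
      if (cookies.length === 0) {
        cookies = result.data.cookies || ''; // fall back to '' so cookies.length stays safe
      }
      callback(null, result.data.content);
    } else {
      callback(null, result.data);
    }
  };
}

// Usage: a failing request logs the error and yields [] instead of crashing.
const failing = makeDriver(() => Promise.reject(new Error('Request failed with status code 404')));
failing('https://example.com/search', (err, data) => console.log(data)); // prints []
```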
Hi,
Yesterday I tried the ImmoScout provider for the first time. Scraping worked at least once and returned results, but after a few hours Fredy crashed with the following error:
Does this need to be caught somewhere or am I doing something wrong?
Another issue I faced is:
but this one could easily be fixed by checking whether o.link is defined and setting it to an empty value if it is not. Apparently some ImmoScout entries either have no link or are parsed incorrectly.
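A hypothetical sketch of that check: a trimmed-down normalize that falls back to an empty string (one reading of "setting it to empty") instead of calling substring on an undefined o.link. The nullOrEmpty body is a guess, since only the helper's name appears in the immoscout.js diff.

```javascript
// Guess at the helper referenced in immoscout.js; its body is not shown there.
function nullOrEmpty(val) {
  return val == null || val.length === 0;
}

// Build the absolute expose URL only when the scraped entry has a link;
// otherwise use an empty string so o.link.substring is never called on undefined.
function normalize(o) {
  const link = nullOrEmpty(o.link)
    ? ''
    : `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
  return Object.assign(o, { link });
}

console.log(normalize({ link: '/some/path/expose/123' }).link); // https://www.immobilienscout24.de/expose/123
console.log(normalize({}).link === ''); // true
```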