orangecoding / fredy

:heart: Fredy - [F]ind [R]eal [E]states [D]amn Eas[y] - Fredy will constantly search for new listings on sites like Immoscout or Immowelt and send new results to you, so that you can focus on more important things in life ;)
http://www.orange-coding.net
MIT License

Uncaught exception with ImmoScout/ScrapingAnt #36

Status: Closed (modk closed this issue 2 years ago)

modk commented 2 years ago

Hi,

yesterday I tried the ImmoScout provider for the first time. At least once, scraping/retrieval worked fine and yielded results. After a few hours, though, Fredy crashed with the following error:

node:internal/process/promises:246                                                 
          triggerUncaughtException(err, true /* fromPromise */);                   
          ^                                                                        

Error: Request failed with status code 404                                         
    at createError (/usr/home/.../fredy/node_modules/axios/lib/core/createError.js:16:15)                                                                    
    at settle (/usr/home/.../fredy/node_modules/axios/lib/core/settle.js:17:12)                                                                              
    at IncomingMessage.handleStreamEnd (/usr/home/.../fredy/node_modules/axios/lib/adapters/http.js:293:11)                                                  
    at IncomingMessage.emit (node:events:402:35)                                   
    at endReadableNT (node:internal/streams/readable:1340:12)                      
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {   
[...]

Does this need to be caught somewhere or am I doing something wrong?

Another issue I faced is:

TypeError: Cannot read properties of undefined (reading 'substring')                                                                                                  
    at normalize (/usr/home/.../fredy/lib/provider/immoscout.js:8:58)                                                                                        
    at Array.map (<anonymous>)  

but this one could be easily fixed by checking if o.link is defined and setting it to empty if not. Apparently some ImmoScout entries do not have a link or the parsing goes wrong.
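
A minimal sketch of that guard, for illustration only. `normalizeLink` and `BASE_URL` are hypothetical names that do not exist in Fredy, and returning an empty string when `/expose` is missing is an assumption about the desired behavior rather than what the provider currently does:

```javascript
// Hypothetical constant; the real provider inlines the host in a template string.
const BASE_URL = 'https://www.immobilienscout24.de';

// Returns an absolute expose URL, or '' when the scraped entry has no usable link.
function normalizeLink(link) {
  if (typeof link !== 'string' || link.length === 0) {
    return '';
  }
  const start = link.indexOf('/expose');
  // indexOf returns -1 when '/expose' is absent; this sketch treats that as "no link" as well.
  return start === -1 ? '' : `${BASE_URL}${link.substring(start)}`;
}

console.log(normalizeLink('/some/prefix/expose/123'));
console.log(normalizeLink(undefined));
```

With a guard like this, an entry without a link no longer throws inside `Array.map`, so the remaining entries can still be processed.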

orangecoding commented 2 years ago

Interesting, I have never seen those errors before. I think the second error is a follow-up of the first. It seems the connection to ScrapingAnt is a bit shaky. I will have to add better error handling here.

What do you mean when you say "Fredy crashed"? Crashed as in the process stopped?

modk commented 2 years ago

Yes, the process stopped entirely. The end of the error messages was:

[...]
    data: { detail: 'This site can’t be reached' }                                 
  },                                                                               
  isAxiosError: true,                                                              
  toJSON: [Function: toJSON]                                                       
}                                                                                  

Node.js v17.0.1                                                                    
error Command failed with exit code 1.                                             
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

orangecoding commented 2 years ago

This should fix it. https://github.com/orangecoding/fredy/releases/tag/5.3.1

See https://github.com/orangecoding/fredy/commit/c97b323b357f8e9b0319211115130ec5513172ac

Can you please verify?

modk commented 2 years ago

I will try it later or tomorrow. Does this also fix the undefined o.link thing?

orangecoding commented 2 years ago

Yep

modk commented 2 years ago

Has been running fine for almost 18 hours now so appears to be fixed. Thanks a lot.

orangecoding commented 2 years ago

Great. Closing this now.

modk commented 2 years ago

Now I just got another error, probably related:

Error while trying to scrape data. Received error: Request failed with status code 404
/usr/home/.../fredy/lib/services/requestDriver.js:25
    if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
                      ^                  

TypeError: Cannot read properties of undefined (reading 'data')
    at driver (/usr/home/.../fredy/lib/services/requestDriver.js:25:23)
    at runMicrotasks (<anonymous>)       
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

Node.js v17.0.1                          
error Command failed with exit code 1.   

Ok, now looking at the code, it surely is related. Apparently the three retries are not enough. I increased to 5 now and will test again.
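
As a rough back-of-the-envelope check on why a few more retries may not help much: the sketch below adds up the waiting time of a plain doubling backoff. The 100 ms base, the absence of jitter, and the helper name `totalBackoffMs` are all assumptions for illustration; the actual timings used by axios-retry may differ.

```javascript
// Sums the delays of a doubling backoff: baseMs * 2^1 + baseMs * 2^2 + ... + baseMs * 2^retries.
// Assumed schedule for illustration only, not taken from the axios-retry source.
function totalBackoffMs(retries, baseMs = 100) {
  let total = 0;
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += baseMs * 2 ** attempt;
  }
  return total;
}

console.log(totalBackoffMs(3)); // 1400
console.log(totalBackoffMs(5)); // 6200
```

Under these assumptions, going from 3 to 5 retries only extends the total wait from about 1.4 s to about 6.2 s, so an outage that lasts longer than a few seconds still ends in a failed request.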

modk commented 2 years ago

Still occurring. I had to manually check for an undefined result and call the callback with an empty list so that it no longer causes an exit.
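
The idea, roughly: once the request has failed and the callback has been called with an empty list, the function has to stop instead of continuing on to `result.data`. A simplified sketch follows. Passing the request function into `makeDriver` is an assumption made so the example runs without axios, `failingRequest` and the example URL are hypothetical, and the cookie handling is left out:

```javascript
// Hypothetical stand-in for the HTTP call; it always fails like the 404 in the log above.
async function failingRequest() {
  throw new Error('Request failed with status code 404');
}

// Simplified driver factory; the request function is injected (assumption, see lead-in).
function makeDriver(request) {
  return async function driver(url, callback) {
    let result;
    try {
      result = await request(url);
    } catch (exception) {
      console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
      // Report "no listings" and return, so result.data is never read.
      return callback(null, []);
    }
    const data = result.data;
    const isScrapingAnt =
      data !== null && typeof data === 'object' && url.toLowerCase().includes('scrapingant');
    callback(null, isScrapingAnt ? data.content : data);
  };
}

const driver = makeDriver(failingRequest);
driver('https://example.invalid/scrapingant', (err, listings) => {
  console.log(`Got ${listings.length} listings`);
});
```

The early `return` is the important part: without it, the callback fires once with an empty list and execution then falls through to `result.data` with `result` still undefined, which matches the TypeError above.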

modk commented 2 years ago

For the record, this is what works somewhat reliably now:

diff --git a/lib/provider/immoscout.js b/lib/provider/immoscout.js
index f7a52a4..1edea1f 100644
--- a/lib/provider/immoscout.js
+++ b/lib/provider/immoscout.js
@@ -9,7 +9,7 @@ function nullOrEmpty(val) {
 function normalize(o) {
   const title = nullOrEmpty(o.title) ? 'NO TITLE FOUND' : o.title.replace('NEU', '');
   const address = nullOrEmpty(o.address) ? 'NO ADDRESS FOUND' : (o.address || '').replace(/\(.*\),.*$/, '').trim();
-  const link = `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
+  const link = nullOrEmpty(o.link) ? 'NO LINK FOUND' : `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
   return Object.assign(o, { title, address, link });
 }

diff --git a/lib/services/requestDriver.js b/lib/services/requestDriver.js
index 89ccf44..2fb5491 100644
--- a/lib/services/requestDriver.js
+++ b/lib/services/requestDriver.js
@@ -1,7 +1,7 @@
 const axios = require('axios');
 const axiosRetry = require('axios-retry');

-axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 3 });
+axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 5 });

 function makeDriver(headers = {}) {
   let cookies = '';
@@ -22,14 +22,20 @@ function makeDriver(headers = {}) {
       callback(null, []);
     }

-    if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
-      //assume we have gotten a response from scrapingAnt
-      if (cookies.length === 0) {
-        cookies = result.data.cookies;
+    try {
+      if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
+        //assume we have gotten a response from scrapingAnt
+        if (cookies.length === 0) {
+          cookies = result.data.cookies;
+        }
+        callback(null, result.data.content);
+      } else {
+        callback(null, result.data);
       }
-      callback(null, result.data.content);
-    } else {
-      callback(null, result.data);
+
+    } catch (exception) {
+      console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
+      callback(null, []);
     }
   };
 }