teticio / lambda-scraper

Use AWS Lambda functions as a proxy pool to scrape web pages.
BSD 3-Clause "New" or "Revised" License

best way to troubleshoot lambda errors locally #10

Closed alanzheng88 closed 9 months ago

alanzheng88 commented 9 months ago

hi Robert,

Thanks again for such a wonderful project! I'm looking for some guidance on how you would approach troubleshooting these Lambda function 500 Internal Server Errors locally. I was hoping to step through and debug the Lambda functions in VS Code, but there doesn't seem to be a straightforward way to do this. Would you have any recommendations? I'm running into a 500 error when testing against any of the sites below:

https://httpbin.io/user-agent https://httpbin.org/ip http://httpbin.org/headers

2024-02-11 02:12:03 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://<hash>.lambda-url.us-east-1.on.aws/httpbin.io/user-agent> (failed 3 times): 500 Internal Server Error

teticio commented 9 months ago

Ah, I can see what is going wrong there... You need to replace <hash> with the actual random string that AWS generates for you when you create the Lambda function and its corresponding URL.

alanzheng88 commented 9 months ago

Hi @teticio, I did replace the '<hash>' but still run into the same issue. Are you experiencing it too? I intentionally left the hash out above for privacy.

teticio commented 9 months ago

Ah haha, of course! :-)

I just tested it myself and it works fine... Except, my lambda functions had gone to sleep, so I got a few internal server errors until they had warmed up. I may try to build in some warmup or at least pass the error message back.

teticio commented 9 months ago

Of course, one way to warm them up is to just run the test.
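
For illustration, a warm-up step along those lines could look roughly like the sketch below. It is not part of lambda-scraper: the `warmUp` name, its options, and the retry policy are all assumptions, and it relies on the global `fetch` available in Node 18 or later. It simply requests a URL until the response is no longer a 5xx or the attempts run out.

```javascript
// Hypothetical warm-up helper; not part of lambda-scraper.
// Requests the URL until it answers without a 5xx status or attempts run out.
async function warmUp(url, { attempts = 5, delayMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchImpl(url);
      if (response.status < 500) {
        return { ok: true, attempt, status: response.status };
      }
      console.log(`attempt ${attempt}: HTTP ${response.status}`);
    } catch (error) {
      // Network-level failures are logged and retried like 5xx responses.
      console.log(`attempt ${attempt}: ${error.message}`);
    }
    if (attempt < attempts) {
      // Wait before the next attempt to give a cold function time to start.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { ok: false, attempt: attempts, status: null };
}
```

It could be called once per Lambda URL before starting the spider, for example `await warmUp('https://<hash>.lambda-url.us-east-1.on.aws/ipinfo.io/ip')`, with `<hash>` replaced by your own value. If a URL still returns 500 after the warm-up succeeds for others, the cause is probably not a cold start.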

alanzheng88 commented 9 months ago

ohh that might be it! yes! that'll definitely be helpful 👍

alanzheng88 commented 9 months ago

Hey @teticio, I thought it was a warm-up issue, but after trying a few other URLs, those work. Only the one for httpbin.io/user-agent fails. Would you have any suggestions?

FAILED:

// fetch('https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/httpbin.io/user-agent')
2024-02-16 17:29:58 [scrapy.core.engine] INFO: Spider opened
2024-02-16 17:30:00 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/httpbin.io/user-agent> (failed 1 times): 500 Internal Server Error
2024-02-16 17:30:00 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/httpbin.io/user-agent> (failed 2 times): 500 Internal Server Error
2024-02-16 17:30:01 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/httpbin.io/user-agent> (failed 3 times): 500 Internal Server Error
2024-02-16 17:30:01 [scrapy.core.engine] DEBUG: Crawled (500) <GET https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/httpbin.io/user-agent> (referer: None)

SUCCESS:

// fetch('https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/ipinfo.io/ip')
2024-02-16 17:33:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://some-randomly-generated-hash.lambda-url.us-east-1.on.aws/ipinfo.io/ip> (referer: None)

teticio commented 8 months ago

Is this calling the main Lambda function, which randomly calls the other Lambdas? If so, it could still be a warm-up issue. To troubleshoot, you can try running the main function in each of the Lambda JS files in a debugger. I'm not able to reproduce this, but please let me know if you find out anything more.
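
As a rough sketch of what running a handler locally might look like: the snippet below builds a minimal event in the shape that Lambda function URLs send (payload format 2.0) and calls a handler with it, returning the stack trace instead of a bare 500. The helper names, the stand-in handler, and the idea that the real file exports a function named `handler` are assumptions for illustration; check the actual Lambda JS files for the real export and the event fields they read.

```javascript
// Build a minimal event in the shape Lambda function URLs send (payload format 2.0).
// Only a few fields are filled in; the real handler may read others.
function makeFunctionUrlEvent(targetUrl, method = 'GET') {
  const { host, pathname, search } = new URL(targetUrl);
  const path = `/${host}${pathname}`; // e.g. /httpbin.io/user-agent
  return {
    version: '2.0',
    rawPath: path,
    rawQueryString: search.replace(/^\?/, ''),
    headers: {},
    requestContext: { http: { method, path } },
    isBase64Encoded: false,
  };
}

// Call a handler locally and surface any thrown error instead of a bare 500.
async function invokeLocally(handler, event) {
  try {
    return await handler(event, {});
  } catch (error) {
    return { statusCode: 500, body: error.stack };
  }
}

// Stand-in handler for illustration only. Swap in the real one, e.g.
// const { handler } = require('./path/to/lambda.js'); // hypothetical path and export name
const exampleHandler = async (event) => ({ statusCode: 200, body: event.rawPath });

invokeLocally(exampleHandler, makeFunctionUrlEvent('https://httpbin.io/user-agent'))
  .then((result) => console.log(result));
```

Saved to a file, this can be started with `node --inspect-brk` and attached to from VS Code, or run from a VS Code JavaScript Debug Terminal, so breakpoints inside the handler are hit with the failing URL as input.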