kiliman / tailwindui-crawler

tailwindui-crawler downloads the component HTML files locally
MIT License

Crawler hangs on download or throws ETIMEDOUT error #41

Closed domipoppe closed 3 years ago

domipoppe commented 3 years ago

Hello,

I have the issue that the crawler randomly stops forever after crawling a couple of components. What is the issue?

Regards,

jensolafkoch commented 3 years ago

I have exactly the same problem. At some (very different) point I get an error connect ETIMEDOUT such as this:

⏳ Processing /components/application-ui/headings/page-headings...
‼️ FetchError: request to https://tailwindui.com/components/application-ui/headings/page-headings failed, reason: connect ETIMEDOUT 104.21.24.52:443
    at ClientRequest.<anonymous> (D:\laragon\www\jok-tailwindui-crawler\node_modules\node-fetch\lib\index.js:1461:11)
    at ClientRequest.emit (events.js:315:20)
    at TLSSocket.socketErrorListener (_http_client.js:469:9)
    at TLSSocket.emit (events.js:315:20)
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  type: 'system',
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT'
}

Any solutions?

I need the complete download to upload it to the Shuffle visual editor ... Hm ... :-(

kiliman commented 3 years ago

Looks like it is timing out while downloading from Tailwind.

I can add a TIMEOUT option and a RETRY option as well.
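
For readers wondering what a TIMEOUT option might look like, here is a minimal sketch. It is not the crawler's actual code: the name `fetchWithTimeout`, the `timeoutMs` option and its default, and the injectable `fetchImpl` parameter are all assumptions made for illustration. It also assumes the fetch implementation honors an `AbortSignal` passed as `signal`, and that `AbortController` is available as a global.

```javascript
// Hypothetical helper, not taken from tailwindui-crawler.
// Aborts the request if it has not settled within timeoutMs.
async function fetchWithTimeout(url, { timeoutMs = 30000, fetchImpl = globalThis.fetch } = {}) {
  const controller = new AbortController();
  // Schedule the abort; it is cancelled below if the request settles first.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

A timeout of this kind only makes a stalled request fail sooner; recovering from the failure still needs a retry.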

jensolafkoch commented 3 years ago

I went the hard way and used the workflow option on GitHub, changed the cron job to every minute and downloaded the complete directory - that works for the time being.

But I guess I won't be the only one experiencing these timeout errors?

Thanks a lot for this package!!

kiliman commented 3 years ago

You can update the GitHub Action to run on demand. Add workflow_dispatch: under the on: key.

Then you can click on Run Workflow.

jensolafkoch commented 3 years ago

Great, thanks!

kiliman commented 3 years ago

@jensolafkoch Can you do me a favor and check out the features/error-handling branch?

I added a try/catch around the fetch and some logging. I want to see what the elapsed time from request to error is. That will help me determine what the default timeout value should be.

I ran the crawler on Windows and didn't have an error, so I can't reproduce the problem you're having.

Thanks!
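
The try/catch and elapsed-time logging described above could look roughly like the sketch below. This is an illustration, not the code on the features/error-handling branch: `timedFetch` and its `fetchImpl` and `log` parameters are hypothetical names, and the message wording is modeled on the log output quoted later in this thread.

```javascript
// Hypothetical helper, not taken from tailwindui-crawler.
// Wraps a fetch call and reports how long it ran before failing.
async function timedFetch(url, fetchImpl = globalThis.fetch, log = console.error) {
  const start = Date.now();
  try {
    return await fetchImpl(url);
  } catch (err) {
    // Time from issuing the request to receiving the error.
    const elapsed = Date.now() - start;
    log(`❌ Error downloading ${url}`);
    log(`Elapsed time ${elapsed}ms`);
    // Rethrow so the caller still sees the original failure.
    throw err;
  }
}
```

Measuring at the call site like this shows how long the platform waits before giving up on a connection, which is useful when choosing a default timeout.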

jensolafkoch commented 3 years ago

... long time elapsed, I guess more or less as before ...

⏳ Processing /components/application-ui/navigation/breadcrumbs...
❌ Error downloading https://tailwindui.com/components/application-ui/navigation/breadcrumbs
Elapsed time 21020ms
FetchError: request to https://tailwindui.com/components/application-ui/navigation/breadcrumbs failed, reason: connect ETIMEDOUT 172.67.217.20:443
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! tailwindui-crawler@3.1.5 start: node index.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the tailwindui-crawler@3.1.5 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     C:\Users\jok\AppData\Roaming\npm-cache\_logs\2021-04-26T16_18_54_653Z-debug.log

kiliman commented 3 years ago

No, that's what I was expecting. I was just curious how long the timeout was. It seems really odd that you're not getting any response within 21 seconds.

Anyway, I added retry logic, so can you pull the latest from that branch and try again?

Thanks!
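
As a rough idea of what retry logic for this kind of failure can look like, here is a minimal sketch. It is not the implementation on the branch: `fetchWithRetry`, the `retries` and `delayMs` options, their defaults, and the injectable `fetchImpl` are assumptions chosen for illustration.

```javascript
// Hypothetical helper, not taken from tailwindui-crawler.
// Tries the request up to `retries` times, pausing delayMs between attempts.
async function fetchWithRetry(url, { retries = 3, delayMs = 1000, fetchImpl = globalThis.fetch } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fetchImpl(url);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

With a wrapper like this, a single ETIMEDOUT on one page no longer ends the whole run; the crawl only fails if the same URL times out on every attempt.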

jensolafkoch commented 3 years ago

Does finish now ... thanks again! Glad I could help ... :-)