philippta / flyscrape

Flyscrape is a command-line web scraping tool designed for those without advanced programming skills.
https://flyscrape.com
Mozilla Public License 2.0

queue is full, can't add url #42

Closed by dynabler 6 months ago

dynabler commented 6 months ago

I get the message "queue is full, can't add url" when running Flyscrape. The JS file has 5000+ start URLs. After the scraping finished, I counted the data, and it had only scraped 1024 URLs. Am I missing something, or doing something wrong?

philippta commented 6 months ago

This is currently a technical limitation I had to accept, but the limit can be raised in a later release.

If you have the ability to compile from source, you can increase this number here: https://github.com/philippta/flyscrape/blob/f2d36972b238cb6bfe2548ec59508bbd83be0a05/scrape.go#L81
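For readers wondering how a fixed queue size leads to exactly this symptom, here is a minimal sketch in Go. It is not Flyscrape's actual code: the `boundedQueue` type, the `add` and `fill` functions, and the example.com URLs are all hypothetical, and the capacity of 1024 is chosen only to match the count reported above. It assumes the queue is a buffered channel with a non-blocking send, which is one common way such a limit shows up.

```go
package main

import (
	"errors"
	"fmt"
)

// errQueueFull reuses the message from this issue; the variable name is hypothetical.
var errQueueFull = errors.New("queue is full, can't add url")

// boundedQueue is a hypothetical fixed-capacity URL queue backed by a buffered channel.
type boundedQueue struct {
	urls chan string
}

// newBoundedQueue creates a queue that holds at most capacity URLs.
func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{urls: make(chan string, capacity)}
}

// add enqueues a URL without blocking. If the buffer is already full,
// the URL is rejected instead of waiting for free space.
func (q *boundedQueue) add(url string) error {
	select {
	case q.urls <- url:
		return nil
	default:
		return errQueueFull
	}
}

// fill tries to enqueue n generated URLs before anything is consumed
// and reports how many were accepted and how many were rejected.
func fill(q *boundedQueue, n int) (accepted, rejected int) {
	for i := 0; i < n; i++ {
		if err := q.add(fmt.Sprintf("https://example.com/page/%d", i)); err != nil {
			rejected++
			continue
		}
		accepted++
	}
	return accepted, rejected
}

func main() {
	q := newBoundedQueue(1024) // assumed capacity, matching the 1024 URLs observed
	accepted, rejected := fill(q, 5000)
	fmt.Printf("accepted=%d rejected=%d\n", accepted, rejected)
}
```

Under these assumptions, all start URLs are added before any are consumed, so everything beyond the queue's capacity is dropped, and raising the capacity raises the number of URLs that get through.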

dynabler commented 6 months ago

I'm not able to compile it from source; I couldn't work it out from the answer in #40.

I also tried importing the URLs, and that has its limits as well (#43): I get the same message, "queue is full, can't add url".

At least one of them should have a higher limit.

In my experience, 25,000 URLs is a safe limit. The output still needs to be checked manually, so most people work in batches of 10K to 20K. (5-10-15-20-25)
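The batching described here can be sketched as a small Go helper. This is only an illustration of splitting a URL list into fixed-size batches. It is not part of Flyscrape; the `batches` function and the example.com URLs are hypothetical.

```go
package main

import "fmt"

// batches splits urls into consecutive slices of at most size elements.
// The last batch may be shorter. A non-positive size yields no batches.
func batches(urls []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(urls); start += size {
		end := start + size
		if end > len(urls) {
			end = len(urls)
		}
		out = append(out, urls[start:end])
	}
	return out
}

func main() {
	// Generate 25,000 placeholder URLs to stand in for a real start list.
	urls := make([]string, 25000)
	for i := range urls {
		urls[i] = fmt.Sprintf("https://example.com/item/%d", i)
	}
	for i, b := range batches(urls, 10000) {
		fmt.Printf("batch %d: %d urls\n", i+1, len(b))
	}
}
```

Each batch could then be written to its own script or URL file and scraped in a separate run, which keeps every run below whatever queue limit applies and keeps the manual check per batch manageable.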

Looking forward to the release that fixes this.

philippta commented 6 months ago

Fixed in the latest release, v0.7.2. You can now add up to a million URLs.

https://github.com/philippta/flyscrape/blob/c796f4164c13e30135246c08304acd7142673f60/scrape.go#L83