jesbin / crawler4j

Automatically exported from code.google.com/p/crawler4j

Crawler storage folder taking up too much space. #333

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
I am limited on disk space. I run about 5 crawlers with 10 crawls each, and I 
quickly run out of space because of the BerkeleyDB storage that crawler4j uses.

Anyone have any advice or suggestions on how to reduce disk space usage?

Original issue reported on code.google.com by jeger...@gmail.com on 15 Jan 2015 at 8:19
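
For what it's worth, crawler4j's CrawlConfig has a few settings that bound how much data reaches the storage folder in the first place. A minimal sketch (the setters below exist on crawler4j's public CrawlConfig; the specific limits are only illustrative, and the right values depend on your crawl):

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;

public class SpaceBoundedConfig {

    // Build a CrawlConfig tuned to keep the storage folder small.
    // The setters are crawler4j's own; the numbers are just examples.
    public static CrawlConfig build(String storageFolder) {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(storageFolder);

        // Bound the frontier: fewer scheduled pages means a smaller BerkeleyDB.
        config.setMaxPagesToFetch(10_000);
        config.setMaxDepthOfCrawling(3);

        // Keep large payloads out of the store entirely.
        config.setIncludeBinaryContentInCrawling(false);
        config.setMaxDownloadSize(1_048_576); // 1 MB per page

        // With resumable crawling off, the folder can be deleted between runs.
        config.setResumableCrawling(false);
        return config;
    }
}
```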

GoogleCodeExporter commented 8 years ago
Sorry mate, this is a basic requirement; I don't see how we can solve it...

Original comment by avrah...@gmail.com on 18 Jan 2015 at 5:48

GoogleCodeExporter commented 8 years ago
Not a bug, you might want to move this thread to the forum though...

Original comment by avrah...@gmail.com on 19 Jan 2015 at 4:31

GoogleCodeExporter commented 8 years ago
Where is the forum? Also, how difficult would it be to move from BerkeleyDB to 
MySQL?

Original comment by jeger...@gmail.com on 19 Jan 2015 at 4:43

GoogleCodeExporter commented 8 years ago
Forum: https://groups.google.com/forum/#!forum/crawler4j

It is totally doable, and there is even a card to change crawler4j so that 
anyone can plug in their own type of internal storage.

But it won't be easy, and if you host your MySQL on an external server, the 
crawler might be very, very slow...

Original comment by avrah...@gmail.com on 20 Jan 2015 at 8:35
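
To make that "card" concrete: crawler4j exposes no pluggable storage API today, but a refactor along those lines would roughly mean extracting an interface like the hypothetical one below, with the existing BerkeleyDB code and a MySQL-backed class as interchangeable implementations:

```java
// Hypothetical only: crawler4j has no such interface. This sketches the
// shape a pluggable frontier/document store might take, so that a
// MySQL-backed implementation could replace the BerkeleyDB one.
public interface CrawlStore {

    // Schedule a URL (or persist its fetched record) for later processing.
    void put(String url, byte[] record);

    // Fetch the stored record for a URL, or null if it was never stored.
    byte[] get(String url);

    // De-duplication check: has this URL already been scheduled or fetched?
    boolean isSeen(String url);

    // Remove a record once the page has been processed.
    void delete(String url);

    // Flush and release the underlying database resources.
    void close();
}
```

Note that a crawler hits isSeen and put once per discovered link, so with MySQL on an external server every link becomes a network round trip; that is where the slowdown warned about above would come from.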