smasher125354 / crawler4j

Automatically exported from code.google.com/p/crawler4j

Resumable deletes all folder content not databases #300

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Set resumable to false
2. Run the crawler

Please provide any additional information below.
Right now, setting the resumable parameter to false deletes all folder content.
It would be great to delete only the databases, not the rest of the folder content.

Original issue reported on code.google.com by edgar.ri...@gmail.com on 2 Sep 2014 at 6:21

GoogleCodeExporter commented 9 years ago
Good point.

I will look into it.

Feel free to look into it and suggest code if you want.

Original comment by avrah...@gmail.com on 2 Sep 2014 at 6:48

GoogleCodeExporter commented 9 years ago
Bug is verified.

We do delete all contents of that folder.

We should consider, though, that in the future we may use another internal DB
solution instead of Berkeley DB. In that case we would need to delete different
files.

So we will need a generic solution that deletes whatever files were created.
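
One way to read "generic" here is to record every file the storage layer creates and later delete only those, whatever DB backend produced them. The sketch below is a minimal illustration of that idea using only the JDK. `CreatedFileTracker` and its methods are hypothetical names, not part of crawler4j, and the list is kept in memory only, so a real version would have to persist it to clean up after a previous run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a crawler4j class: remembers the files it created
// so that cleanup never touches anything else in the folder.
final class CreatedFileTracker {
    private final List<Path> created = new ArrayList<>();

    // Creates an empty file and records it as "ours".
    Path createFile(Path path) throws IOException {
        Files.createFile(path); // throws if the file already exists
        created.add(path);
        return path;
    }

    // Deletes only the recorded files (newest first) and returns how many were removed.
    int deleteCreated() throws IOException {
        List<Path> newestFirst = new ArrayList<>(created);
        Collections.reverse(newestFirst);
        int deleted = 0;
        for (Path p : newestFirst) {
            if (Files.deleteIfExists(p)) {
                deleted++;
            }
        }
        created.clear();
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("storage");
        Path userFile = Files.createFile(folder.resolve("user-notes.txt"));

        CreatedFileTracker tracker = new CreatedFileTracker();
        tracker.createFile(folder.resolve("crawl.db")); // illustrative file name

        tracker.deleteCreated();
        System.out.println("user file kept: " + Files.exists(userFile));
    }
}
```

The trade-off is that the tracker has to see every file creation, which is hard to guarantee when an embedded DB manages its own files.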

Original comment by avrah...@gmail.com on 23 Nov 2014 at 9:35

GoogleCodeExporter commented 9 years ago
After revisiting this problem, I think we should keep the current behaviour.

Please note that crawler4j doesn't delete the storage folder you indicated in
the crawlerConfig instance, only a sub-folder called "frontier" inside that
folder.

That sub-folder is dedicated to crawler4j and shouldn't be touched by anything
else, so there is no harm in deleting it.
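
To make the described behaviour concrete, here is a rough sketch that deletes only a "frontier" sub-folder and leaves its siblings alone. It uses only the JDK and is not crawler4j's actual code. `FrontierCleanupSketch`, `deleteFrontier`, and the file names in `main` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch, not crawler4j code.
final class FrontierCleanupSketch {
    // Sub-folder name taken from the comment above.
    static final String FRONTIER_DIR = "frontier";

    // Recursively deletes <storageFolder>/frontier; everything else is left untouched.
    static void deleteFrontier(Path storageFolder) throws IOException {
        Path frontier = storageFolder.resolve(FRONTIER_DIR);
        if (!Files.exists(frontier)) {
            return; // nothing to clean up
        }
        Files.walkFileTree(frontier, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // directory is empty by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path storage = Files.createTempDirectory("crawl-storage");
        Path frontier = Files.createDirectories(storage.resolve(FRONTIER_DIR));
        Files.write(frontier.resolve("00000000.jdb"), new byte[] {1}); // illustrative DB file
        Path other = Files.write(storage.resolve("notes.txt"), new byte[] {2});

        deleteFrontier(storage);

        System.out.println("frontier exists: " + Files.exists(frontier));
        System.out.println("notes.txt exists: " + Files.exists(other));
    }
}
```

With this layout, files placed directly in the storage folder survive a non-resumable start, which matches the reasoning for keeping the current behaviour.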

Original comment by avrah...@gmail.com on 8 Dec 2014 at 4:01