kgretzky / dcrawl

Simple, but smart, multi-threaded web crawler for randomly gathering huge lists of unique domain names.
MIT License
516 stars, 93 forks

resuming old thread #2

Open d4op opened 7 years ago

d4op commented 7 years ago

There's a bug: when resuming, it adds the old URLs found before and carries on without any errors. But when I check the file after stopping dcrawl, nothing new has been added to it.

kgretzky commented 7 years ago

Can you see in console output that it is adding new domains?

d4op commented 7 years ago

Yes, it adds the old ones in the console, but it doesn't save the new ones.


kgretzky commented 7 years ago

I will need to see some console output to be able to assist.


d4op commented 7 years ago

There is no other output. It says "added xxxx.com" etc., but the new ones aren't written to the old file.

martinkivi commented 7 years ago

@kgretzky Thank you for building this, it saves me a bunch of time.

I can confirm this is happening. There are no additional details in the console, but this is how you can replicate it (I have tried on both OS X 10.11.6 and Debian Stretch):

ubogdan commented 5 years ago

Issue fixed. You can use https://github.com/ubogdan/dcrawl until the owner has time to merge it into the master branch.

mathieu-aubin commented 5 years ago

The issue can be fixed by changing the file-open line to this:

fo, err := os.OpenFile(*output_file, os.O_RDWR|os.O_APPEND|os.O_CREATE, 0664)