Closed brittainhard closed 9 years ago
In Nutch this is not true. Have you investigated the features of Nutch which limit
Maybe the functionality is just not implemented within memex-explorer?
I agree with Lewis. These features are already there in Nutch.
Let me clarify, @lewismc @asitang: I was not implying that Nutch itself keeps running indefinitely, but rather that memex-explorer currently runs the crawler indefinitely.
If you look here https://github.com/memex-explorer/memex-explorer/blob/master/source/apps/crawl_space/crawl_runners.py#L210-236 , you can see that the crawler currently runs in a loop, starting a new round each time the previous one finishes.
I am currently in the process of refactoring this code in order to provide a much better interface to the crawler.
Sounds good @brittainhard, if you need any assistance let me know. Having looked at some of the code, it appears that some additional arguments are required to specify the number of fetch rounds, so that the crawler stops after that many rounds instead of entering a continuous crawl cycle.
That's the idea. So if you look here: https://github.com/memex-explorer/memex-explorer/blob/master/source/task_manager/crawl_tasks.py#L46-61
This is the new crawler code that is going to be integrated. It takes rounds as an argument and passes that argument to the subprocess. I plan on having a UI element that will allow a person to enter the number of rounds they want before starting the crawl. The default is one round (which reminds me, I need to change the argument to be an integer that is converted to a string when it is passed to the subprocess, rather than a string from the start).
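For illustration, here is a minimal sketch of that idea: keep `rounds` as an integer and convert it to a string only when building the subprocess arguments. This is not the code in `crawl_tasks.py`. The names `build_crawl_command` and `start_crawl`, the `executable` parameter, and the argument order (seed dir, crawl dir, rounds) are all assumptions, so check the usage message of whatever crawl script you actually call.

```python
import subprocess


def build_crawl_command(seed_dir, crawl_dir, rounds=1, executable="crawl"):
    """Build the argument list for a crawl limited to `rounds` rounds.

    Hypothetical helper: the argument order is an assumption, not the
    real interface of any particular crawl script.
    """
    # Reject strings, floats and bools so the caller always works with ints.
    if isinstance(rounds, bool) or not isinstance(rounds, int):
        raise TypeError("rounds must be an integer")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    # subprocess arguments must be strings, so convert only at this point.
    return [executable, seed_dir, crawl_dir, str(rounds)]


def start_crawl(seed_dir, crawl_dir, rounds=1, executable="crawl"):
    """Start the crawl as a child process and return the Popen handle."""
    command = build_crawl_command(seed_dir, crawl_dir, rounds, executable)
    return subprocess.Popen(command)
```

Splitting command construction from process start keeps the validation easy to test without having a crawler installed.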
+1
(Sent by email in reply to Brittain Hard's comment of Thursday, May 7, 2015: https://github.com/memex-explorer/memex-explorer/issues/439#issuecomment-100054955 )
Lewis
This new interface should match the REST API. @asitang, please work with @brittainhard on this.
Right now the crawler runs indefinitely. This is not necessarily good, and the user has no idea about the progress of the crawl. We should allow the user to specify how many rounds they want, and/or allow them to see how many rounds have been run and manually restart a round.
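As a rough sketch of the behaviour requested here, the class below runs a fixed number of rounds, records how many have completed, and lets a caller trigger one more round by hand. Everything in it is hypothetical: `CrawlProgress` and the `run_round` callback are illustrative names, not part of memex-explorer or Nutch, and the callback stands in for whatever actually performs a single crawl round.

```python
class CrawlProgress:
    """Track rounds for a bounded crawl (illustrative only)."""

    def __init__(self, requested_rounds, run_round):
        if requested_rounds < 1:
            raise ValueError("requested_rounds must be at least 1")
        self.requested_rounds = requested_rounds
        # run_round is a hypothetical callable that performs one round;
        # it receives the 1-based number of the round being run.
        self.run_round = run_round
        self.completed_rounds = 0

    def run_one_round(self):
        """Run a single round; also usable to start an extra round manually."""
        self.run_round(self.completed_rounds + 1)
        self.completed_rounds += 1

    def run(self):
        """Run rounds until the requested number has completed, then stop."""
        while self.completed_rounds < self.requested_rounds:
            self.run_one_round()

    def status(self):
        """Return (completed, requested) so a UI can display progress."""
        return (self.completed_rounds, self.requested_rounds)
```

A caller might create `CrawlProgress(3, some_round_function)`, call `run()`, and poll `status()` to show progress; calling `run_one_round()` afterwards starts one additional round on demand.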