dhruvbhatia closed this issue 8 years ago
Ah, so if you've run a crawl once and the corpus/data structures are already on disk, it won't proceed further (since it can't find a new URL). If you point it at a different destination or delete the file path, you should be OK.
This probably needs a flag + log line.
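For anyone hitting this, a minimal sketch of the workaround described above. Note the config key names here are assumptions for illustration, not necessarily pegasus's exact API:

```clojure
;; Sketch only: :seeds and :corpus-dir are assumed config keys.
;; Pointing the crawl at a fresh directory sidesteps the on-disk
;; visited-URL state, so previously seen pages are crawled again.
(crawl {:seeds      ["http://example.com/"]
        :corpus-dir "/tmp/pegasus-corpus-fresh"})  ; fresh path, no stale cache
```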
@shriphani Great, thanks for clarifying - yep, it was the caching at play!
One question: what if someone wanted to cache-bust and monitor, say, a dynamic Product Listing page (which changes unpredictably)? Would it make sense to add a config parameter that forces pegasus to always scrape and timestamp corpora, or is this better handled by giving the destination URL some kind of unique query parameter so that pegasus treats it as a unique page and proceeds to scrape it?
Yes, I think the config should accept a parameter and flush all the caches.
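To make the proposal concrete, here is a hypothetical sketch of what that config could look like. The `:force-crawl?` flag does not exist in pegasus today; it is purely illustrative of the cache-flushing behavior discussed above:

```clojure
;; Hypothetical config sketch -- :force-crawl? is a proposed flag,
;; not part of pegasus's current API. The idea: when true, flush the
;; on-disk caches so every seed is fetched again on each run.
(crawl {:seeds        ["http://example.com/products"]
        :corpus-dir   "corpus"
        :force-crawl? true})  ; proposed: ignore the visited-URL cache
```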
I'm going to merge this with #25
Hi @shriphani,
Under the latest master release, it looks like the examples from `README.md` sometimes hang on the `queue/enqueue-url` fn. I think this may have something to do with the XML parser choking on your RSS feed (I see you've added `(catch Exception e nil)` blocks in the examples) or the latest writer fixes you had mentioned. Below is my console output using either of the example code blocks when the hang occurs:

Note this only appears to happen some of the time, so it might be related to network issues on my end. I'll keep exploring!
Edit: This also leads me to ask: what are your views on embedding examples under an `./examples` directory within the pegasus project?