steve8x8 / geotoad

Geocaching query tool written in Ruby
https://buymeacoffee.com/steve8x8

Pause/resume? #346

Closed: vitorgalvao closed this issue 7 years ago

vitorgalvao commented 7 years ago

When performing a big query (e.g. by country) there may be a huge list to go through (tens of thousands of results with thousands of search pages).

This may take a very long time to complete; so long, in fact, that you may have to interrupt it. Unfortunately, if you run the same query again, the process will likely start over from the beginning (a simple change in the upstream search results is enough to trigger that). This makes an already long process even longer.

Could there be a pause/resume feature of some kind to save/load progress, so interrupted commands can always continue from where they stopped?

Such a feature would also open the door to a later update that could progressively bring an already saved file up to date with the newest changes. That’s a different feature, though.

steve8x8 commented 7 years ago

In a Unix environment, control-Z will suspend the running process, and you may resume it with "fg". Also, ctrl-S/ctrl-Q (aka xoff/xon) should pause execution (since stdout output will block).
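To illustrate the job-control mechanism mentioned above: ctrl-Z delivers SIGTSTP (suspend) and `fg` resumes the job by delivering SIGCONT, which a program can trap if it wants to notice the resume. This is a Unix-only sketch; nothing in it is GeoToad code, and the self-delivered signal merely simulates what `fg` would do.

```ruby
# Illustrative only: a long-running Ruby process under shell job
# control. Ctrl-Z suspends via SIGTSTP; `fg` resumes via SIGCONT.
resumed = false
Signal.trap("CONT") { resumed = true }

# ... the long-running query loop would sit here ...

# Simulate what `fg` does after a ctrl-Z: deliver SIGCONT to ourselves.
Process.kill("CONT", Process.pid)
sleep 0.1  # give the signal handler a chance to run
puts resumed  # => true
```

Note that this only pauses the process; it does not survive a reboot, which is the limitation raised in the next comment.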

On the other hand: how large are your queries that they must be paused and resumed? You should be aware that you may be risking your account. GS offers pocket queries with a maximum size of 1000 caches. GeoToad queries should not exceed this - and the progressive delay imposed on server connections is there for a reason. (There was a user from Utah who wanted to crawl his home state, about 10k entries; even without delay progression this would have taken several hours.) Since the source is open, you are free to nuke the delays completely - and to kill your account.

I'm afraid you will have to redesign your queries, or purchase a premium membership to submit PQs...

vitorgalvao commented 7 years ago

I meant interrupting as in having to reboot, so those methods would not work.

I was referring to country searches: the option is there. I do understand why the delays are in place and see no problem with waiting, just with restarting work that was already done.

Pausing/resuming is in fact in line with this, as it would mean queries could be interrupted and resumed in chunks, preventing strain on the servers and too many requests. As it is, you either need to restart over and over again or run several queries that will overlap. Both are a worse user experience, take far longer, require more work, and make more server requests.

steve8x8 commented 7 years ago

Having to reboot? (I have a vague feeling that I do not want to know the details.) That the option is there doesn't mean you have to use it to the max. There are state searches, at least for the bigger countries, but Utah for example would still return about 10k caches.

Having said that: GeoToad already caches files returned by GS servers. (That's why there's a -C option, BTW.) Whether it makes sense to keep the intermediate state of a state (pun not intended) search - I don't know. Search (aka nearest.aspx) requests are usually the smallest problem: even a 10k state would be done with 500 search result pages, 20 caches each (and in most cases, they can be replayed from the file cache). If that is enough for you, check the -Y and -Z options. On the other hand, cache pages (aka cdpf.aspx) have a lifetime of almost one week. So more than 90% of the work would not have to be redone if you had interrupted your GeoToad run.
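The lifetime-based file cache described above can be sketched in a few lines of Ruby. This is not GeoToad's actual cache code; the names (`cached_fetch`, `LIFETIME`, the SHA-256 key scheme) are made up for illustration, and the "almost one week" lifetime is approximated as six days.

```ruby
require "tmpdir"
require "digest"

# Minimal sketch of a URL-keyed file cache with a lifetime.
# Illustrative names only; not GeoToad's real implementation.
CACHE_DIR = Dir.mktmpdir("gt-cache-demo")
LIFETIME  = 6 * 24 * 3600  # seconds; roughly the "almost one week" above

# Return the cached body for url if younger than LIFETIME; otherwise
# call the supplied block to fetch it and store the result on disk.
def cached_fetch(url)
  path = File.join(CACHE_DIR, Digest::SHA256.hexdigest(url))
  if File.exist?(path) && (Time.now - File.mtime(path)) < LIFETIME
    return File.read(path)   # replayed from disk, no server request
  end
  body = yield url
  File.write(path, body)
  body
end

# A second fetch of the same URL is served from disk:
hits = 0
2.times do
  cached_fetch("https://example.com/cdpf.aspx?gc=GC1234") { |u| hits += 1; "<html>...</html>" }
end
puts hits  # => 1
```

This is why an interrupted run loses far less than it appears to: anything still within its lifetime is replayed from disk instead of being requested again.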

I admit I never tested whether a search could always be continued from an intermediate state. There have been (very rare) occasions of GeoToad throwing errors in the middle of a search, but I never found out the reason. Cleaning up the last search pages and restarting always fixed the issue. This isn't something you could easily debug - because the database at the remote end will have changed in the meantime. This is exactly what I suspect is behind these errors: a single newly published cache might cut the chain of search results.
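The failure mode suspected above can be shown with a toy model: one newly published cache shifts every later page boundary, so a run resumed at a fixed page number no longer lines up with what was already fetched. All names and the 20-per-page layout are illustrative; this is not GeoToad code.

```ruby
# Toy model: why resuming a paginated search mid-stream can break.
caches = (1..100).map { |i| format("GC%04d", i) }
pages_before = caches.each_slice(20).to_a   # 5 pages of 20 caches

# Run interrupted after page 3; meanwhile one new cache is published
# and (in this model) sorts to the front of the result set.
caches.unshift("GCNEW1")
pages_after = caches.each_slice(20).to_a    # now 6 pages

# Page 4 no longer matches: "GC0060" slid from old page 3 onto new
# page 4, so a naive resume at page 4 re-fetches it. An archived
# cache would shift items the other way and silently skip one.
puts pages_before[3] == pages_after[3]      # => false
puts pages_after[3].include?("GC0060")      # => true
```

A duplicate is harmless, but the skipped-item case means a resumed search can quietly miss caches, which matches the "cut chain of search results" suspicion.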

Again, you have the source code, and nobody will stop you from inserting huge delays and submitting a new cache mid-run to disturb the order of search pages - if you want to see whether that breaks resumption. (OTOH, if you find some breakage, and can nail down the underlying cause = HTML page, you're welcome.)

vitorgalvao commented 7 years ago

Thank you for the detailed explanation.