steve8x8 / geotoad

Geocaching query tool written in Ruby
https://buymeacoffee.com/steve8x8
Other

Search result page count changes half-way, "ERROR: Stuck on page number 3466 of 3466" #355

Closed: RoffelKartoffel closed this issue 6 years ago

RoffelKartoffel commented 6 years ago

Hi, I get the following error:

[##%] (3464/3467) Search results: page 3464 (l)
[##%] (3465/3467) Search results: page 3465 (l)
[##%] (3466/3467) Search results: page 3466 (l)
[##%] (3466/3467) Search results: page 3466 (l)
ERROR: Stuck on page number 3466 of 3466

Please let me know if I can provide additional information. Cheers, Jan

steve8x8 commented 6 years ago

OMG, are you serious? Fetching 69,000 caches - how long did it take to get this far? You must have reached sleep times of the order of minutes between queries? (There's a progressive sleep, 1 second per 250 connections on average, for a reason.) Are you aware that this might put an end to your cherished GC account very quickly?
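For a rough sense of scale, the arithmetic can be sketched in Python. The numbers are assumptions, not taken from GeoToad's source: 20 caches per search result page (which matches 3467 pages for roughly 69,000 caches) and a delay that grows linearly by 1 second every 250 connections.

```python
import math

CACHES_PER_PAGE = 20          # assumption: caches listed per search result page
SLEEP_STEP_CONNECTIONS = 250  # assumption: delay grows by 1 s every 250 connections


def pages_needed(cache_count: int) -> int:
    """Number of search result pages needed to list `cache_count` caches."""
    return math.ceil(cache_count / CACHES_PER_PAGE)


def delay_before(connection: int) -> int:
    """Hypothetical linear progressive delay (seconds) before connection number `connection`."""
    return connection // SLEEP_STEP_CONNECTIONS


def total_sleep(connections: int) -> int:
    """Total seconds spent sleeping over `connections` consecutive requests (numbered from 0)."""
    return sum(delay_before(n) for n in range(connections))


pages = pages_needed(69_340)
print(pages)                                # result pages for the whole query
print(delay_before(pages))                  # per-request delay near the last list page
print(round(total_sleep(pages) / 3600, 1))  # hours of sleep for the list pages alone
print(delay_before(pages + 69_340))         # delay if every cache page is fetched as well
```

Under these assumptions the list pages alone add up to roughly 6 hours of sleep, and if each cache's own page is fetched too, the per-request delay climbs past 4 minutes.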

What you're seeing seems to happen from time to time, although the exact details aren't known and are hard to reproduce (it's sufficient that new caches get published while the query is running - and the longer the query takes, the more probable such an event becomes).
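A small simulation can illustrate how a publication during the run disturbs page-by-page fetching. It is a sketch, not GeoToad's code: it assumes 20 caches per page, a distance-sorted result list, and a page count that is fixed once page 1 has been read; the cache IDs are made up.

```python
import math

PAGE_SIZE = 20  # assumption: caches per search result page


def page_slice(results, page):
    """Return the 1-based `page` of `results` using offset-based pagination."""
    start = (page - 1) * PAGE_SIZE
    return results[start:start + PAGE_SIZE]


def crawl_with_publication(results, publish_before_page, new_cache, position=0):
    """Fetch every page of a result list while one cache gets published mid-run.

    The page count is computed once, from the initial list (as if read from page 1).
    Just before `publish_before_page` is fetched, `new_cache` is inserted at
    `position`, i.e. at its place in the distance-sorted list.
    """
    results = list(results)  # work on a copy; this plays the "server-side" list
    page_count = math.ceil(len(results) / PAGE_SIZE)
    seen = []
    for page in range(1, page_count + 1):
        if page == publish_before_page:
            results.insert(position, new_cache)
        seen.extend(page_slice(results, page))
    return seen


caches = [f"C{i:02d}" for i in range(60)]  # hypothetical cache IDs, nearest first
seen = crawl_with_publication(caches, publish_before_page=2, new_cache="NEW")
print(sorted(c for c in set(seen) if seen.count(c) > 1))  # fetched twice
print(sorted(set(caches + ["NEW"]) - set(seen)))          # never fetched
```

Here C19 shows up on both page 1 and page 2, while C59 (pushed onto a fourth page nobody asks for) and the new cache itself are missed.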

To get you out of this misery, though: please use the file manager of your choice to go into your cache directory (cache/www.geocaching.com/seek below .geotoad or .config/GeoToad, or whatever GeoToad reports in its "( - ) Cache directory: ..." line), sort by date/time, and remove the last (newest) 10 .aspx files belonging to your query (probably named nearest.aspx_..._[36 hex digits], where the dots match your query details). Then restart the query, but read the next paragraphs first.
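If you prefer a script over a file manager, the same cleanup can be sketched in Python. The directory below is only an example built from the paths mentioned above, and the default prefix "nearest.aspx_" matches every cached search; pass a longer prefix that matches your own query's file names. Deletion is off by default (dry_run=True).

```python
import os
import tempfile
from pathlib import Path

# Example location only; use whatever GeoToad reports in its
# "( - ) Cache directory: ..." line.
DEFAULT_SEEK_DIR = Path.home() / ".geotoad" / "cache" / "www.geocaching.com" / "seek"


def newest_query_files(seek_dir, prefix="nearest.aspx_", count=10):
    """Return the `count` most recently modified files whose name starts with `prefix`."""
    files = [p for p in Path(seek_dir).iterdir()
             if p.is_file() and p.name.startswith(prefix)]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


def remove_newest(seek_dir, prefix="nearest.aspx_", count=10, dry_run=True):
    """Delete (or, with dry_run, just list) the newest cached result pages."""
    victims = newest_query_files(seek_dir, prefix, count)
    for path in victims:
        if dry_run:
            print(f"would remove {path.name}")
        else:
            path.unlink()
    return victims


def demo():
    """Build a throwaway directory with 12 fake result pages and remove the newest 10."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(12):
            page = Path(tmp) / f"nearest.aspx_demo_{i:02d}"
            page.write_text("")
            os.utime(page, (1_000_000 + i, 1_000_000 + i))  # page i is i seconds newer
        (Path(tmp) / "other.html").write_text("")  # unrelated file, must survive
        remove_newest(tmp, count=10, dry_run=False)
        return sorted(p.name for p in Path(tmp).iterdir())


remaining = demo()
print(remaining)
```

Running it first with dry_run=True against the real directory shows exactly which files would go.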

Note that the lifetime of those files is less than one day, so reproducing the error may become difficult. If you're willing to find out more, you may run geotoad with the -v option (up to three times, which makes it very verbose) and capture the output; "script" would be the right tool in a Linux environment. (To be honest, I'm not eager to read through the resulting output.)

It may already be too late, and your cached search results may have expired, making the issue disappear. It is not recommended in general to "touch" cached files, but it might make sense in this particular case (only the "nearest.aspx" ones, of course).
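"Touching" a file here means resetting its modification time. A minimal sketch, assuming (unverified) that cache freshness is judged by file mtime; the 20-hour lifetime is a hypothetical stand-in for "less than one day", and only nearest.aspx files are touched.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 20 * 3600  # hypothetical; the source only says "less than one day"


def is_expired(path, now=None, max_age=MAX_AGE_SECONDS):
    """Assumed freshness rule: a cached file expires `max_age` seconds after its mtime."""
    now = time.time() if now is None else now
    return now - Path(path).stat().st_mtime > max_age


def touch_result_pages(seek_dir, prefix="nearest.aspx_", now=None):
    """Set the mtime of cached search result pages (and only those) to `now`."""
    now = time.time() if now is None else now
    touched = []
    for path in Path(seek_dir).iterdir():
        if path.is_file() and path.name.startswith(prefix):
            os.utime(path, (now, now))
            touched.append(path.name)
    return sorted(touched)


def demo():
    """Two 19-hour-old files; only the search result page gets touched."""
    now = 2_000_000_000
    later = now + 2 * 3600
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "nearest.aspx_demo_01"
        other = Path(tmp) / "other_cached_page"  # hypothetical non-search file
        for p in (page, other):
            p.write_text("")
            os.utime(p, (now - 19 * 3600, now - 19 * 3600))
        expired_without_touch = is_expired(page, later)
        touched = touch_result_pages(tmp, now=now)
        return expired_without_touch, touched, is_expired(page, later), is_expired(other, later)


result = demo()
print(result)
```

Two hours later the untouched files would have expired, while the touched search page still counts as fresh.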

Having found fewer than 6000 caches over a period of more than 8 years, I'm curious why you'd be interested in 70 k of them (is this one state? a whole country?)... This seems to be a new record; the previous champ was a guy from Utah who wanted to search all 13 k caches in his state! There's a reason why official PQs are limited to 1 k - please reconsider your approach. It's for your own good.

Cheers, S

steve8x8 commented 6 years ago

Um, thinking about it, the root cause may not be an addition of caches but quite the opposite. Since the number of result pages is provided right at the start of the query, with page 1, the outcome may be what you saw if the last page becomes unavailable (because all caches now fit into fewer pages than communicated initially). Since removal of caches from the list disrupts the search result chain, there may not be an easy fix (it would be possible to check whether the page count has changed, and I'll consider adding a notification - in most cases this seems to be benign, though). So OK, this may indeed be a bug, unrelated to your issue.
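The removal scenario and the suggested page-count check can be simulated in a few lines. This is a sketch, not GeoToad's implementation: it assumes 20 caches per page and that every result page reports the current page count; the notification and the decision to adopt the new count are hypothetical.

```python
import math

PAGE_SIZE = 20  # assumption: caches per search result page


def fetch(results, page):
    """Simulated server: the requested page plus the page count it currently reports."""
    start = (page - 1) * PAGE_SIZE
    return results[start:start + PAGE_SIZE], math.ceil(len(results) / PAGE_SIZE)


def crawl(results, archive_before_page=None, archived=None):
    """Fetch pages one by one and notice when the reported page count changes.

    Instead of retrying a page that no longer exists (the "stuck" case), this
    hypothetical version records a notification and adopts the new page count.
    """
    results = list(results)
    items, page_count = fetch(results, 1)
    seen, notes = list(items), []
    page = 2
    while page <= page_count:
        if page == archive_before_page:
            results.remove(archived)  # a cache disappears from the list mid-run
        items, current = fetch(results, page)
        if current != page_count:
            notes.append(f"page count changed from {page_count} to {current} at page {page}")
            page_count = current
        if not items:
            break
        seen.extend(items)
        page += 1
    return seen, notes


caches = [f"C{i:02d}" for i in range(41)]  # 41 hypothetical caches -> 3 pages initially
seen, notes = crawl(caches, archive_before_page=3, archived="C05")
print(notes)
print("C40" in seen)
```

The notification fires, but C40, which slid onto the already fetched page 2, is still missed: that is the broken chain which makes a real fix hard.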

RoffelKartoffel commented 6 years ago

Thank you for your detailed answers! I will try adding the -vvv switch if I run into that problem again.

I run geotoad on a Linux box which happens to be online 24/7 for other purposes. The area covered by my query is about 2% the size of Utah and is centered in northern Germany. I am especially interested in “night caches” (which I try to filter by cache attribute). I am aware that this is a rather large query and that I am running it at my own risk. It surely isn't the best idea to run it on a regular basis, though. ;)

steve8x8 commented 6 years ago

Good morning, thanks for the additional information.

2% the size of Utah, but 5 times the number of caches, it takes Germany to achieve that ;) Now how to improve your approach? Some ideas:

Can you make use of that? E.g. estimate the publish rate (per day, per 4 hours, ...) in your search area(s), limit the query to a corresponding number of pages (with a safety margin, of course), and only process the caches you haven't seen before to perform the selection?
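The estimate boils down to a little arithmetic plus a "seen before" filter, sketched here in Python. It assumes 20 caches per page and, importantly, a result order with the newest caches first; the publish rates, the safety factor of 2 and the GC codes are made-up example values.

```python
import math

PAGE_SIZE = 20  # assumption: caches per search result page


def pages_to_fetch(publications_per_day, interval_hours, safety_factor=2.0):
    """Pages needed to cover caches published since the last run (newest-first order assumed)."""
    expected_new = publications_per_day * interval_hours / 24
    return max(1, math.ceil(expected_new * safety_factor / PAGE_SIZE))


def not_seen_before(gc_codes, seen):
    """Keep only caches not processed in an earlier run, preserving their order."""
    return [code for code in gc_codes if code not in seen]


print(pages_to_fetch(30, 4))    # 30 new caches per day, query every 4 hours
print(pages_to_fetch(150, 24))  # 150 new caches per day, query once a day
print(not_seen_before(["GC1", "GC2", "GC3"], {"GC2"}))
```

Only the caches returned by not_seen_before would then go through the more expensive attribute filtering.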

Along these lines, you could keep a list of GC IDs (or the corresponding GUIDs) of caches that already passed your filter (keywords, "torch" and "night" attributes, etc.), keep it updated, and only check those caches for being archived or modified. This takes some heavy scripting, but I've been doing something similar to get alerted when a new cache gets published in my home zone (not immediately though - my last FTF happened years ago).
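A minimal way to keep such a list between runs is a JSON file of GC codes, sketched below. The file name, the GC codes and update_watchlist are hypothetical, and the sketch assumes you already have some way of learning which caches were archived (for instance, by re-checking each listed cache).

```python
import json
import tempfile
from pathlib import Path


def load_ids(path):
    """Read the set of GC codes saved by an earlier run (empty if there is none)."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def save_ids(path, ids):
    """Store the GC codes as a sorted JSON list so the file diffs nicely."""
    Path(path).write_text(json.dumps(sorted(ids)))


def update_watchlist(watchlist, new_matches, archived):
    """Add caches that newly passed the filter and drop those known to be archived."""
    return (set(watchlist) | set(new_matches)) - set(archived)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "night_caches.json"  # hypothetical file name
        save_ids(store, {"GC1AAAA", "GC1BBBB"})   # result of an earlier run
        ids = update_watchlist(load_ids(store), new_matches=["GC1CCCC"], archived=["GC1AAAA"])
        save_ids(store, ids)
        return sorted(load_ids(store))


watchlist = demo()
print(watchlist)
```

Using a set keeps the merge and the removal of archived caches to one line each.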

If you're only a BM (Basic Member), you will get fooled by caches which are later set to PMO (Premium Members Only), and it's pretty tricky to find out whether a PMO cache has been archived. You've been warned. If you're a PM (Premium Member), perhaps a PQ (Pocket Query) is the better choice for you (I cannot check this claim, as I'm a BM myself).

Do you maintain a bookmark list with your results?

Cheers, S

steve8x8 commented 6 years ago

This issue is believed to be fixed (or at least sufficiently addressed) by release 3.28.0.