mobob closed this 2 months ago
Because it's hard to determine when a page has really finished loading on modern websites.
When there's no timeout explicitly specified, Reader will try to return ASAP. As soon as the page loads and appears to contain something useful, Reader returns right away. In many cases this captures the main content correctly while also minimizing delay. However, depending on how the website is implemented, this strategy might not always succeed.
For example, the website might first load only some of its content, like the title and description, before it continues to load the full details. In such a scenario, Reader might return with only the first batch and miss the details.
When the user explicitly specifies a timeout, the strategy is a little different. Reader will wait for "networkidle0", instead of eagerly trying to return.
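To make the difference concrete, here is a rough simulation of the two strategies. It is only a sketch, not Reader's actual code: `Snapshot`, `eager_return`, and `wait_for_network_idle` are hypothetical names, the 500 ms quiet window is how Puppeteer defines `networkidle0`, and returning the last snapshot seen when the timeout expires is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Puppeteer's "networkidle0": no open connections for at least 500 ms.
IDLE_WINDOW_MS = 500


@dataclass
class Snapshot:
    at_ms: int      # time since navigation started
    text: str       # page text visible at that moment
    in_flight: int  # open network connections at that moment


def eager_return(timeline: List[Snapshot]) -> Optional[Snapshot]:
    """No timeout given: return the first snapshot that has any text."""
    for snap in timeline:
        if snap.text.strip():
            return snap
    return None


def wait_for_network_idle(timeline: List[Snapshot], timeout_ms: int) -> Optional[Snapshot]:
    """Timeout given: wait until the network has been quiet for IDLE_WINDOW_MS.

    Assumption of this sketch: if the timeout expires first, return the
    last snapshot seen before it.
    """
    last_seen = None
    for i, snap in enumerate(timeline):
        if snap.at_ms > timeout_ms:
            break
        last_seen = snap
        if snap.in_flight != 0:
            continue
        quiet_until = snap.at_ms + IDLE_WINDOW_MS
        if quiet_until > timeout_ms:
            continue  # cannot confirm idleness before the timeout
        next_at = timeline[i + 1].at_ms if i + 1 < len(timeline) else None
        if next_at is None or next_at >= quiet_until:
            return snap
    return last_seen


# A page that shows its title first and its full content later.
timeline = [
    Snapshot(at_ms=0, text="", in_flight=3),
    Snapshot(at_ms=120, text="Title\nDescription", in_flight=2),
    Snapshot(at_ms=900, text="Title\nDescription\nFull details", in_flight=0),
]

print(eager_return(timeline).text)
print(wait_for_network_idle(timeline, timeout_ms=10_000).text)
```

With this timeline, the eager strategy stops at the 120 ms snapshot and misses the details, while the idle-based strategy with a generous timeout waits for the 900 ms snapshot. A timeout shorter than the page's load time would still cut the result short, which fits the observation that a big timeout gives reliable results.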
This might be related to this site (http://www DOT lafiestalatina DOT ca/) and to text mode, but I get wildly inconsistent results when I don't specify a timeout. When I do, and it's big, it's pretty reliable.
ie:
The ones with no suffix were without a timeout too...
I scanned the code and nothing jumped out. Suffice to say, I'm specifying a timeout going forward, but let me know if I'm misusing it or if there is something up with what I'm doing! I couldn't find any reference to a "default timeout, then we return the data so far" behavior.