Open · sskmtm opened this issue 1 year ago
1. Is giving up after three failed attempts a built-in framework mechanism, or can it be changed through some setting?

Yes, it is. See LoadOptions.nMaxRetry:
```kotlin
/**
 * Retry to fetch at most n times, if page.fetchRetries > nMaxRetry,
 * the page is marked as gone and do not fetch it again until -refresh is set to clear page.fetchRetries
 * */
@Parameter(names = ["-nmr", "-nMaxRetry", "--n-max-retry"],
    description = "Retry to fetch at most n times, if page.fetchRetries > nMaxRetry," +
        " the page is marked as gone and do not fetch it again until -refresh is set to clear page.fetchRetries")
var nMaxRetry = 3
```
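The rule described in that comment can be sketched in plain Kotlin. This is a minimal model of the documented behavior only; `Page` and `RetryPolicy` here are illustrative stand-ins, not PulsarRPA classes:

```kotlin
// Plain-Kotlin model of the rule in the comment above; Page and
// RetryPolicy are illustrative stand-ins, not PulsarRPA classes.
data class Page(var fetchRetries: Int = 0, var isGone: Boolean = false)

class RetryPolicy(private val nMaxRetry: Int = 3) {
    // Called after each failed fetch attempt.
    fun recordFailure(page: Page) {
        page.fetchRetries++
        if (page.fetchRetries > nMaxRetry) page.isGone = true
    }

    // Models what "-refresh" does: clear fetchRetries so the page
    // becomes fetchable again.
    fun refresh(page: Page) {
        page.fetchRetries = 0
        page.isGone = false
    }
}

fun main() {
    val page = Page()
    val policy = RetryPolicy(nMaxRetry = 3)
    repeat(4) { policy.recordFailure(page) }
    println(page.isGone) // prints "true": the 4th failure exceeds nMaxRetry
}
```

The key point for the question above: once a page is gone it stays gone until `-refresh` resets the counter, which is why the third-failure log ends with `Gone (unexpected)` and no further retries.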
2. Is there a setting, or a way in code, to let a page that has failed still enter the isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter flow? That would allow some post-processing (cleanup) to be done.

- The event handling mechanism provides rich event handling points that can be used to run tasks throughout a page's life cycle. See: AdvancedAsinScraper.scrape()
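For orientation, the event chain named in the question can be sketched in plain Kotlin, showing where a cleanup hook would sit relative to extraction. The handler names mirror the thread; this is an illustration of the call order, not PulsarRPA's actual event API:

```kotlin
// Plain-Kotlin sketch of the extract event chain from the question.
// Handler names mirror the thread; this is illustrative, not the
// framework's real event API.
class ExtractPipeline(
    private val isRelevant: (String) -> Boolean = { true },
    private val onBeforeFilter: (String) -> Unit = {},
    private val onBeforeExtract: (String) -> Unit = {},
    private val extract: (String) -> String = { it },
    private val onAfterExtract: (String) -> Unit = {},
    private val onAfterFilter: (String) -> Unit = {}
) {
    // Runs the full chain; returns null when the page is filtered out.
    fun run(page: String): String? {
        if (!isRelevant(page)) return null
        onBeforeFilter(page)
        onBeforeExtract(page)
        val result = extract(page)
        onAfterExtract(page)
        onAfterFilter(page)
        return result
    }
}

fun main() {
    val calls = mutableListOf<String>()
    val pipeline = ExtractPipeline(
        onBeforeFilter = { calls += "onBeforeFilter" },
        onAfterFilter = { calls += "onAfterFilter" } // cleanup could go here
    )
    pipeline.run("<html>...</html>")
    println(calls) // prints "[onBeforeFilter, onAfterFilter]"
}
```

The issue reported here is that a page marked gone never enters this chain at all, so cleanup placed in onAfterFilter is never reached for such pages.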
In which event can page.crawlStatus.isGone == true be observed?
I have tried all kinds of events and never captured it, even though it actually happened. Specifically, during the three retries the Gone status was never captured.
I found several kinds of logs when data crawling fails. After each of the first two failures, the page is retried a few minutes later; after the third failure it is never retried again, and it never enters the isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter flow.
Questions: 1. Is giving up after three failed attempts a built-in framework mechanism, or can it be changed through some setting? 2. Is there a setting, or a way in code, to let a page that has failed still enter the isRelevant (true) -> onBeforeFilter -> onBeforeExtract -> extract -> onAfterExtract -> onAfterFilter flow, so that some post-processing (cleanup) can be done?
First failure:
```
Timeout to wait for document ready after 60 round, retry is supposed
⚠ Privacy leak warning
U for N got 1601 0 <- 0 in 1m8.826s
Trying 2th 5m later
```
Second failure:
```
Page is ROBOT_CHECK
⚠ Privacy leak warning
U for RT got 1601 0 <- 0 in 10.709s
Trying 3th 7m later
```
Third failure:
```
Timeout to wait for document ready after 60 round, retry is supposed
⚠ Privacy leak warning
U for RT got 1601 0 <- 0 in 1m0.988s
Gone (unexpected)
```