huginn / huginn

Create agents that monitor and act on your behalf. Your agents are standing by!
MIT License

Website Agent: Error when fetching url #2700

Closed: Rossil2012 closed this issue 4 years ago

Rossil2012 commented 4 years ago

When using the Website Agent to scrape data from "http://electsys.sjtu.edu.cn/edu/", the error below occurs occasionally and sets the working status to "No". The periodic check keeps running and new events reset the working status, but in the meantime the agent's view of the website falls out of date. So I am wondering whether I can restart the agent as soon as the error occurs.

My agent configuration is here:

{
  "expected_update_period_in_days": "30",
  "url": "http://electsys.sjtu.edu.cn/edu/",
  "type": "html",
  "mode": "on_change",
  "http_success_codes": [
    0
  ],
  "extract": {
    "url": {
      "css": "td[class=\"18line\"]>a",
      "value": "@href"
    },
    "title": {
      "css": "td[class=\"18line\"]>a",
      "value": "@title"
    },
    "date": {
      "css": "td[class=\"18line\"]>font",
      "value": "text()"
    }
  }
}

And the log is here:

Error when fetching url: Failure when receiving data from the peer
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/adapters/faraday.rb:106:in `block in request'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/callbacks.rb:146:in `block in execute_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/callbacks.rb:145:in `each'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/callbacks.rb:145:in `execute_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/operations.rb:35:in `finish'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/easy_factory.rb:164:in `block in set_callback'
/app/vendor/bundle/ruby/2.5.0/gems/ethon-0.12.0/lib/ethon/easy/response_callbacks.rb:68:in `block in complete'
/app/vendor/bundle/ruby/2.5.0/gems/ethon-0.12.0/lib/ethon/easy/response_callbacks.rb:68:in `each'
/app/vendor/bundle/ruby/2.5.0/gems/ethon-0.12.0/lib/ethon/easy/response_callbacks.rb:68:in `complete'
/app/vendor/bundle/ruby/2.5.0/gems/ethon-0.12.0/lib/ethon/easy/operations.rb:33:in `perform'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/operations.rb:16:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/cacheable.rb:18:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/block_connection.rb:31:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/stubbable.rb:25:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/request/before.rb:26:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/adapters/faraday.rb:82:in `perform_request'
/app/vendor/bundle/ruby/2.5.0/gems/typhoeus-1.3.1/lib/typhoeus/adapters/faraday.rb:72:in `call'
/app/vendor/bundle/ruby/2.5.0/gems/faraday_middleware-0.12.2/lib/faraday_middleware/gzip.rb:24:in `call'
/app/vendor/bundle/ruby/2.5.0/gems/faraday-0.12.1/lib/faraday/request/url_encoded.rb:15:in `call'
/app/vendor/bundle/ruby/2.5.0/gems/faraday-0.12.1/lib/faraday/request/multipart.rb:15:in `call'
/app/vendor/bundle/ruby/2.5.0/gems/faraday_middleware-0.12.2/lib/faraday_middleware/response/follow_redirects.rb:78:in `perform_with_redirection'
/app/vendor/bundle/ruby/2.5.0/gems/faraday_middleware-0.12.2/lib/faraday_middleware/response/follow_redirects.rb:66:in `call'
/app/app/concerns/web_request_concern.rb:26:in `call'
/app/vendor/bundle/ruby/2.5.0/gems/faraday-0.12.1/lib/faraday/rack_builder.rb:139:in `build_response'
/app/vendor/bundle/ruby/2.5.0/gems/faraday-0.12.1/lib/faraday/connection.rb:386:in `run_request'
/app/vendor/bundle/ruby/2.5.0/gems/faraday-0.12.1/lib/faraday/connection.rb:149:in `get'
/app/app/models/agents/website_agent.rb:397:in `check_url'
/app/app/models/agents/website_agent.rb:386:in `block in check_urls'
/app/app/models/agents/website_agent.rb:385:in `each'
/app/app/models/agents/website_agent.rb:385:in `check_urls'
/app/app/models/agents/website_agent.rb:379:in `check'
/app/app/concerns/sortable_events.rb:92:in `check'
/app/app/jobs/agent_check_job.rb:7:in `perform'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/execution.rb:39:in `block in perform_now'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/i18n-1.6.0/lib/i18n.rb:297:in `with_locale'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `instance_exec'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/logging.rb:26:in `block (4 levels) in '
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/notifications.rb:168:in `block in instrument'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/notifications.rb:168:in `instrument'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/logging.rb:25:in `block (3 levels) in '
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/logging.rb:46:in `block in tag_logger'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/tagged_logging.rb:71:in `block in tagged'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/tagged_logging.rb:28:in `tagged'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/tagged_logging.rb:71:in `tagged'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/logging.rb:46:in `tag_logger'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/logging.rb:22:in `block (2 levels) in '
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `instance_exec'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:136:in `run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/execution.rb:38:in `perform_now'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/execution.rb:24:in `block in execute'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:109:in `block in run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/railtie.rb:28:in `block (4 levels) in <class:Railtie>'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/execution_wrapper.rb:87:in `wrap'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/reloader.rb:73:in `block in wrap'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/execution_wrapper.rb:87:in `wrap'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/reloader.rb:72:in `wrap'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/railtie.rb:27:in `block (3 levels) in '
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `instance_exec'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:118:in `block in run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.2.2.1/lib/active_support/callbacks.rb:136:in `run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/execution.rb:22:in `execute'
/app/vendor/bundle/ruby/2.5.0/gems/activejob-5.2.2.1/lib/active_job/queue_adapters/delayed_job_adapter.rb:42:in `perform'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/backend/base.rb:81:in `block in invoke_job'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:61:in `block in initialize'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:66:in `execute'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/backend/base.rb:78:in `invoke_job'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:230:in `block (2 levels) in run'
/usr/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
/usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:230:in `block in run'
/usr/lib/ruby/2.5.0/benchmark.rb:308:in `realtime'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:229:in `run'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:312:in `block in reserve_and_run_one_job'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:61:in `block in initialize'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:66:in `execute'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:312:in `reserve_and_run_one_job'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:213:in `block in work_off'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:212:in `times'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:212:in `work_off'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:175:in `block (4 levels) in start'
/usr/lib/ruby/2.5.0/benchmark.rb:308:in `realtime'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:174:in `block (3 levels) in start'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:61:in `block in initialize'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:66:in `execute'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:173:in `block (2 levels) in start'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:172:in `loop'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/worker.rb:172:in `block in start'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/plugins/clear_locks.rb:7:in `block (2 levels) in '
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:79:in `block (2 levels) in add'
/app/vendor/bundle/ruby/2.5.0/gems/delayed_job-4.1.5/lib/delayed/lifecycle.rb:61:in `block in initialize'
/app/vendor/bundle/ruby/2.5.0/gems/de

10362227 commented 4 years ago

{
  "expected_update_period_in_days": "40",
  "url": [
    "http://electsys.sjtu.edu.cn/edu/"
  ],
  "type": "text",
  "mode": "on_change",
  "extract": {
    "link": {
      "regexp": "<a href\\=\\'(.+?htm)\\'",
      "index": "1"
    },
    "date": {
      "regexp": "date\\\"\\>(\\(.+?\\))\\<\\/font>",
      "index": "1"
    },
    "title": {
      "regexp": "title\\=\\'(.+?)\\'class\\=\\\"news\\\">",
      "index": "1"
    }
  },
  "template": {
    "title": "{{title}} {{date}}"
  }
}

dsander commented 4 years ago

Error when fetching url: Failure when receiving data from the peer

This means there is a connection problem between your Huginn server and the website you are trying to fetch.

The working status is only an indicator for the user; Huginn itself still schedules the Agent as it normally does, based on the Agent configuration. We don't have a function that immediately retries the Agent after a failure.
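
For readers who want retry behaviour anyway, here is a minimal sketch of retry-with-backoff around a plain HTTP fetch, run outside Huginn. It is not Huginn code and does not hook into the Website Agent: `with_retries`, `fetch_body`, `RETRYABLE_ERRORS`, and the parameters `max_attempts`, `base_delay`, and `sleeper` are hypothetical names chosen for illustration, and it uses Ruby's standard `Net::HTTP` rather than the Faraday/Typhoeus stack seen in the backtrace.

```ruby
require "net/http"
require "uri"

# Connection-level failures worth retrying (assumption: tune for your setup).
# Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error;
# Errno::ECONNRESET and friends are subclasses of SystemCallError.
RETRYABLE_ERRORS = [
  EOFError, IOError, SocketError, SystemCallError, Timeout::Error
].freeze

# Hypothetical helper, not part of Huginn: run the block, and on a retryable
# error wait base_delay * 2**(attempt - 1) seconds before trying again.
# After max_attempts failures the last error is re-raised to the caller.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *RETRYABLE_ERRORS
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Hypothetical usage: fetch a page body, retrying on connection errors.
def fetch_body(url)
  with_retries do
    Net::HTTP.get_response(URI(url)).body
  end
end
```

Calling `fetch_body("http://electsys.sjtu.edu.cn/edu/")` would attempt the request up to three times. The `sleeper` argument exists only so the waiting can be replaced in tests. Inside Huginn itself, the Agent is simply checked again on its next scheduled run.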

Rossil2012 commented 4 years ago

Thank you, but I think the error is caused by the Internet connection, not by the extraction syntax. dsander has answered my question.

Rossil2012 commented 4 years ago

Thank you, I got it.