huginn / huginn

Create agents that monitor and act on your behalf. Your agents are standing by!
MIT License

WebsiteAgent stopped working after 2 days #2831

Open kenshinjeff opened 4 years ago

kenshinjeff commented 4 years ago

Hi there, I'm kind of stuck and was hoping you could point me in the right direction. I'm trying to scrape data and publish the events as an RSS feed. I think I got the XPath correct, since it worked for the first 2 days. But now it says the agent is down and there are no new events, even though a dry run does show the new data.

{
  "expected_update_period_in_days": "1",
  "url": "https://mothership.sg/category/news/",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "link": {
      "xpath": "//*[@id=\"latest-news\"]/*[contains(@class, \"ind-article\")]/a/@href",
      "value": "."
    },
    "image": {
      "xpath": "//*[@id=\"latest-news\"]/div/a/div/div[1]/@style",
      "value": "substring-before(substring-after(., \"url('\" ), \"')\")"
    },
    "title": {
      "xpath": "//*[@id=\"latest-news\"]/div/a/div/div[2]/h1",
      "value": "normalize-space(.)"
    },
    "description": {
      "xpath": "//*[@id=\"latest-news\"]/div/a/div/div[2]/h1/text()",
      "value": "."
    },
    "pubDate": {
      "xpath": "//*[@id=\"latest-news\"]/*[contains(@class, \"ind-article\")]/a/div/div/div/p/span/text()",
      "value": "."
    }
  }
}
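
For readers unfamiliar with the XPath string functions used in the "image" rule above, here is a minimal Python sketch of what `substring-before(substring-after(., "url('"), "')")` does to a `style` attribute. The sample style string and the helper names are hypothetical, made up for illustration; the real page markup may differ.

```python
def substring_after(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-after: text following the first marker, else ''."""
    if marker == "":
        return text
    _, found, tail = text.partition(marker)
    return tail if found else ""


def substring_before(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-before: text preceding the first marker, else ''."""
    if marker == "":
        return ""
    head, found, _ = text.partition(marker)
    return head if found else ""


def extract_image_url(style: str) -> str:
    """Same nesting as the agent's "value" expression for the image field."""
    return substring_before(substring_after(style, "url('"), "')")


# Hypothetical style attribute, not taken from the scraped site.
sample_style = "background-image: url('https://example.com/a.jpg');"
print(extract_image_url(sample_style))
```

If the attribute does not contain `url('`, both helpers return an empty string, so the extracted image value would be empty rather than an error.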


kenshinjeff commented 4 years ago

OK, I think I know what the issue is, but I'm not sure how to fix it. I was running Huginn in a Docker container, and I changed the date inside the container.

What I should have done was:

docker run -it -p 3000:3000 \
-v /opt/docker-huggin-db/:/var/lib/mysql \
-v /etc/timezone:/etc/timezone:ro \
-v /etc/localtime:/etc/localtime:ro \
--env-file /home/myuser/.env_huginn \
--name=huginn \
huginn/huginn

And set TIMEZONE="Asia/Singapore" in the .env file. But I realise that even if the system shows the correct time and Huginn uses the correct time (mouse over the time on a "background job"), the AgentRunScheduleJob won't start. I think it may not be working correctly somehow.

If I comment out #TIMEZONE="Asia/Singapore", it will show this in the log:

foreman stdout | 23:28:39 web.1  | 192.168.1.47 - - [09/Jul/2020:08:28:39 PDT] "GET /worker_status?since_id=2 HTTP/1.1" 200 123
foreman stdout | 23:28:39 web.1  | http://192.168.1.253:3000/agents/2/events?return=%2Fagents%2F2%3Freturn%3D%252Fagents -> /worker_status?since_id=2
foreman stdout | 23:28:42 web.1  | 192.168.1.47 - - [09/Jul/2020:08:28:42 PDT] "GET /worker_status?since_id=2 HTTP/1.1" 200 124

It's kind of working if I do it this way, but not what I was expecting.
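
A quick check of the two clocks in the log above: the web log stamps the request at 08:28:39 PDT, while the foreman prefix reads 23:28:39. The sketch below assumes PDT is UTC-7 and Singapore is UTC+8, and that the foreman prefix is the container's local clock; under those assumptions both timestamps describe the same instant.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration (no tz database needed).
PDT = timezone(timedelta(hours=-7), "PDT")
SGT = timezone(timedelta(hours=8), "SGT")

# Timestamp from the web log line: [09/Jul/2020:08:28:39 PDT]
web_log_time = datetime(2020, 7, 9, 8, 28, 39, tzinfo=PDT)

as_utc = web_log_time.astimezone(timezone.utc)
as_sgt = web_log_time.astimezone(SGT)

print(as_utc.isoformat())  # 2020-07-09T15:28:39+00:00
print(as_sgt.isoformat())  # 2020-07-09T23:28:39+08:00
```

So with TIMEZONE commented out, the application appears to log in Pacific time while the container clock is on Singapore time.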

dsander commented 4 years ago

> And set TIMEZONE="Asia/Singapore" in the .env file. But I realise that even if the system shows the correct time and Huginn uses the correct time (mouse over the time on a "background job"), the AgentRunScheduleJob won't start. I think it may not be working correctly somehow.

Hmm not sure that I follow, scheduled jobs never run or do they run at the wrong time?

> If I comment out #TIMEZONE="Asia/Singapore", it will show this in the log:

That log output looks odd indeed. Normally the since argument should be a UNIX epoch, and I doubt the last time the call was made was in 1970 😄

kenshinjeff commented 4 years ago

Hi there, thanks for responding! To clarify, when TIMEZONE="Asia/Singapore", scheduled jobs never run.

fiercedruid commented 4 years ago

I have set TIMEZONE="Singapore" and it works well so far...

kenshinjeff commented 4 years ago

> I have set TIMEZONE="Singapore" and it works well so far...

Why does this work? Is it a Ruby-specific environment thing?

dsander commented 4 years ago

I think it's Rails-specific. This lists the timezone names:

Here is the output of rake time:zones:all

* UTC -12:00 * International Date Line West
* UTC -11:00 * American Samoa, Midway Island
* UTC -10:00 * Hawaii
* UTC -09:00 * Alaska
* UTC -08:00 * Pacific Time (US & Canada), Tijuana
* UTC -07:00 * Arizona, Chihuahua, Mazatlan, Mountain Time (US & Canada)
* UTC -06:00 * Central America, Central Time (US & Canada), Guadalajara, Mexico City, Monterrey, Saskatchewan
* UTC -05:00 * Bogota, Eastern Time (US & Canada), Indiana (East), Lima, Quito
* UTC -04:00 * Atlantic Time (Canada), Caracas, Georgetown, La Paz, Puerto Rico, Santiago
* UTC -03:30 * Newfoundland
* UTC -03:00 * Brasilia, Buenos Aires, Greenland, Montevideo
* UTC -02:00 * Mid-Atlantic
* UTC -01:00 * Azores, Cape Verde Is.
* UTC +00:00 * Edinburgh, Lisbon, London, Monrovia, UTC
* UTC +01:00 * Amsterdam, Belgrade, Berlin, Bern, Bratislava, Brussels, Budapest, Casablanca, Copenhagen, Dublin, Ljubljana, Madrid, Paris, Prague, Rome, Sarajevo, Skopje, Stockholm, Vienna, Warsaw, West Central Africa, Zagreb, Zurich
* UTC +02:00 * Athens, Bucharest, Cairo, Harare, Helsinki, Jerusalem, Kaliningrad, Kyiv, Pretoria, Riga, Sofia, Tallinn, Vilnius
* UTC +03:00 * Baghdad, Istanbul, Kuwait, Minsk, Moscow, Nairobi, Riyadh, St. Petersburg
* UTC +03:30 * Tehran
* UTC +04:00 * Abu Dhabi, Baku, Muscat, Samara, Tbilisi, Volgograd, Yerevan
* UTC +04:30 * Kabul
* UTC +05:00 * Ekaterinburg, Islamabad, Karachi, Tashkent
* UTC +05:30 * Chennai, Kolkata, Mumbai, New Delhi, Sri Jayawardenepura
* UTC +05:45 * Kathmandu
* UTC +06:00 * Almaty, Astana, Dhaka, Urumqi
* UTC +06:30 * Rangoon
* UTC +07:00 * Bangkok, Hanoi, Jakarta, Krasnoyarsk, Novosibirsk
* UTC +08:00 * Beijing, Chongqing, Hong Kong, Irkutsk, Kuala Lumpur, Perth, Singapore, Taipei, Ulaanbaatar
* UTC +09:00 * Osaka, Sapporo, Seoul, Tokyo, Yakutsk
* UTC +09:30 * Adelaide, Darwin
* UTC +10:00 * Brisbane, Canberra, Guam, Hobart, Melbourne, Port Moresby, Sydney, Vladivostok
* UTC +11:00 * Magadan, New Caledonia, Solomon Is., Srednekolymsk
* UTC +12:00 * Auckland, Fiji, Kamchatka, Marshall Is., Wellington
* UTC +12:45 * Chatham Is.
* UTC +13:00 * Nuku'alofa, Samoa, Tokelau Is.
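
As a rough aid for picking a TIMEZONE value, the sketch below checks whether a candidate string appears among the friendly names in the UTC +08:00 group of the list above. It is only a membership test against that list, with a hypothetical helper name; it does not reproduce Rails' own lookup, and a value missing from the list is not by itself proof that Rails rejects it.

```python
# Friendly names copied from the "UTC +08:00" group in the rake output above.
UTC_PLUS_8_NAMES = {
    "Beijing", "Chongqing", "Hong Kong", "Irkutsk", "Kuala Lumpur",
    "Perth", "Singapore", "Taipei", "Ulaanbaatar",
}


def is_listed_name(value: str, listed: set = UTC_PLUS_8_NAMES) -> bool:
    """Return True if value (ignoring surrounding whitespace) is a listed name."""
    return value.strip() in listed


for candidate in ("Singapore", "Asia/Singapore"):
    print(candidate, is_listed_name(candidate))
```

In this thread, the value that appears in the list ("Singapore") is the one reported to work.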