"getaddrinfo: Name or service not known" until I restart the daemon

hypevhs commented 7 years ago

This issue has symptoms similar to #32, but not the same cause. After leaving the daemon alone for a few days, I come back to find that it has gotten stuck on something, and new items are no longer added. The most recent logs are filled with entries like this:

Feb 26 19:05:44 transmission-rss[565]: 1488153944(debug) aggregate <SOME_RSS_URL>
Feb 26 19:05:44 transmission-rss[565]: 1488153944(debug) retrieval error (SocketError: Failed to open TCP connection to <SOME_RSS_DOMAIN>:443 (getaddrinfo: Name or service not known))

Which means the DNS address lookup failed somehow. This continues forever until I restart the daemon. There are two reasons why I think transmission-rss is at fault for this behavior:

1. dig <THAT_SAME_DOMAIN> gives good results meanwhile.
2. Upon systemctl restarting the daemon, I IMMEDIATELY see backed-up items come flooding in.

Feb 26 19:11:21 transmission-rss[12044]: 1488154281(debug) aggregate <SOME_RSS_URL>
Feb 26 19:11:22 transmission-rss[12044]: 1488154282(debug) on_new_item event <SOME_RSS_ITEM>

Other notes:

ruby -v is ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
transmission-rss -v is 0.2.2
My config file has 8 urls and transmission-daemon login creds, nothing else of interest.

hypevhs commented 7 years ago

I am not a Ruby developer, but some research suggests that Ruby's name resolution can act strangely. For the time being, I have added require 'resolv-replace' to aggregator.rb and we'll see what happens.
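For readers unfamiliar with it, the workaround amounts to a single require. The snippet below is a minimal sketch, not the actual change to aggregator.rb: resolv-replace ships with Ruby and reroutes the socket classes' name lookups through the pure-Ruby Resolv library, which reads /etc/hosts and /etc/resolv.conf itself instead of calling the C library's getaddrinfo. The IP literal is used only so the example runs without network access.

```ruby
require 'socket'
require 'resolv-replace' # socket name lookups now go through Resolv

# IP literals are returned unchanged, so this needs no DNS server.
addr = IPSocket.getaddress('127.0.0.1')
puts addr
```

Since lookups no longer go through the C library after this require, a name that the system resolver handles may behave differently under Resolv.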

hypevhs commented 7 years ago

Adding resolv-replace may or may not have fixed the aggregator. But, on my machine, localhost no longer resolves when using resolv-replace, so connections to the daemon fail. To fix this, I have to add a server section to my config.yml to use 127.0.0.1 instead.

server:
  host: 127.0.0.1
  port: 9091
  rpc_path: /transmission/rpc

I should note that config.rb was NOT picking up my changes, and I had to edit the initialize() method to access the Hash values properly.

Before:

      @host     = server[:host] || 'localhost'
      #etc for the other keys...

After:

      @host     = server["host"] || 'localhost'
      #etc for the other keys...

This would be way too trivial a mistake for this to not actually be a misunderstanding on my part. Again, I'm not a Ruby developer!
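The difference between the two lookups can be checked in isolation. This is a sketch of general Ruby behaviour under the assumption that the config is parsed with the standard YAML library; it does not claim to mirror what config.rb actually does, and the config text is a made-up copy of the server section above. A Hash parsed from YAML has String keys, so a Symbol lookup returns nil and the || fallback silently takes over.

```ruby
require 'yaml'

# Hypothetical config text mirroring the server section above.
config = YAML.safe_load(<<~CONFIG)
  server:
    host: 127.0.0.1
    port: 9091
CONFIG

server = config['server']

by_symbol = server[:host]   # nil, because the parsed Hash has no Symbol keys
by_string = server['host']  # the String "127.0.0.1"

# With the Symbol lookup, the fallback hides the configured value.
host = by_symbol || 'localhost'
puts host
```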

nning commented 7 years ago

Thanks for reporting this!

Please update to 0.2.4 (although the resolving error will probably persist).

Which distro are you running? Do you have any special DNS config or something else you can think of that can have something to do with this problem?

I can't reproduce this issue; could you please further evaluate whether resolv-replace is a solution?

hypevhs commented 7 years ago

I'm running the latest Manjaro Linux. I'm not sure if my DNS setup is anything special. I've left it running and it seems to be fine, but I would consider resolv-replace to be the fix if the symptoms don't come up in 4 days' time.

hypevhs commented 7 years ago

The symptoms haven't shown up in 4 days with resolv-replace added. If you want, I can remove it again and test 4 more days, then see if the symptoms come back.

nning commented 7 years ago

Ok, thanks for testing! Please test again to see if the symptoms come back. I also wanted to check for myself in a VM but have not had the time yet.

But I think we could just include resolv-replace if the symptoms return without it.

nning commented 7 years ago

Did symptoms return?

hypevhs commented 7 years ago

No, and that worried me. So I looked through systemd logs again, and I may have found a precondition that leads to the symptoms (although I wouldn't call it a "cause" yet). In other words, it may not be time-based like I assumed. I'm still testing.

nning commented 7 years ago

Did the precondition turn out to be the cause?

hypevhs commented 7 years ago

The problem is a lot different from what I'd originally thought. I've misled you. Sorry!

If transmission-rss is started while the system doesn't have internet access (in other words, domains don't resolve), then no matter how many times you disconnect/reconnect eth/wlan, that process will be unable to resolve any domains until you properly restart it.

I've tested this in irb too. If the first domain lookup a Ruby process attempts fails, the process becomes permanently unable to resolve: every later lookup fails as well, no matter what. I've tested with Resolv::DNS, open-uri, and even Python 3's urllib.request, a whole other language! This behavior does not occur in irb on Windows Cygwin either.
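A check along these lines can be pasted into irb to see whether the current process is able to resolve a name. It is a minimal sketch, not the exact commands used in the tests above: resolvable? is a hypothetical helper, and the host names are placeholders. The .invalid top-level domain is reserved and never resolves, so it stands in for a failing lookup.

```ruby
require 'socket'

# Hypothetical helper: true if this process can resolve the host,
# false if the lookup raises the SocketError seen in the logs.
def resolvable?(host, port = 443)
  Addrinfo.getaddrinfo(host, port)
  true
rescue SocketError => e
  warn "lookup failed: #{e.message}"
  false
end

puts resolvable?('127.0.0.1')             # IP literal, no DNS query needed
puts resolvable?('no-such-host.invalid')  # reserved TLD, lookup fails
```

To probe the behaviour described here, one would call resolvable? with a real domain while offline, reconnect, and call it again in the same irb session.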

So this is sounding more like a Manjaro/NetworkManager problem. At this point my only request is that your example systemd service perhaps replace After=network.target with After=network-online.target so that the connection is guaranteed fully up before it starts.

nning commented 7 years ago

I changed the unit file to depend on network-online.target (and to pull that one in). Thank you very much for your investigation and the fix!

(Feel free to re-open if the issue somehow persists!)

nning / transmission-rss

"getaddrinfo: Name or service not known" until I restart the daemon #42