rand256 / valetudo

Valetudo RE - experimental vacuum software, cloud free
Apache License 2.0

Map not shown ("no map data") #139

Closed: TheAssassin closed this issue 4 years ago

TheAssassin commented 4 years ago

On my Gen1, which I received recently, I cannot get the maps working. Controlling the robot works pretty much as intended (although I don't really understand the manual control mode), but no matter how often I let it clean, I never get a map. I've set up a small (<1 m²) test course for the bot.

Due to the lack of logging, which is the only easy way to assess Valetudo's behavior, I cannot tell what's going wrong. I've tried the builds from the release page (0.8.2 and 0.9.0), a Dustbuilder image and even one with the original Valetudo (which didn't work at all).

To me it seems like there is map data (last_map, for instance, is not an empty file), but Valetudo doesn't receive it via the fake dustcloud service (at least I think Valetudo doesn't read the files directly but receives something via the socket on 808x). That would also explain why the map files aren't being copied to userX. I'm not sure whether the map should have been copied to /mnt/data/valetudo/last_map; the code isn't completely clear on that.

In any case, I'd really like to get into debugging this problem, but for someone like me who would rather NOT touch node.js, some logging would be nice to have. Perhaps you can tell me where to start hacking?

P.S.: Unfortunately I can't tell you more about which firmware it has after the factory reset, as mirobo ... info yields the infamous "vacuum not connected to cloud" error message.

rand256 commented 4 years ago

Valetudo neither reads map files from the filesystem nor copies them (except for the manual map store function). The firmware uploads the current map to Valetudo once it has received, via miio from the dustcloud, the "correct" URLs to upload it to.

You may want to check netstat -a over SSH, where you should see a line like this:

udp 0 0 <robot's internal IP>:<random port> 203.0.113.1:8053 ESTABLISHED

If you see this, with exactly 203.0.113.1:8053, and you have iptables rules in /etc/rc.local that DNAT 203.0.113.1 traffic back to 127.0.0.1 (and they were actually run), you should have the maps. Opening the map tab in the web interface forces the firmware to re-upload the map (if it is connected to the dustcloud).
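If you check this repeatedly, the test above can be scripted. Below is a minimal Python sketch, assuming you capture the output of netstat -na on the robot over SSH and pass it in as text. The function name and the column layout it expects are assumptions based on the line quoted above; none of this is part of Valetudo.

```python
# Hypothetical helper, not part of Valetudo: look for the miio client's UDP
# session with the spoofed dustcloud endpoint in captured `netstat -na` output.
DUSTCLOUD_ENDPOINT = "203.0.113.1:8053"


def has_dustcloud_session(netstat_output: str, endpoint: str = DUSTCLOUD_ENDPOINT) -> bool:
    """Return True if a UDP line has `endpoint` as foreign address and state ESTABLISHED."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address foreign-address state ...
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue
        if fields[4] == endpoint and fields[5] == "ESTABLISHED":
            return True
    return False
```

An unconnected listener such as Valetudo's own 127.0.0.1:8053 socket has no state column and a wildcard foreign address, so it is never mistaken for the session.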

Sometimes the manufacturer's firmware doesn't reconnect to the dustcloud immediately, so it may be connected to nothing at all, neither the Xiaomi cloud nor the dustcloud. In that case there's no way to force a reconnect; it may take up to half an hour. And if the firmware does manage to connect to the real cloud (which generally should never happen), it will keep that connection as long as it can. The simplest remedy then is to reboot, or to block internet access for the device.

TheAssassin commented 4 years ago

Tip: you can simply run cat /proc/net/ip_conntrack; that's faster and more reliable than netstat. Also, I'd recommend netstat -na, because it skips pointless name lookups.

I just reflashed 0.9.0 after breaking things while playing around. I can't see such a connection yet. iptables and the hosts file look fine; I already checked that. The bot just finished cleaning my test course; I'll let it run for now to see whether it ever connects. If it doesn't connect within the next 30 minutes, I'll consider rebooting.

The bot is in a separate VLAN which doesn't permit any access to the Internet.

By the way, is this widget intended to show those values?

[Screenshot: screenshot_2020-02-25_10-03-53]

TheAssassin commented 4 years ago

Bot's log so far:

# tail -f /var/log/upstart/valetudo.log 
Waiting for 30 sec after boot... done.
2020-02-25T08:13:21.743Z Loading configuration file: /mnt/data/valetudo/config.json
2020-02-25T08:13:21.893Z Dummycloud is spoofing 203.0.113.1:8053 on 127.0.0.1:8053
2020-02-25T08:13:21.895Z Webserver running on port 80
2020-02-25T08:13:26.844Z Got token from handshake: xyz
2020-02-25T08:13:26.870Z Probed last id = 1001 using get_status (2 retries)
2020-02-25T08:30:27.390Z Got token from handshake: xyz

TheAssassin commented 4 years ago

I've just checked again: same logs, no change. The HTTP request to poll_map yields {"message": "ok"}, and /api/map/latest yields an empty response. The bot hasn't connected to the dustcloud yet.

I'm going to reboot the device now.

Is the map the only thing that's transferred over miio? If so, it might make sense to try to reverse-engineer that format.

rand256 commented 4 years ago

is this widget intended to show those values?

It shows exactly these values when roborock software isn't connected to the dummycloud.

Is the map the only thing that's being transferred over miio?

The map is transferred via HTTP PUT, and the upload destination is set by the dummycloud in its response to the miio gen_presigned_url request. For this to work, the device must be connected to the dummycloud, and it seems something is preventing that on your side.
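To make the upload path concrete, here is a minimal Python sketch of the receiving side only: a tiny HTTP server that accepts a PUT and keeps the body, plus a client function that sends one. The URL path, the class and function names, and the in-memory storage are all hypothetical. In the real setup the firmware learns the destination from the dummycloud's reply to gen_presigned_url, which this sketch does not model.

```python
import http.server
import threading
import urllib.request


class MapUploadHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical receiver: accepts PUT requests and remembers the last one."""

    last_upload = None  # (path, body bytes) of the most recent PUT

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        type(self).last_upload = (self.path, body)  # store before answering
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(host="127.0.0.1", port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = http.server.HTTPServer((host, port), MapUploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_map(url, data):
    """Client side: upload raw bytes with HTTP PUT and return the status code."""
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

If the robot never learns a valid destination, nothing ever reaches a receiver like this, which matches the empty map described in this thread.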

Which base firmware do you use?

TheAssassin commented 4 years ago

I've been using the image you provide in your release section, so it's firmware 4004, or rather whatever 0.8.2 is built on. I can try building my own firmware, too.

I've rebooted through SSH by the way, and now for some reason current_status yields 500 responses after some timeout.

It shows exactly these values when roborock software isn't connected to the dummycloud.

I think we should start documenting these diagnostic hints. How about a troubleshooting wiki page?

Edit: a request to settings.html took about 90 seconds to be answered with a 200 response (which looks valid); current_status has yet to be answered.

Edit 2:

> time curl http://bot/api/current_status
Unable to reach vacuum, no response for message
0.01user 0.01system 0:24.76elapsed 0%CPU (0avgtext+0avgdata 10644maxresident)k
0inputs+0outputs (0major+559minor)pagefaults 0swaps

pidator commented 4 years ago

Hm, @rand256, I think I've found a possible reason for that error: I just flashed your 4004 image of 0.9.0 for Gen1, and it seems the modified /etc/hosts is missing! Here's the output from a freshly flashed Gen1 with vacuum_valetudo_re_4004.pkg:

Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.4.39 armv7l)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Feb 25 19:36:50 2020 from ****
 _     _   _____    ______  _     _  _     _   _____     _   _ 
| |   | | / ___ \  / ____ \| |   | || |   | | / _ _ \   | \ |  
| |   | || /   \ || /    \/| |   | || |   | || /| |\ |  |_/ |_ 
| |   | || \___/ || |      | |   | || |   | || || || |  |\  |  
| |   | ||  ___  || |      | |   | || |   | || || || |  | | |_ 
 \ \_/ / | |   | || |      | |   | || |   | || || || |         
  \   /  | |   | || \____/\| \___/ || \___/ || || || |         
   \_/   |_|   |_| \______/ \_____/  \_____/ |_||_||_|         
                                                       20200217
===============================================================
MODEL...........: rockrobo.vacuum.v1
SERIAL..........: ***
PRODUCTION DATE.: October 2010
FIRMWARE........: 3.5.4_004004
BUILD NUMBER....: 2019090500REL
REGION..........: de
IP..............: ***.***.***.***
MAC.............: ****
TOKEN...........: ****
DID.............: ***
KEY.............: ***
===============================================================

root@rockrobo:~# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       rockrobo

::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

So the robot's traffic isn't directed to the dustcloud (203.0.113.1), which is why you can't see a connection in netstat...

TheAssassin commented 4 years ago

@pidator, check /etc/rc.local; it sets up some "cloud DNS" thing, most likely a dnsmasq instance that enforces the redirection.

TheAssassin commented 4 years ago

This entire iptables hackery looks bogus to me. I can't even connect to port 8053 using netcat.

I'll try another approach, using a route back to lo for the required IP range.
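One caveat when testing with netcat: UDP has no handshake, so sending a datagram "succeeds" whether or not anything is listening, and a listener may silently ignore payloads it doesn't understand (the dummycloud presumably only answers well-formed miio packets). A minimal Python sketch of a UDP probe that makes this explicit; the function name and default payload are hypothetical:

```python
import socket


def udp_probe(host, port, payload=b"probe", timeout=2.0):
    """Send one datagram to (host, port) and wait for a single reply.

    Returns the reply bytes, or None if nothing arrives within `timeout`.
    None does NOT prove the port is closed: a UDP service may simply drop
    payloads it doesn't understand. On Linux, if the target reports the port
    as unreachable (ICMP), recv() on this connected socket raises
    ConnectionRefusedError instead.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # only fixes the default peer; nothing is sent yet
        sock.send(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
```

A silent timeout and an outright refusal are therefore different results, and only the latter says something definite about the port.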

zvldz commented 4 years ago

This entire iptables hackery looks bogus to me. I can't even connect to port 8053 using netcat.

I'll try another approach, using a route back to lo for the required IP range.

What are you trying to do?!

TheAssassin commented 4 years ago

I'm proposing a better way to implement the "fake dustcloud" networking, which also allows for some debugging.

root@rockrobo:~# ip addr add 203.0.113.1/32 dev lo
root@rockrobo:~# ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet 203.0.113.1/32 scope global lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
root@rockrobo:~# ping -c4 203.0.113.1
PING 203.0.113.1 (203.0.113.1) 56(84) bytes of data.
64 bytes from 203.0.113.1: icmp_seq=1 ttl=64 time=0.217 ms
64 bytes from 203.0.113.1: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from 203.0.113.1: icmp_seq=3 ttl=64 time=0.168 ms
64 bytes from 203.0.113.1: icmp_seq=4 ttl=64 time=0.171 ms

--- 203.0.113.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.168/0.191/0.217/0.021 ms

Now the address is bound to the loopback interface, so the kernel is guaranteed to deliver all traffic for 203.0.113.1 locally. In contrast to the previous solution, we don't need any complicated NAT setup: a single command replaces three iptables commands.

I only have to change Valetudo to listen on that IP instead of 127.0.0.1, and then I can test the dustcloud service.

root@rockrobo:~# netstat -tulpen | grep valetudo
tcp6       0      0 :::80                   :::*                    LISTEN      0          7397        753/valetudo    
udp        0      0 0.0.0.0:42063           0.0.0.0:*                           0          7399        753/valetudo    
udp        0      0 127.0.0.1:8053          0.0.0.0:*                           0          7398        753/valetudo   

I only need to figure out how to run valetudo without having to rebuild it.

Edit: oh, nice, I can just change the listen IP in the config. I hate vi, though (I always do something wrong), so I'll edit the file offline.

rand256 commented 4 years ago

@pidator, do you have issues with a map on 4004 image?

TheAssassin commented 4 years ago

It looks like the bot finally connected to the dustcloud UDP socket, even if just for a few seconds:

udp      17 17 src=203.0.113.1 dst=127.0.0.1 sport=8053 dport=1 [UNREPLIED] src=127.0.0.1 dst=203.0.113.1 sport=1 dport=8053 mark=0 use=2

Not entirely sure what happened, but I saw a very different map view with a completely grey background. Then it went back to its original state. To me that suggests the issue is a networking one, probably related to the iptables rule set. I'll keep digging.

pidator commented 4 years ago

Just checked again: at first I get the same errors as @TheAssassin (no map, status "connecting" and battery 0%), plus this error when opening the map tab:

[Screenshot: map_error]

Then I added the hosts entries from the deployment section and rebooted. Now, after ~2 h of waiting, everything seems to work normally: the robot creates a new map while cleaning, the status block is up to date, and there are no more problems with my Gen1. The hosts file was the only thing I changed.

rand256 commented 4 years ago

So actually cloud-dnsmasq isn't working there.

TheAssassin commented 4 years ago

@pidator, I can try that later, too. But I'm using a self-built image right now, based on the older version, which comes with a complete /etc/hosts.

zvldz commented 4 years ago

@pidator and @TheAssassin, can you check my firmware version? I don't have a first-generation robot.

TheAssassin commented 4 years ago

Hm... now it seems DNS doesn't work, causing issues with the bot's miio client:

udp      17 19 src=127.0.0.1 dst=127.0.1.1 sport=56026 dport=53 [UNREPLIED] src=127.0.1.1 dst=127.0.0.1 sport=53 dport=56026 mark=0 use=2
udp      17 19 src=127.0.0.1 dst=127.0.1.1 sport=43278 dport=53 [UNREPLIED] src=127.0.1.1 dst=127.0.0.1 sport=53 dport=43278 mark=0 use=2
udp      17 19 src=127.0.0.1 dst=127.0.1.1 sport=52083 dport=53 [UNREPLIED] src=127.0.1.1 dst=127.0.0.1 sport=53 dport=52083 mark=0 use=2
udp      17 19 src=127.0.0.1 dst=127.0.1.1 sport=42932 dport=53 [UNREPLIED] src=127.0.1.1 dst=127.0.0.1 sport=53 dport=42932 mark=0 use=2

zvldz commented 4 years ago

It is better to use drill to check DNS.

zvldz commented 4 years ago

Then you'll have to deal with it yourself.

TheAssassin commented 4 years ago

@zvldz, I'm watching active connections on the bot to see whether there are connections to the DNS service or to the UDP socket Valetudo opens. Your "tip" doesn't help at all. I don't want to check whether the DNS service works, because that is already known.

The problem here is that the bot's miio client either isn't using either service or is unable to connect to them. I'm using the Linux kernel's connection tracking feature, conntrack.

udp      17 4 src=203.0.113.1 dst=127.0.0.1 sport=8053 dport=1 [UNREPLIED] src=127.0.0.1 dst=203.0.113.1 sport=1 dport=8053 mark=0 use=2

This basically means there's a connection request from the bot to the Valetudo socket that hasn't received a reply yet. As this is UDP, that doesn't necessarily mean the connection is impossible.
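Reading such entries by eye is error-prone, because each line carries two tuples: the original direction first, then the expected reply. A minimal Python sketch of a parser for lines in the format quoted in this thread; the function names and the returned dict layout are hypothetical choices. Applied to the line above, it shows the original direction running from 203.0.113.1:8053 to 127.0.0.1:1, which is worth keeping in mind when deciding which side sent the packet.

```python
def parse_conntrack(line):
    """Split one conntrack line into proto, timeout, flags and both tuples."""
    fields = line.split()
    flags, tuples = [], []
    for field in fields[3:]:
        if field.startswith("[") and field.endswith("]"):
            flags.append(field[1:-1])  # e.g. UNREPLIED, ASSURED
        elif "=" in field:
            key, value = field.split("=", 1)
            if key == "src":  # each src= opens a new tuple
                tuples.append({})
            if tuples and key in ("src", "dst", "sport", "dport"):
                tuples[-1][key] = value
    tuples += [{}, {}]  # pad so malformed lines still yield two dicts
    return {
        "proto": fields[0],
        "ttl": int(fields[2]),  # seconds until the entry expires
        "flags": flags,
        "original": tuples[0],  # direction of the first packet seen
        "reply": tuples[1],     # what a reply is expected to look like
    }


def read_conntrack(path="/proc/net/ip_conntrack"):
    """Parse every entry of the table at the path suggested in this thread."""
    with open(path) as f:
        return [parse_conntrack(line) for line in f if line.strip()]
```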

zvldz commented 4 years ago

Good luck. Check if dnsmasq works.

pidator commented 4 years ago

Can you check my firmware version ? I don't have the first generation.

Just finished a full cleaning with the 4004 Valetudo RE image and rebuilt all my zones and spots. But flashing your version should keep my data, right? I see there's a 4007 on your site too; do you have any change notes?

So actually cloud-dnsmasq isn't working there.

@rand256, is there an easy way to figure out whether dnsmasq is working correctly without the hosts entries? The drill command suggested by @zvldz isn't available on my Gen1...

TheAssassin commented 4 years ago

I reset my environment to use the upstream 4004 image, as it contains all the valuable debug tools such as tcpdump. The map poll is sent to destination port 1. That's obviously wrong; nothing listens on that port. Therefore the poll request is never answered, and the bot cannot display the map.

You can also see this behavior in the line I posted earlier while still using my other image: https://github.com/rand256/valetudo/issues/139#issuecomment-591072390. I'm attaching a pcap that contains those broken packets so you can see it yourself (recorded with tcpdump on the official release image).

Port 1 is, for some reason, the default value Valetudo uses for communicating with the bot:

https://github.com/rand256/valetudo/blob/4d08b9b3d498dc460bf934251d21c2c8dfda2d60/lib/miio/Dummycloud.js#L32-L39

Looking a few lines further down explains this behavior: the dustcloud API has never seen a connection from the bot and therefore hasn't set those values.
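The behavior described here can be modeled in a few lines. This is a hypothetical Python model written for illustration, not a translation of Dummycloud.js (its actual field names and defaults are in the linked lines). It only captures the idea that the reply target stays at a placeholder port 1 until the robot's first packet reveals its real address; the placeholder host 127.0.0.1 is an assumption taken from the conntrack captures above.

```python
from dataclasses import dataclass


@dataclass
class RobotEndpoint:
    """Hypothetical model of where the fake cloud sends requests to the robot.

    The defaults stand for "never heard from the robot": port 1 matches the
    destination port seen in the captures; the host is an assumption.
    """

    host: str = "127.0.0.1"
    port: int = 1
    seen_robot: bool = False

    def on_packet_from_robot(self, source):
        # The robot's miio client talks first; remember its (host, port).
        self.host, self.port = source
        self.seen_robot = True

    def map_poll_target(self):
        # Until the robot has connected, requests such as map polling go to
        # the placeholder address, where nothing listens and nothing answers.
        return (self.host, self.port)
```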

Now, this of course doesn't fix my issue, but it confirms @rand256's explanation in https://github.com/rand256/valetudo/issues/139#issuecomment-590861785.

Does anyone have an idea how to force the bot to connect to the dustcloud API? Can we restart single services perhaps?

broken-map-poll.pcap.zip

rand256 commented 4 years ago

Does anyone have an idea how to force the bot to connect to the dustcloud API? Can we restart single services perhaps?

You may call restart rrwarchdoge for that; it'll restart all the services.

As I understand it, the only issue with my 4004 firmware image is that the so-called cloud-dnsmasq service isn't working properly. Its only task is to answer all requests for cloud domains (mi.com, xiaomi.com) with 203.0.113.1 and to forward all other DNS requests. Maybe it simply can't start for some reason, and because of that the bot never connects to the dustcloud.
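The lookup rule described here is essentially a domain-suffix match. A minimal Python sketch of that decision, assuming only the two domains named in the comment (the real list in cloud-dnsmasq's configuration may be longer); the names are hypothetical:

```python
SPOOF_IP = "203.0.113.1"
CLOUD_DOMAINS = ("mi.com", "xiaomi.com")  # the two examples from the comment


def answer_for(name):
    """Return the spoofed IP for cloud names, or None meaning 'forward upstream'."""
    name = name.rstrip(".").lower()  # tolerate fully qualified names and case
    for domain in CLOUD_DOMAINS:
        if name == domain or name.endswith("." + domain):
            return SPOOF_IP
    return None
```

Matching on "." plus the domain keeps unrelated names such as notmi.com from being captured. dnsmasq supports this kind of rule natively through its --address=/domain/ip option; whether cloud-dnsmasq is configured exactly that way isn't shown in this thread.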

This dnsmasq is launched by the upstart script /etc/init/dnsmasq.conf and logs to /var/log/messages. I don't have upstart on the 2008 firmware, so I can't look at it right now. Maybe the start on value should be changed to (started networking) or something similar.

Is there an easy way to figure out if dnsmasq is working correctly

That would be ps | grep cloud-dnsmasq for checking if it's running and nslookup asdf.mi.com for checking if it's actually doing its job.

pidator commented 4 years ago

The dnsmasq is launched with upstart script /etc/init/dnsmasq.conf and logs to /var/log/messages.

There's no messages file in /var/log on my Gen1, but this is the content of /var/log/upstart/dnsmasq.log:

root@rockrobo:/var/log/upstart# more dnsmasq.log
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
root@rockrobo:/var/log/upstart#

dnsmasq is not running:

root@rockrobo:/# ps | grep cloud-dnsmasq
root@rockrobo:/#

nslookup isn't available either:

root@rockrobo:/# nslookup asdf.mi.com
-bash: nslookup: command not found
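Without nslookup or drill, the same check can be made from any language with access to the system resolver. A minimal Python sketch, assuming a Python 3 interpreter is present or can be installed on the robot (this thread doesn't establish that); the function name is hypothetical, and the resolver parameter exists only so the logic can be tested without a network:

```python
import socket


def resolves_to(name, expected, resolver=socket.gethostbyname):
    """Return (ok, answer): whether `name` resolves to `expected` on this host.

    A resolution failure is reported as (False, None) instead of raising.
    """
    try:
        answer = resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False, None
    return answer == expected, answer
```

On the robot you would call resolves_to("asdf.mi.com", "203.0.113.1"). gethostbyname goes through the normal resolver path, so with the usual "files dns" order in /etc/nsswitch.conf an /etc/hosts entry answers before any DNS server is asked; with hosts entries present, a True result therefore doesn't isolate cloud-dnsmasq.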

zvldz commented 4 years ago

just finished a full cleaning with 4004 of valetudo re version and rebuild all my zones and spots. But I think flashing of your version should kept my data, right? I see there's a 4007 on your site too, do you have any change notes?

No, unfortunately I don't have a list of changes for firmware 4007. Data should not be lost during the firmware update.

Is there an easy way to figure out if dnsmasq is working correctly without the host entries @rand256 ? The command drill supposed by @zvldz isn't available on my Gen1.

drill is included in my firmware, or it can be installed via apt install ldnsutils.

rand256 commented 4 years ago

@pidator, sorry for misguiding you a bit; as you can see, the firmware with stripped-down Ubuntu differs from the older ones. Anyway, now we see the reason: something is using that port in those "full-featured" Gen1 images.

So if you simply change 5354 to something else, e.g. 55354, in both /etc/rc.local and /etc/init/dnsmasq.conf and reboot, it will most likely be fixed.
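To see which process already holds a port before picking a new one, netstat -anop output (like the listing posted further down in this thread) can be filtered for listeners. A minimal Python sketch; the function name is hypothetical and the column layout is assumed from that listing:

```python
def listeners_on(netstat_output, port):
    """Return (proto, local_address, pid/program) for sockets bound to `port`.

    Only unconnected sockets are reported: TCP sockets in LISTEN state and
    UDP sockets whose foreign address is a wildcard (UDP lines have no state
    column, so the program sits one column further left).
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue  # skips headers and short lines
        proto, local, foreign = fields[0], fields[3], fields[4]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        if proto.startswith("tcp"):
            if fields[5] != "LISTEN":
                continue
            program = fields[6] if len(fields) > 6 else "-"
        else:
            if not foreign.endswith(":*"):
                continue  # connected UDP socket, e.g. miio_client's session
            program = fields[5]
        found.append((proto, local, program))
    return found
```

In this thread, that listing later showed cloud-dnsmasq itself bound to the port in question.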

pidator commented 4 years ago

So if you simply change 5354 to something else i.e. 55354 in /etc/rc.local and /etc/init/dnsmasq.conf and reboot, it most likely will be fixed.

unfortunately not:

root@rockrobo:~# tail -F /var/log/upstart/dnsmasq.log
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 55454: Address already in use
^C
root@rockrobo:~# ps | grep cloud-dnsmasq
root@rockrobo:~#

pidator commented 4 years ago

So far I haven't found a working port on a Gen1. I tried 5354, 55454, 53, and 55553 from @zvldz's version. The result is still the same; dnsmasq couldn't start:

root@rockrobo:~# tail -F /var/log/upstart/dnsmasq.log
dnsmasq: failed to create listening socket for port 5354: Address already in use
dnsmasq: failed to create listening socket for port 55454: Address already in use
dnsmasq: failed to create listening socket for port 55454: Address already in use
dnsmasq: failed to create listening socket for port 53: Address already in use
dnsmasq: failed to create listening socket for port 55553: Address already in use

@pidator I can try that later, too. But I'm using a self built image right now, based on the older version which comes with a complete /etc/hosts.

@TheAssassin, could you verify a working version by adding only the hosts entries to @rand256's 4004 image on your Gen1?

zvldz commented 4 years ago

But this is the log of /var/log/upstart/dnsmasq.log

It seems that dnsmasq is trying to run several times.

pidator commented 4 years ago

A new entry is only created after a reboot!

rand256 commented 4 years ago

It seems that dnsmasq is trying to run several times.

Yeah, but then why does the first instance fail after taking the specified port, and not release it quickly enough? This is all weird.

Could you verify a working version by adding only the host entries to the 4004 image

This will most likely be enough for a 4004 image with the miio client downgraded to 3.3.9. But newer miio clients ignore the hosts file entirely, which is exactly why the dnsmasq workaround was introduced. If only it ran correctly, as it does on the 2008 firmware.

@pidator, could you please try a couple of other experiments? Edit /etc/init/dnsmasq.conf and either append --bind-dynamic to the exec line, or change the start on value to net-device-up IFACE=lo; then reboot and check whether dnsmasq is running. But that's just a guess. Maybe @TheAssassin will come up with a proper solution.

pidator commented 4 years ago

Edit /etc/init/dnsmasq.conf and either add to the end of exec line --bind-dynamic

No cloud-dnsmasq process visible, but also no new entry in dnsmasq.log! So it seems nothing happens.

or change start on value to net-device-up IFACE=lo, then reboot and check whether dnsmasq is running.

No cloud-dnsmasq process visible, and again a new "failed to create listening socket" entry for the configured port in dnsmasq.log.

zvldz commented 4 years ago

On firmware 1898 I get 'Address already in use' messages too, but dnsmasq works fine nonetheless. Perhaps there is something wrong with the startup script.

zvldz commented 4 years ago

Try commenting out or deleting the line 'expect fork' in /etc/init/dnsmasq.conf, and add -k to the exec line. Then reboot.

pidator commented 4 years ago

On firmware 1898 I have messages 'Address already in use' too. But dnsmasq works well at the same time.

That made me wonder, so here are the steps I just took:

root@rockrobo:~# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       rockrobo

::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

dnsmasq: failed to create listening socket for port 5355: Address already in use

root@rockrobo:~# ps | grep cloud-dnsmasq
root@rockrobo:~# ps
  PID TTY          TIME CMD
 1573 pts/0    00:00:00 bash
15718 pts/0    00:00:00 ps

So it seems dnsmasq was running correctly on my Gen1 all along, but I didn't notice because ps | grep cloud-dnsmasq doesn't display anything.

Now I can't explain my initial issue after flashing 4004...

zvldz commented 4 years ago

Could you show the output of these commands:

  1. netstat -anop | grep -E "tcp|udp"
  2. iptables -S -t nat

pidator commented 4 years ago

root@rockrobo:~# netstat -anop | grep -E "tcp|udp"
tcp        0      0 127.0.0.1:5037          0.0.0.0:*               LISTEN      426/adbd         off (0.00/0/0)
tcp        0      0 127.0.0.1:54322         0.0.0.0:*               LISTEN      941/miio_client  off (0.00/0/0)
tcp        0      0 127.0.0.1:54323         0.0.0.0:*               LISTEN      941/miio_client  off (0.00/0/0)
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1470/sshd        off (0.00/0/0)
tcp        0      0 0.0.0.0:199             0.0.0.0:*               LISTEN      402/snmpd        off (0.00/0/0)
tcp        0      0 0.0.0.0:6665            0.0.0.0:*               LISTEN      866/player       off (0.00/0/0)
tcp        0      0 0.0.0.0:5355            0.0.0.0:*               LISTEN      498/cloud-dnsmasq off (0.00/0/0)
tcp        1      0 127.0.0.1:38271         127.0.0.1:80            CLOSE_WAIT  933/AppProxy     off (0.00/0/0)
tcp        0      0 127.0.0.1:54322         127.0.0.1:44335         ESTABLISHED 941/miio_client  off (0.00/0/0)
tcp        0     36 <LOCAL ROBOT LAN IP>:22       <LOCAL PC LAN IP>:50797    ESTABLISHED 31356/0          on (0.50/0/0)
tcp        0      0 127.0.0.1:44581         127.0.0.1:54322         ESTABLISHED 25553/miio_recv_lin off (0.00/0/0)
tcp        0      0 127.0.0.1:54322         127.0.0.1:44581         ESTABLISHED 941/miio_client  off (0.00/0/0)
tcp        0      0 127.0.0.1:44335         127.0.0.1:54322         ESTABLISHED 933/AppProxy     off (0.00/0/0)
tcp6       0      0 :::80                   :::*                    LISTEN      741/valetudo     off (0.00/0/0)
tcp6       0      0 :::22                   :::*                    LISTEN      1470/sshd        off (0.00/0/0)
tcp6       0      0 :::5355                 :::*                    LISTEN      498/cloud-dnsmasq off (0.00/0/0)
tcp6       0      0 127.0.0.1:80            127.0.0.1:38271         FIN_WAIT2   -                timewait (43.08/0/0)
tcp6       0      0 127.0.0.1:80            127.0.0.1:38269         TIME_WAIT   -                timewait (38.02/0/0)
tcp6       0      0 127.0.0.1:80            127.0.0.1:38263         TIME_WAIT   -                timewait (27.88/0/0)
udp        0      0 0.0.0.0:55130           0.0.0.0:*                           741/valetudo     off (0.00/0/0)
udp        0      0 127.0.0.1:8053          0.0.0.0:*                           741/valetudo     off (0.00/0/0)
udp        0      0 <LOCAL ROBOT LAN IP>:33667    203.0.113.1:8053        ESTABLISHED 941/miio_client  off (0.00/0/0)
udp        0      0 0.0.0.0:161             0.0.0.0:*                           402/snmpd        off (0.00/0/0)
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           941/miio_client  off (0.00/0/0)
udp        0      0 0.0.0.0:5355            0.0.0.0:*                           498/cloud-dnsmasq off (0.00/0/0)
udp        0      0 0.0.0.0:6665            0.0.0.0:*                           866/player       off (0.00/0/0)
udp        0      0 0.0.0.0:54321           0.0.0.0:*                           941/miio_client  off (0.00/0/0)
udp        0      0 0.0.0.0:48186           0.0.0.0:*                           1738/dhclient    off (0.00/0/0)
udp        0      0 0.0.0.0:68              0.0.0.0:*                           1738/dhclient    off (0.00/0/0)
udp6       0      0 :::58531                :::*                                1738/dhclient    off (0.00/0/0)
udp6       0      0 :::5355                 :::*                                498/cloud-dnsmasq off (0.00/0/0)
root@rockrobo:~# iptables -S -t nat
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A OUTPUT -d 203.0.113.1/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 127.0.0.1:8053
-A OUTPUT -d 203.0.113.1/32 -p udp -m udp --dport 8053 -j DNAT --to-destination 127.0.0.1:8053
-A OUTPUT -p udp -m owner ! --uid-owner 65534 -m udp --dport 53 -j DNAT --to-destination 127.0.0.1:5355
-A OUTPUT -p tcp -m owner ! --uid-owner 65534 -m tcp --dport 53 -j DNAT --to-destination 127.0.0.1:5355
root@rockrobo:~#
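The four nat rules above can be read as a small lookup table. A simplified Python model of how they rewrite locally generated packets, for illustration only: it ignores conntrack (real iptables NATs per connection, not per packet), and all names are hypothetical. The uid check mirrors -m owner ! --uid-owner 65534. UID 65534 is conventionally nobody, the user dnsmasq drops privileges to by default, so the exemption presumably keeps cloud-dnsmasq's own upstream queries from being redirected back to itself.

```python
# Simplified model of the nat OUTPUT rules listed above, in chain order.
# Each rule: (protocol, destination IP or None for "any", destination port,
#             UID that is NOT redirected or None, DNAT target (ip, port)).
NAT_OUTPUT_RULES = [
    ("tcp", "203.0.113.1", 80, None, ("127.0.0.1", 8053)),
    ("udp", "203.0.113.1", 8053, None, ("127.0.0.1", 8053)),
    ("udp", None, 53, 65534, ("127.0.0.1", 5355)),
    ("tcp", None, 53, 65534, ("127.0.0.1", 5355)),
]


def dnat_target(proto, dst, dport, uid, rules=NAT_OUTPUT_RULES):
    """Return the rewritten (ip, port) for an outgoing packet, or None if no
    rule matches. The first matching rule wins, as in an iptables chain."""
    for r_proto, r_dst, r_dport, exempt_uid, target in rules:
        if proto != r_proto or dport != r_dport:
            continue
        if r_dst is not None and dst != r_dst:
            continue
        if exempt_uid is not None and uid == exempt_uid:
            continue  # models "-m owner ! --uid-owner 65534"
        return target
    return None
```

The model also shows why the port had to be changed in /etc/rc.local and /etc/init/dnsmasq.conf together: the DNAT target port (5355 here) must be the port cloud-dnsmasq actually listens on, as the netstat listing above confirms.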

zvldz commented 4 years ago

cloud-dnsmasq is working.

ps | grep cloud-dnsmasq is an incorrect command.

right command ps aux | grep cloud-dnsmasq
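The difference matters because plain ps only lists processes with your effective user ID that are attached to the current terminal, so a daemon started at boot never shows up. Where ps aux isn't convenient, the same check can be made by reading /proc directly. A minimal Python sketch; the function name is hypothetical, and it assumes the daemon's command name is literally cloud-dnsmasq, as the program column in the netstat listing above suggests:

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return sorted PIDs whose command name (/proc/<pid>/comm) equals `name`.

    Looks at every process visible in /proc, as `ps aux` does. Note that the
    kernel truncates comm to 15 characters, so very long names won't match.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/net, ...
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited meanwhile, or no permission
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

On the robot, find_processes("cloud-dnsmasq") returning a non-empty list corresponds to ps aux | grep cloud-dnsmasq finding the daemon.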

TheAssassin commented 4 years ago

cloud-dnsmasq has been running fine, though it has been conflicting with the bot's regular dnsmasq. That's an issue, as the bot won't be able to provide DHCP if you reset the WiFi. There should be a separate init script; the regular dnsmasq's init script really shouldn't be misused for that.

@pidator, adding the hosts entries makes no difference. I don't know why (yet).

It's really hard to debug these problems. Is there any chance we can "guess" the port the bot will later use to connect to the API? I don't think we have to wait for it to connect, do we? Can I run Valetudo directly from the scripts somehow, or do I have to put a Node runtime on the bot manually? That way I could add logging where needed and override some default ports and the like.

@rand256, I rebooted using reboot, as restart rrwarchdoge told me there's no service with that name. Now Valetudo apparently can't connect to the bot any more: API calls to current_status time out, yielding the "unable to connect" error. I guess I have to reflash now, unless someone knows how to reset the configs...

Regarding the networking changes in my image: they were working just fine and are a lot less complex to use. I'll provide details in a separate issue.

TheAssassin commented 4 years ago

Regarding the init file, you can get rid of expect fork by simply running dnsmasq in the foreground (i.e., adding --no-daemon, or -d). Managing it through the init script doesn't work at all right now; I always have to killall before I can restart the daemon properly through upstart. Running with -d fixes that problem, and this way we also see the daemon's logs in the file. But, as said before, this service should be managed by a separate script.

zvldz commented 4 years ago

Regarding the init file, you can get rid of expect fork by simply running dnsmasq in foreground (i.e., adding --no-daemon resp. -d.

That's debug mode; -k is more suitable:

-d, --no-daemon Debug mode: don't fork to the background, don't write a pid file, don't change user id, generate a complete cache dump on receipt on SIGUSR1, log to stderr as well as syslog, don't fork new processes to handle TCP queries. Note that this option is for use in debugging only, to stop dnsmasq daemonising in production, use -k.

rand256 commented 4 years ago

though it has been conflicting with the bot's regular dnsmasq. That's an issue, as the bot won't be able to provide DHCP if you reset the WiFi. There should be a separate init script

It is already a separate init script, isn't it? Regular dnsmasq is started from /opt/rockrobo/wlan/wifi_start.sh.

It's really hard to debug these problems. Is there any chance we "guess" the port the bot will later use to connect to the API? I don't think we have to wait for it to connect, do we?

I don't really know what's going on with these Gen1 units; I've never had a single issue on Gen2 with the miio client connecting to UDP 8053 at the dustcloud. And yes, we do have to wait until it decides to connect. It never took that long anyway.

showed me there's no service with that name

Since the name is rrwatchdoge, that was an obvious typo. All init scripts can be easily found in /etc/init.

TheAssassin commented 4 years ago

Regular dnsmasq is started from /opt/rockrobo/wlan/wifi_start.sh.

Thanks. I assume that script is run when the WiFi is reset. I don't see any calls to dnsmasq, but I guess it works somehow. I think we should rename the init script, though; it put me on the wrong trail. I can send a PR for that, but I don't see where it's added in this repo. I'll check @zvldz's repo later.

Since the name is rrwatchdoge, that was an obvious typo. All init scripts can be easily found in /etc/init.

Right, my bad. Sorry for the noise. I'll try that and hopefully things work again. If not, I'll reflash once again...

TheAssassin commented 4 years ago

Resetting worked fine; the bot is up and running, but there's no map. I'll run tcpdump for more than 5 minutes now; let's see if that helps.

pidator commented 4 years ago

Give it some time. Have you already started a new full clean?

TheAssassin commented 4 years ago

I just did. According to uptime it's been running for ~1 hour (though I'm not sure whether I, the watchdog timer, or resetting the WiFi caused the reboot).

Edit: I think it still has a map, otherwise it would be moving differently. It's going much faster and doesn't collide as often.

TheAssassin commented 4 years ago

@pidator it's been running for over 4 hours, having done two full runs. I doubt it's a timing problem, really. I've fetched the traffic dumps, and am going to look into them now.

pidator commented 4 years ago

@TheAssassin your issues at the moment are still the same:

??? (sorry for the recap, but I just want to make sure I haven't missed anything during the tests in this topic)

Have you seen https://github.com/rand256/valetudo/issues/96? @Hacki1111 described a similar situation; there, the problem was the WLAN SSID:

Yes, there must be a problem with special characters. The ssid contains a " \ " and a " / ". Try it! ;)

Another point to check: what's the output of cat /mnt/data/valetudo/config.json | grep map_upload?
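If you'd rather see the matching settings with their position in the file than as raw grep hits, a JSON-aware search takes a few lines of Python. This is a minimal sketch that assumes config.json is plain JSON; the function names are hypothetical, and the key names used in the test data are invented, since the actual structure of Valetudo RE's config isn't shown in this thread.

```python
import json


def find_keys(obj, needle, path=""):
    """Yield (path, value) for every dict key containing `needle`,
    walking nested dicts and lists; a JSON-aware take on `grep needle`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            sub = f"{path}.{key}" if path else key
            if needle in key:
                yield sub, value
            yield from find_keys(value, needle, sub)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_keys(value, needle, f"{path}[{index}]")


def search_config(config_path="/mnt/data/valetudo/config.json", needle="map_upload"):
    """Load the config file and return all matching (path, value) pairs."""
    with open(config_path) as f:
        return list(find_keys(json.load(f), needle))
```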