slowscript / warpinator-android

An unofficial implementation of Warpinator for Android
GNU General Public License v3.0

Connection not completing between Android and Arch linux (Manjaro) #23

Open uncledulgaria opened 3 years ago

uncledulgaria commented 3 years ago

I have been running Warpinator between installs of Linux Mint and Manjaro for a year. Today I discovered Warpinator for Android and installed it from Google Play. It installs on my Moto G (5S) Plus, but it did not ask for any permissions, and the section under Settings > Apps & Notifications > Warpinator > Permissions is empty. It doesn't find any of the other devices on my subnet.

I am running Warpinator in Manjaro (with Cinnamon desktop) v 1.1.2 installed from the AUR. This identifies and connects to another PC running the same distro OK. It does see the Moto, but reports that the Moto couldn't "complete the connection".

Just once I have seen a message in W on Android stating that a server failed to start, but can't re-create it. Searches across all my storage for "warpinator" and "slowscript" failed to find anything - where might I find the log files ?

The phone is not rooted, and I won't do that to this phone.

Spec Device extract for the Moto follows:
Manufacturer: motorola
Model: Moto G (5S) Plus
Android: 8.1.0; 63857; SDK27
Display: 1920 x 1080; DPI: 408; Evaluated Size: 5.4"
Touch Screen: Multitouch 5 points
GPU: Adreno (TM) 506; Qualcomm; OpenGL ES-CM 1.1; OpenGL ES 3.2 V@269.0 (GIT@908a5ce, I77d3059488) (Date:06/07/18)
RAM: 0 MB (actually 3 GB)
Processor: 8 cores; Qualcomm Technologies, Inc MSM8953; Max: 2016.0 MHz; Min: 652.8 MHz

slowscript commented 3 years ago

Don't worry about permissions. Warpinator doesn't need any. Not being able to discover devices on the network is a problem though. Sometimes it helps to restart the app on both the phone and computer. The mDNS protocol is a bit glitchy, especially Android's implementation. I'll try using some third party implementation in the future to see if it gets any better.

We don't store the log in a file (although we probably should for cases like this). If you want to obtain it (which would be helpful), get Android platform tools (should be in AUR), enable USB debugging and run adb shell ps | grep warpinator on the computer to get the PID of a running instance of the app (should be second column) and then run adb logcat --pid=1234 where 1234 is the PID.
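A minimal Java sketch of the PID lookup described above, for anyone who wants to script it. It only parses text that `adb shell ps` has already produced and builds the matching `adb logcat --pid=` command. The class name `LogcatHelper` and the sample `ps` output are hypothetical; the only thing taken from the comment is that the PID sits in the second column.

```java
import java.util.OptionalInt;

class LogcatHelper {
    /** Returns the PID (second column) of the first ps line that mentions the needle. */
    static OptionalInt findPid(String psOutput, String needle) {
        for (String line : psOutput.split("\\R")) {
            if (!line.contains(needle)) {
                continue;
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue;
            }
            try {
                return OptionalInt.of(Integer.parseInt(cols[1]));
            } catch (NumberFormatException e) {
                // Second column was not a number (malformed line); keep looking.
            }
        }
        return OptionalInt.empty();
    }

    /** Builds the logcat invocation for a given PID. */
    static String logcatCommand(int pid) {
        return "adb logcat --pid=" + pid;
    }

    public static void main(String[] args) {
        // Hypothetical sample of `adb shell ps` output; real column layouts vary by Android version.
        String sample = "USER PID PPID VSZ RSS WCHAN ADDR S NAME\n"
                + "u0_a123 1234 567 1500000 90000 0 0 S slowscript.warpinator\n";
        findPid(sample, "warpinator")
                .ifPresent(pid -> System.out.println(logcatCommand(pid)));
    }
}
```

If the app is not running, `findPid` returns an empty result and nothing is printed, which is a hint to start Warpinator on the phone first.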

StickyDigit commented 3 years ago

Until now I'd been mostly running warpinator 'on demand' for testing, but I'm starting to use it like most humans might, so I'm seeing oddness with discovery too.

If I start an android client, then start all the other stuff, it tends to behave quite well, however...

If I quit the android client and restart it, it tends to immediately discover one of the seven hosts I have running (mix of desktop and 'droid). Often it will be the same host as it had found at last run. Left a while it sometimes finds a second within a few minutes, but not always, even with caffeine.

So, if after closing a fully working session, droid1 fires up and sees droid2, it may not see any of the other hosts at all. Closing it, and restarting it will most likely show droid 2 immediately, with full connection, again with no other hosts discovered. An FC will usually cause a different single host to be discovered, sometimes with one or two others being discovered eventually, but not all of them.

If I restart other hosts there's a high chance they'll appear immediately in the android client.

The desktop app seems fine in this setup, both wired and wireless, even on bridged VMs

So, to get the android client to behave well, it seems I have to visit all the other hosts and restart warpinator. This is fine with one physical desktop and one phone, but not so cool with more than one android (chicken and egg paradox).

It's as if the android client is nearly blind to all but startup signals from other hosts, unless FC'd... mostly :-|

StickyDigit commented 3 years ago

FWIW I can ping, and 'nc' connect to port 42000 between hosts that are refusing to see each other. The desktop client on WiFi sees all functioning hosts within 11 seconds of starting it, every time.
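The `nc` check above can be reproduced with a plain TCP connect. This is a rough Java sketch, assuming only that the peer listens on port 42000 as mentioned in the comment; the class name `PortProbe` and the default host are hypothetical. A successful connect shows the transport path is open, so a host that is still invisible points at discovery rather than connectivity.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** True if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Pass the peer's address as the first argument; 42000 is the port named in the thread.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(host + ":42000 reachable = " + canConnect(host, 42000, 2000));
    }
}
```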

I'll try and get my head around the protocol so I can be more help in narrowing this down. I agree with your thoughts on the linuxmint warpinator issues.. the protocol should be better documented outside the source.

uncledulgaria commented 3 years ago

Sometimes it helps to restart the app on both the phone and computer

I tried all sorts of combinations of restarts of the laptop, the phone and the apps themselves before posting. It was on one occasion (possibly when I had restarted the phone with the app autostart disabled) that I saw the "server not started" message, but it didn't stay long enough to screenshot it.

The AUR only has an ARM64 version of ADB so I can't install on AMD64 based PC. However, I do have access to a RasPi400 and RaspberryPi OS is Debian, so I will try and install on that and see what I can do with the debug when I get the chance to set it up.

I have just picked up the phone to check how Warpinator is currently set (autostart or not) and it has happily found my desktop PC, and the PC can see the phone. After one successful transfer (PC to phone), both reported connection lost, but the connection then re-established OK without any restarts and I have sent files both ways since. I can't quite reconcile my recent experience with StickyDigit's comments: the phone was on, but W was not running until I started it (settings = not in background and not autostart), and the PC had been restarted earlier today with autostart enabled, i.e. the PC was already running W, the phone wasn't.

StickyDigit commented 3 years ago

In my case there were multiple clients. It works pretty well with one desktop and one Android.

StickyDigit commented 3 years ago

I've been banging my head against this, and decided to get more toys out.

I've removed a space from my group code, and it seems that a lot has improved, however...

Testing with the current F-Droid release on an Android 7 and two 10's, and self-built main on two 11's. All WiFi. Also Mint 20.1: two wired, one on WiFi, and two wired bridged VMs.

So.. ten hosts. All hosts should see nine others.

All the Mints are fine, even with a space in the group name. They all quickly (in seconds) find any happy hosts.

The Android 7 seems to be quite reliable, even when restarted it'll find all the 'happy hosts', albeit a little slower than the mints. Sometimes it'll need a host looked into and refreshed, but that's almost normal person friendly, as most non-tech folk are conversant with hitting a refresh button.

Android 10's and 11's seem to need an FC after "menu-quit" to find much more than one host on restart. Without a space in the group code they find more than with (after said FC), but by no means all of them, and with no discernable logic to what's missing. At the moment for example, one of the 11's has found a 7-droid, a 10-droid a wifi-mint and a wired mint, the other has found two mint-vms, a wifi-mint, two 11-droids and a 10-droid. They've been running for about half an hour.

Please let me know how best to use this heap of hardware which might help de-bug your otherwise grand software. IMHO this glitch is all that's stopping it from being "granny friendly".

StickyDigit commented 3 years ago

Sorry to spam the thread. I just turned IGMP snooping off on my router and now the 10's and 11's are finding more hosts per attempt on average, although still a random selection missing after several minutes. IGMP snooping was not affecting Warpinator on android 7, or my mints.

Hope that is a clue. Would like to turn that back on eventually :-)

Without an FC, the 10s & 11's still only tend to find one host after restarting, and of course some hosts that subsequently restart.

One of the 11's seems to need a reboot sometimes to find anything at all after first run. It seems more likely to need reboot if it's dropped to 4G and back during use. Aeroplane mode and back doesn't help that.

Perhaps connected:- sometimes the display name from one of the hosts also appears incorrectly as the display name of another. I have just seen the display name of one of the 11's appearing as the display name of both. They are set differently. This has happened a few times, and I think it was the 11's each time. Will make notes.

StickyDigit commented 3 years ago

From quick digging with tcpdump etc. I get the impression that multiple androids, coming and going, and all trying to start by calling themselves 'android', being variously accepted by linux hosts, and then dropping out and back, then getting allocated android-3 instead of say android-5 by mDNS, causes them to be ignored by 'stable' hosts that already think they know what a particular host is by mDNS name, and finding it clashes with the IP/MAC down the line.

All the desktop hosts on my net have hostnames, which they seem to use at the outset when starting to talk with mDNS, and the odds of a clash are few.

Five Johns walk into a subnet. The first one calls himself John, the second, realising he can't also be called John, asks to be called John1... John leaves, and so does John1, who immediately walks back in, introduces himself as John and gets accepted as such, except by those who saw him earlier, who all want to call him John1, so they ignore him... or call him (now) erroneously John1. Imagine the five of them drifting in and out for a cigarette!

The above could hint at why some of my androids got seen as the wrong screen name on the android client, and perhaps why desktop clients that had seen an android earlier, would not be seen by the android without luck, or starting the desktop client again so it'd forget that John5 called itself John2 earlier, and actually reply to him in civil conversation :-D
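To make the "five Johns" hypothesis concrete, here is a toy Java simulation. It is not how mDNS actually works (real mDNS has TTLs, goodbye packets and cache-flush bits); it only models the suspected effect: a host rejoins under a name that a long-running peer still associates with a different address. The class name, the "-N" suffix scheme and the IP addresses are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NameClashSim {
    // Names currently claimed on the subnet.
    private final Set<String> inUse = new HashSet<>();
    // What one long-running peer remembers: name -> address. Never expired in this toy model.
    private final Map<String, String> peerCache = new HashMap<>();

    /** Claims the wanted name, or the first free "wanted-N" (N >= 2) if it is taken. */
    String claim(String wanted) {
        String name = wanted;
        for (int n = 2; inUse.contains(name); n++) {
            name = wanted + "-" + n;
        }
        inUse.add(name);
        return name;
    }

    /** Frees the name on the subnet; the peer's cache is deliberately left stale. */
    void leave(String name) {
        inUse.remove(name);
    }

    /** Records a sighting; returns false if the peer already tied this name to another address. */
    boolean peerSees(String name, String address) {
        String known = peerCache.putIfAbsent(name, address);
        return known == null || known.equals(address);
    }

    public static void main(String[] args) {
        NameClashSim net = new NameClashSim();
        String first = net.claim("android");   // gets the plain name
        String second = net.claim("android");  // gets a suffixed name
        net.peerSees(first, "10.0.0.11");
        net.peerSees(second, "10.0.0.12");
        net.leave(first);
        net.leave(second);
        // The second phone comes back before the first and takes the plain name.
        String again = net.claim("android");
        System.out.println(again + " consistent with peer cache: "
                + net.peerSees(again, "10.0.0.12"));
    }
}
```

In this model the returning phone is reported as inconsistent with the peer's cache, which is the kind of mismatch that might make a "stable" host ignore it until that host is restarted.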

I don't know enough about mDNS, or the android implementation at this point to be much more help. There is clearly something not quite tessellating.

StickyDigit commented 3 years ago

FWIW, I tried this:-

app/src/main/java/slowscript/warpinator/GrpcService.java
-                .setDisplayName(Server.displayName).setUserName("android").build());
+                .setDisplayName(Server.displayName).setUserName("unusualname").build());

I shut off warpinator on all hosts. On 'unusualname', android 11 phone, I installed the patched version, FC'd Warpinator, cache flushed, and rebooted, then restarted Warpinator desktop on the five mints.

On unusualname.. the address in warpinator was displayed now as unusualname@myphone1.

Still the only way I could get unusualname to see all the desktops was again to run round restarting Warpinator on the ones that failed to see it (or vice-versa). The desktop clients all stayed sane throughout, except with regard to the Android client.

Quitting Warpinator on unusualname left it only seeing one host without doing an FC as before, and then only seeing a few hosts still.

Not sure if it's relevant, but I note that the _warpinator mDNS for android and desktop differs slightly.

The txt portions of Linux clients shows as...

txt = ["type=real" "hostname=linuxmint1"]

..whereas the Android client has no "type=real". I don't know the relevance of this, if any.
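For comparing the two records side by side, a small Java sketch that parses a txt list in the format shown above into key/value pairs. The class name `TxtRecord` is hypothetical, and the Android record in `main` is an invented example: the thread only says that it lacks "type=real", not what else it contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TxtRecord {
    // Matches one quoted "key=value" entry.
    private static final Pattern ENTRY = Pattern.compile("\"([^\"=]+)=([^\"]*)\"");

    /** Parses a txt list such as ["type=real" "hostname=linuxmint1"] into key/value pairs. */
    static Map<String, String> parse(String txt) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(txt);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> linux = parse("[\"type=real\" \"hostname=linuxmint1\"]");
        // Hypothetical Android record without a "type" key.
        Map<String, String> android = parse("[\"hostname=myphone1\"]");
        System.out.println("linux type   = " + linux.get("type"));
        System.out.println("android type = " + android.getOrDefault("type", "(missing)"));
    }
}
```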

If I was barking up the right tree, it was almost certainly in the wrong way.

I'll stay under my rock until I've read up more on mDNS, or you ask me to try anything more rational/helpful.

slowscript commented 3 years ago

Space in the group code should not be an issue. It is used only when the devices are authenticating which happens right after discovery. If it went wrong there, you would see the device but with the "connection failed" icon.

I agree with your thoughts on the linuxmint warpinator issues..

This is not an issue with the warpinator protocol (which specifies how the devices transfer files and authenticate). This is an issue with the zeroconf / mDNS protocol (which defines how devices discover each other).

The change in code you made doesn't affect much as this code is part of the warpinator protocol and has nothing to do with discovery. What you changed is the username (as in the short name of a Linux user). There is no such thing on Android so I just hardcoded this. Those "android-3.local" etc are hostnames, they are selected automatically and I have no control over them.

I'm grateful you are trying to investigate this. The second-to-last post is the most interesting. It gave me an idea to try discovering devices first, wait a second and only then register the service, assuming it will know who is on the network by then and not collide with other androids. I also implemented doing a "flush" registration just like original Warpinator does. This is what is responsible for the "type=flush" and "type=real" thing. I pushed these changes a few hours ago so you can give it a try when you have some time. In my case, I can only test it on a tablet with Android 5, two Android 10 phones and a desktop. Android 5 always worked fine but one of the 10s sometimes had trouble finding other Androids. Now it seems to be a bit better. Launching them one by one worked reliably. Launching them all at once not so much. One of the androids sometimes has trouble finding one or all other androids. I was able to fix that with a restart though.
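The ordering described above (discover, wait, then register, with a "flush" registration before the "real" one) might look roughly like the Java sketch below. This is not the app's actual code: `MdnsBackend` is a hypothetical stand-in for whatever mDNS library does the work, and the exact flush steps (register as "flush", unregister, register as "real") are an assumption about what such a sequence could be.

```java
import java.util.ArrayList;
import java.util.List;

class StartupSequence {
    /** Hypothetical stand-in for the mDNS library; not a real API. */
    interface MdnsBackend {
        void startDiscovery();
        void register(String serviceName, String type);
        void unregister(String serviceName);
    }

    /** Discover first, pause, then announce: a short-lived "flush" record followed by the "real" one. */
    static void start(MdnsBackend mdns, String serviceName, long settleMillis) {
        mdns.startDiscovery();
        try {
            // Give discovery time to learn who is already on the network.
            Thread.sleep(settleMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return; // interrupted before announcing; leave the service unregistered
        }
        mdns.register(serviceName, "flush");
        mdns.unregister(serviceName);
        mdns.register(serviceName, "real");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // A recording fake so the order of calls can be inspected without a network.
        MdnsBackend recorder = new MdnsBackend() {
            public void startDiscovery() { log.add("discover"); }
            public void register(String name, String type) { log.add("register " + name + " type=" + type); }
            public void unregister(String name) { log.add("unregister " + name); }
        };
        start(recorder, "myphone1", 1000);
        log.forEach(System.out::println);
    }
}
```

Keeping the backend behind an interface also makes it cheap to swap one mDNS implementation for another while leaving the startup order unchanged.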

If this doesn't help, I have found a different implementation (jmdns) and I'll see if it is any better.

StickyDigit commented 3 years ago

Great. Thanks for being so kind about my ramblings.

So... testing on an Android 11 phone, starting with warpinator on two wired desktops and a VM all running already.

At start on droid, only the VM was found.

I fired up warpinator on my wireless laptop, which saw all the other desktops, and had instant two way with the droid. The droid saw this wifi laptop two way as soon as it started.

Started Warpinator from previous build on another android 11. This one saw the VM the first had seen, and the wifi laptop which had come online a few minutes before. It did not see the already running test-droid. The test-droid saw it as one-way.

Started Warpinator on another mint VM. It was seen at all the other mints quite swiftly. It saw and had two-way with the android 11 (previous build), while only half connecting with the new build, which didn't see it at all.

Quit and restarted at test droid. It now only sees a wired host it had not seen before. The other droid now sees test-droid as one way.

FC on test-droid. At start it now sees (two-way) the second wired, the second VM, the wifi laptop, and the other droid. It's not seeing one of the wired hosts. Restarted warpinator on the missing wired host and it's seen two-way by both droids... Restarting it on the other made no odds at the droids.

Starting another android 11 (now there are three), which gets seen two-way by both the other droids and all the mints.

...so it looks like running systems still get missed, but more restarted hosts get seen than before. The second android (using a previous version of warpinator) at this point is seeing five hosts.. it did not see the second VM start up. The test-droid is seeing six hosts (2 11's, 2 wired-mint, 1 vm-mint, and 1 wifi-mint).. notably, the test-droid is no longer seeing the VM which it saw at first start. I have no doubt it would see it if I restarted it.

The third droid sees droids, a wired mint and a wifi mint.

Quitted and FC'd on all but the test-droid. Which I simply quit and restart. It sees 1 wired only.

Quit, FC and restart. I see the same wired, plus 1 wifi and two VM.

Quit and restart I get one wired again. Repeat three times to same result.. same host visible.

This time when I quit, force stop and restart I see 1 wired and 1 wifi.

Once more.. Now I get a VM, a wired, and a wifi.

Again for luck... Now I get a wired and a wifi.

Always leaving it a minute or more to settle, though the desktop variants don't seem to ever take that long to find everyone.

So.. I'm not sure anything other than luck is making much odds to the need to FC to see more hosts, and it still seems that for the most part hosts need restarting to be seen by the android. It does seem to see slightly more on average than the previous version.

I don't suppose there's much point me trying on android 7.

Let me know if I can try anything else.

slowscript commented 3 years ago

So, just switched to JmDNS (in the jmdns branch), did some brief testing and it seems to be pretty reliable so far. I don't have that many devices to test with so the chance of running into potential bugs is not that high. It's a pretty significant change so I would be grateful if you can test this before I roll it out and accidentally break everybody's devices :)

StickyDigit commented 3 years ago

On it.

StickyDigit commented 3 years ago

Tested current main on 7 and 11. Still bonkers. Then... realising you'd said the changes were in the jmdns branch, I proceeded to test that. It didn't kill my kittens or make my beer go flat. It DID see ALL the running hosts on my net within a very few seconds. I think you've cracked it! 🥇

StickyDigit commented 3 years ago

You may need to add the Apache2 license to your license stuff as that's what jmdns is under. I don't know if it clashes with GNU GPL. I am not an IP lawyer.

A credit/link for jmdns in the 'about' would be nice. It might steer other devs away from wasting time on the default mdns implementation. It could also provide a logical place to append a link to the Apache2 license if it is required.

StickyDigit commented 3 years ago

Hang on... Looks like I found some oddness when I quit and restarted. When I closed and re-opened the wifi desktop, it only saw the androids! When I closed and re-opened the androids, they saw a lot less (like before). Will verify a few combinations.

slowscript commented 3 years ago

Awesome! Thanks for all the help. I think there should be no issue with the license as Apache is more permissive than GPL. I will add a list of libraries and their licenses on the about page. I wanted to do that in v1.0 but forgot about it.

StickyDigit commented 3 years ago

Mints.. already running. 1 VM, 2 Wired, 1 Wifi

WiFi laptop in front of me. Sees three other Mints. Good.

Start Warpinator-JmDNS on android 7 and 11.

Everything appears to see everything else. Good.

Sending files from WiFi mint to Wired Mint works fine.

Start another VM. Now everyone sees 6 other hosts.

Closing WiFi Mint Warpinator, leaving the two droids alive. On restarting, it finds everything. Repeat. Same. Must have been a freak occurrence in my last post.

Closing and re-opening Warpinator-JmDNS on 7 and 11, using menu-quit. After a few seconds, it all settles again. 6 hosts everywhere.

Not sure what the blip was. Maybe it had something to do with starting one running 'main' during the first round of tests.

MOST EXCELLENT!!!!

StickyDigit commented 3 years ago

Having just replaced my hand-rolled version with the izzy package again, I've spent a bit of time trying to break this in various ways, and so far all I've found that's repeatable is this.

(Using four mints: 2 wired, 1 wifi, 1 VM wired-bridged; and five androids: 3 11's and 2 7's.)

Everyone sees everyone, and all is good.

If I turn off wifi on one of the droids, give it long enough to show all hosts 'off', then switch it back on, it takes ages before it sees the desktops, and it does see all of those again and they see it too. Unfortunately it never seems to see the other droids again unless I hit the refresh at either end. Quit and restart gets it too of course. All the other androids continue to see the "wandering" android as permanently absent until that specific host has been refreshed (either end gets it).

It's as if when a droid goes offline for even a moment, all the other droids refuse to speak to it unless instructed explicitly.

Can you cause it to rescan/broadcast on change back to wireless network?

This is starting to look ready for deployment to wife/granny. :-D

StickyDigit commented 3 years ago

Perhaps unrelated...

...I quit and restarted the 'wandering' warpinator a couple of times, and at one point it refused to see any other client at startup, although they could see it, albeit half-duplex. I'd not needed to give it more than thirty seconds to see them all so far this test run, and it usually sees the desktops within about five seconds.

I FC'd, cleared cache, swore, stamped my feet, and danced round the fire with a rubber chicken.

After all that, I left it on for a few minutes and it all came back to life.

Is there a big timer value somewhere that can be adjusted to prevent/reduce that?

I'll try and find a way to make it repeatable if you don't already know where that might be.

Bibi56 commented 3 years ago

MOST EXCELLENT!!!!

Any chance to see this version soon in the F-Droid store or should I consider compiling on my own?

slowscript commented 3 years ago

@Bibi56 It is already on F-Droid. If you are still having issues, it is likely something with your network or firewall.

@StickyDigit Restart on network change is already implemented since June (version 1.3), I just forgot it was because of this issue. I guess it can be closed now? I skimmed through the thread and haven't found any other things to fix. Unfortunately I cannot reduce the 2 minute timer that rebroadcasts the mDNS query as it is internal to the library we use.

Bibi56 commented 3 years ago

@slowscript on F-Droid, it's the master branch, not the jmdns branch, right?

slowscript commented 3 years ago

@Bibi56 The jmdns branch was merged into master a long time ago (0352a439915e3956039f8b6ac39620f8e8d1e1ce)