linuxmint / warpinator

Share files across the LAN
GNU General Public License v3.0

the app cannot find the computers on the network anymore in v1.0.0 #22

Closed jprando closed 4 years ago

jprando commented 4 years ago

When I used version 0.0.1, I was able to find the computers on the network and transfer files successfully.

I decided to update to the new version 1.0.0. Apart from the new icon (very cool), it no longer finds the computers on the network.

Has this happened to anyone else?

I am aware that Warpinator is not a finished product. I'm testing it and I would like to report that it stopped working ...

Is there any information I can provide to help solve the problem?

mtwebster commented 4 years ago

Thanks, I've seen this a couple of times but I haven't pinned it down yet (and I agree it's recent) - I need to go and figure out where I broke it.

jprando commented 4 years ago

I'm testing here and ...

up to this commit it works: https://github.com/linuxmint/warpinator/commit/9826e43cee0a8965bbf975da8f1d8418d8a67776

from that it doesn't work anymore https://github.com/linuxmint/warpinator/commit/e228bcc10fc46e4c5316ed0fbdf38dfc99cd6ec8

slowscript commented 4 years ago

from that it doesn't work anymore e228bcc

That's my PR. I'm however unable to reproduce the issue. I've tried multiple computers and distros (Mint, Debian, Manjaro...) and all find each other. Could you please provide some more information? Are you sure you are running the same version on both computers? Does it say something in the console? What do you see when you run avahi-browse -at with Warpinator running on both computers? Thanks.

mtwebster commented 4 years ago

I'm not sure it's that commit, at least for me.

I have 3 computers I'm testing this with.

One machine has been refusing to connect to other machines reliably (though they never fail to find it). I've got a fresh install on another partition, same behavior.

I just moved the 2 'problem-free' machines to use one access point on my network, the problem machine on the other (same network, subnet, etc.) and now everything works fine between all computers.

I think if I go back to the commit prior to adding the encryption/certificate exchange, this stops happening. But I have trouble pinning the blame solely on that (perhaps I'm trying to shove too much data into the 'parameters' of the ServiceInfo) since this doesn't happen on every machine.

The problem machine is actually getting a discovery notification, but when it tries to retrieve the ServiceInfo, it times out (even if I set the timeout to 20s). The info is there though:

+ wlp4s0 IPv4 p51-20                                        _warpinator._tcp     local
+ wlp4s0 IPv4 mike-lm19-samsung                             _warpinator._tcp     local
+ wlp4s0 IPv4 mike-at3                                      _warpinator._tcp     local
+     lo IPv4 p51-20                                        _warpinator._tcp     local

If I run with -r I can see all the service info, including the parameter data for all 3 machines.

Any thoughts?
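
For comparing what each interface sees, the listing above can be grouped by interface with a few lines of Python. This is only a rough sketch: it assumes the six-column layout shown in the output above and service names without spaces, and the function names are made up for illustration.

```python
def parse_browse_line(line):
    """Split one unresolved `avahi-browse -at` line into its columns."""
    parts = line.split()
    # Expected columns: event, interface, protocol, name, type, domain.
    # Assumption: the service name contains no spaces, as in the sample.
    if len(parts) != 6:
        return None
    event, interface, protocol, name, service_type, domain = parts
    return {
        "event": event,
        "interface": interface,
        "protocol": protocol,
        "name": name,
        "type": service_type,
        "domain": domain,
    }


def hosts_by_interface(output, service_type="_warpinator._tcp"):
    """Map each interface to the service names announced ('+') on it."""
    found = {}
    for line in output.splitlines():
        entry = parse_browse_line(line)
        if entry and entry["event"] == "+" and entry["type"] == service_type:
            found.setdefault(entry["interface"], []).append(entry["name"])
    return found


# The four lines pasted in the comment above.
SAMPLE = """\
+ wlp4s0 IPv4 p51-20                                        _warpinator._tcp     local
+ wlp4s0 IPv4 mike-lm19-samsung                             _warpinator._tcp     local
+ wlp4s0 IPv4 mike-at3                                      _warpinator._tcp     local
+     lo IPv4 p51-20                                        _warpinator._tcp     local
"""

if __name__ == "__main__":
    for interface, names in hosts_by_interface(SAMPLE).items():
        print(interface, names)
```

A machine that is announced on the wireless interface but never resolves would still show up here, which matches the "discovered but ServiceInfo times out" symptom described above.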

slowscript commented 4 years ago

Yes, it could be that parameters are too big. I've read somewhere that the Txt record shouldn't be over 400 bytes (which it is) and that it can sometimes be truncated. This can be easily verified by running avahi-browse with -r (as you've said). The only problem I've had was when I tried to run it with zeroconf 0.25 from pip (which is not backwards compatible) but this is likely unrelated because it doesn't explain why the issue can be fixed by using a different access point. Again, I cannot reproduce this issue (version 1.0.0, 2 different computers, 1 VM) so I'm not sure how to be helpful here. I just noticed that my commit was mentioned and wanted to figure out what was wrong with it. I'll wait for OP to respond. I think his issue is something else if he identified that commit as the source of the problem.
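
To get a feel for the sizes involved, here is a rough sketch that adds up the wire size of a TXT record built from key/value pairs. The property names and the payloads are stand-ins (random bytes the size of an RSA modulus, base64-encoded), not what Warpinator actually publishes; a real certificate is larger than the bare modulus. The 400-byte figure is the one mentioned above, while 255 bytes is the DNS limit for a single length-prefixed TXT string. How the zeroconf library handles longer values is not covered here.

```python
import base64
import os

MAX_TXT_STRING = 255    # DNS limit for one length-prefixed TXT string
SOFT_TOTAL_LIMIT = 400  # figure quoted in this thread, not a hard limit


def txt_entry_size(key, value):
    """Wire size of one entry: a length byte, then b'key=value'."""
    return 1 + len(key.encode()) + 1 + len(value)


def txt_record_report(properties):
    """Return (total wire size, keys whose 'key=value' exceeds 255 bytes)."""
    total = sum(txt_entry_size(k, v) for k, v in properties.items())
    oversized = [
        k for k, v in properties.items()
        if len(k.encode()) + 1 + len(v) > MAX_TXT_STRING
    ]
    return total, oversized


def fake_key_blob(bits):
    """Stand-in payload: random bytes as long as an RSA modulus, base64-encoded."""
    return base64.b64encode(os.urandom(bits // 8))


if __name__ == "__main__":
    for bits in (1024, 2048):
        # "hostname" and "pubkey" are hypothetical property names.
        props = {"hostname": b"p51-20", "pubkey": fake_key_blob(bits)}
        total, oversized = txt_record_report(props)
        print(f"{bits}-bit stand-in: {total} bytes total, "
              f"over soft limit: {total > SOFT_TOTAL_LIMIT}, "
              f"entries over 255 bytes: {oversized}")
```

Even with these undersized stand-ins, the 2048-bit entry no longer fits in a single TXT string, while the 1024-bit one does.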

mtwebster commented 4 years ago

That's what I was worried about - I had trouble finding a definitive max size.

For kicks, I tried using a 1024 bit key size, and the problem went away. I'm considering, at least for the time being, switching to that, and generating a new key every time warp starts.

I'm dragging my feet on moving away from abusing the parameters this way, because I'd have to transfer this data some other way, using udp.

jprando commented 4 years ago

screenshots...

machine 01 - IP 192.168.0.170 (DHCP dynamic)

version 0.0.1, working here ( https://github.com/linuxmint/warpinator/commit/9826e43cee0a8965bbf975da8f1d8418d8a67776 ) [screenshot: machine01 prando warpinator v001 001]

version 1.0.0, not working here ( https://github.com/linuxmint/warpinator/commit/e228bcc10fc46e4c5316ed0fbdf38dfc99cd6ec8 ) [screenshot: machine01 prando warpinator v100 002]

machine 02 - IP 192.168.0.4 (DHCP Fixed)

version 1.0.0 ( https://github.com/linuxmint/warpinator/commit/e228bcc10fc46e4c5316ed0fbdf38dfc99cd6ec8 ) [screenshot: machine02 prando warpinator v100 001]

version 1.0.0, not working here ( https://github.com/linuxmint/warpinator/commit/e228bcc10fc46e4c5316ed0fbdf38dfc99cd6ec8 ) [screenshot: machine02 prando warpinator v100 002]

if there is anything I can do to provide more information, just let me know!

[screenshot: machine01 prando warpinator info]

mtwebster commented 4 years ago

Can neither computer see the other? Does it sometimes work?

Can you give something a try:

edit /usr/libexec/warpinator/auth.py as root, find this line:

https://github.com/linuxmint/warpinator/blob/master/src/auth.py#L121

and change that 2048 to 1024, save it. Then, go into ~/.config/warpinator/remotes and remove all the files there. Do this to all computers, then start warp on them.

If you're building yourself, just edit the source auth.py if you want and re-build - either way doesn't matter. Just make sure to remove those files.

Thanks
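
The "remove all the files there" step can be scripted if it has to be repeated on several machines. A minimal sketch, assuming the folder only holds plain files and that Warpinator is not running while it is cleared; the helper name is made up for illustration.

```python
from pathlib import Path


def clear_remotes(remotes_dir):
    """Delete the regular files directly inside remotes_dir.

    Returns the names of the files that were removed. Subdirectories
    are left alone, and a missing folder is treated as already empty.
    """
    removed = []
    directory = Path(remotes_dir).expanduser()
    if not directory.is_dir():
        return removed
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


# Usage, on every machine, before starting Warpinator again:
# clear_remotes("~/.config/warpinator/remotes")
```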

jprando commented 4 years ago

it didn't work here

I noticed that the two machines have the same name, so I changed the name of one and repeated the process.

it didn't work here

mtwebster commented 4 years ago

Do you run warpinator from a terminal? No errors there?

jprando commented 4 years ago

yep, yep

jprando commented 4 years ago

I will try to connect the two computers through my cell phone (mobile hotspot as the access point)

jprando commented 4 years ago

over the mobile hotspot:

the latest commit on master didn't work

this one worked: 9826e43

tested with key_size=1024 in ./src/auth.py (#L121)

slowscript commented 4 years ago

Thanks for the screenshots. I think I see 3 problems. Correct me if I got something wrong.

1) Machine 1 sees machine 2, but in the second picture (not working, version 1.0.0) it appears that machine 2 is still running the old version, which is incompatible with the new one (old format of service name - starting with "warpinator" instead of hostname). [screenshot]

2) Machine 2 doesn't see machine 1 at all. No idea why. But in both of its screenshots it seems that it's running the new version. I'm confused now.

3) Both machines appear to be named the same. This made me aware of a regression in my commit. Removing the IP address from the service name removes a bit of uniqueness, so when Warpinator sees a service with the same name, it thinks that's the local machine and ignores it. You've said you changed the name later, so that's not the cause of your issue. I will still try to fix this though.
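
Problem 3 can be illustrated with a small sketch. This is not Warpinator's actual code: the `Service` type, both check functions, and the hostname are made up for illustration, and only the two IP addresses come from the report above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str     # advertised service name (the hostname in 1.0.0)
    address: str  # IP address the record resolves to


def is_self_by_name(local, seen):
    """Name-only check: a remote twin with the same hostname looks local."""
    return seen.name == local.name


def is_self_by_name_and_address(local, seen):
    """Stricter check: the address has to match as well."""
    return seen.name == local.name and seen.address == local.address


local = Service("prando", "192.168.0.170")
twin = Service("prando", "192.168.0.4")  # other machine, same hostname

if __name__ == "__main__":
    print("name only:", is_self_by_name(local, twin))
    print("name + address:", is_self_by_name_and_address(local, twin))
```

With the name-only check the second machine is taken for the local one and skipped, which is the behaviour described in 3).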

The Txt records look normal and I don't see any truncation. All I can advise is to double-check the version you are running on both machines. Maybe also restart Warpinator once more to make sure the DNS-SD record gets updated. All entries in avahi-browse -at should be in the same format after switching the version.

Edit: added image

jprando commented 4 years ago

ok i will try here.

about "it appears that machine 2 is still running the old version":

I will try to extract more information for you...

jprando commented 4 years ago

about item 3:

🤔 suggestion: use hash warpinator + hash( ip + hostname + anything + credit card number + ccv )

result: warpinator_35b96a650c2fbb93d23dd81116b561b8

no matter what size or amount of information you use to generate the hash, it will always be a fixed size 😉

the source data is never published to the other machines on the network, but it can be validated on the machine that produced it

+1 security challenge idea added to the warpinator service

ps: credit card number + ccv, it was a joke! use the group code and port number, for example, and a uuid value with salt fixed in the code.
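
A minimal sketch of that idea, assuming SHA-256 truncated to 32 hex characters so the result has the same shape as the example above. The salt value, group code and port used here are hypothetical placeholders, and nothing in this thread says Warpinator adopted this scheme.

```python
import hashlib

# Hypothetical fixed salt; the suggestion above is to hard-code one.
SALT = b"example-fixed-salt"


def service_name(ip, hostname, group_code, port, salt=SALT):
    """Build a fixed-length service name from machine-specific inputs."""
    material = "|".join([ip, hostname, group_code, str(port)]).encode() + salt
    digest = hashlib.sha256(material).hexdigest()[:32]
    return "warpinator_" + digest


if __name__ == "__main__":
    # Same hostname, different IPs: the names differ but have the same length.
    print(service_name("192.168.0.170", "prando", "example-group", 42000))
    print(service_name("192.168.0.4", "prando", "example-group", 42000))
```

The name is always 43 characters, which stays under the 63-byte limit for a DNS label, and the receiving side would need the same inputs to recompute and check it.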

jprando commented 4 years ago

my network topology

[image: mynetwork]

jprando commented 4 years ago

I have news ...

now warpinator v1.0.0 is working

[screenshot: warpinator v100 working]

I gave both computers the same name again, and it stopped working! Changing the names again so that they are different made it work again

🎉✨🥳✨🤩✨✨✨

jprando commented 4 years ago

I reverted key_size in /src/auth.py from 1024 back to 2048 and it still works

slowscript commented 4 years ago

So it's just problem 3. I will look at it and make a PR when I have time to do so (hopefully today).

slowscript commented 4 years ago

Hi, sorry for the delay. Now that I've got time to look at this, it seems that it won't be as simple as I thought. I tried implementing Android's NSD's approach: appending "(2)", "(3)" etc. to the hostname ("_2", "_3" in my case). That broke the authentication for some reason. It finds the other machine but then this happens (even if I delete the config folder): Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED. You can check it out here. One issue could be that the certificates are stored with the hostname as file name, so maybe it gets incorrectly overwritten? If that's the case, I don't know how it's possible it worked before. I don't understand this cryptography stuff, so I've given up on this path. The easiest solution would be to just tell users that they shouldn't use the same hostname on more than one machine since that causes issues with other programs anyway. Another could be to return to having the IP in the name. I don't know how that worked but maybe it could work again... What do you think?
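
The suffix approach described above boils down to something like the following sketch. The function name is made up, and the set of names already in use is assumed to come from whatever the browser has discovered so far; it says nothing about why the certificate check then fails.

```python
def pick_unique_name(hostname, names_in_use):
    """Return hostname, or hostname with "_2", "_3", ... if it is taken."""
    if hostname not in names_in_use:
        return hostname
    suffix = 2
    while f"{hostname}_{suffix}" in names_in_use:
        suffix += 1
    return f"{hostname}_{suffix}"


if __name__ == "__main__":
    print(pick_unique_name("prando", set()))
    print(pick_unique_name("prando", {"prando"}))
    print(pick_unique_name("prando", {"prando", "prando_2"}))
```

Note that once the advertised name differs from the real hostname, anything stored or looked up by hostname (such as certificate files, as suspected above) has to agree on which of the two names is the key.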