konsumer / tplink-lightbulb

Control TP-Link smart lightbulbs from nodejs
MIT License

Unable to scan or connect to wifi #60

Closed jamedeus closed 3 years ago

jamedeus commented 3 years ago

OS: Kubuntu 20.04 Node: v14.17.2 Bulb: KL130

There appears to be a bug affecting wifi onboarding - I received the same error when attempting to scan available networks and connect to a network (both while connected to the bulb's setup network). Other functions I tried (turn light on/off, change color temp) worked perfectly. To rule out issues with my node setup, I fresh installed node on a different computer (same OS + node version) and got identical errors.

Scan:

jamedeus@desktop:~/tp-link-python-client$ tplight wifi 192.168.0.1
TypeError: Cannot read property 'ap_list' of undefined
    at /usr/lib/node_modules/tplink-lightbulb/build/lib.js:1:1042
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

Connect:

jamedeus@desktop:~/tp-link-python-client$ tplight join 192.168.0.1 <credentials-removed>
TypeError: Cannot read property 'ap_list' of undefined
    at /usr/lib/node_modules/tplink-lightbulb/build/lib.js:1:1042
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

Unfortunately I'm not able to debug this as I don't know javascript.

I can however confirm that the bulb is capable of connecting without the kasa app. I had previously used this python client to onboard and control my TP-Link HS220 smart dimmers. It includes an option for sending raw JSON to the device from CLI. However, it didn't work with my new KL130 bulb which led me to this project.

After reading this commit, I noticed that the first JSON parameter is different than what I had used for the dimmers (Bulb=smartlife.iot.common.softaponboarding, dimmer=netif), while the rest was identical. I went back to the python client and swapped these parameters out. The bulb connected without issue, confirming that TP-Link hasn't disabled this with a firmware update.

While I'm not able to debug, I'd be happy to reproduce the problem and test out potential solutions if that would help. Great project, much more user-friendly than what I had used previously!

konsumer commented 3 years ago

This is a new feature, so bear with us.

It appears this is only for scanning for what wifi the bulb sees, so you might be able to jump directly to the join <ip> <SSID> [SECRET] command if you already know the exact SSID (likely, since it's your wifi network). Use quotes around your SSID or SECRET if there are spaces or any funny characters.

I think it's this part: I am guessing that r.netif.get_scaninfo is not defined. I don't have the same setup to test right now, but I am willing to troubleshoot, in the hopes we can get it fixed for everyone.

Would you mind installing node and running it that way, if you haven't already?

This should work on kubuntu

curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs git

After that, get a fresh copy and run it directly from the js file:

git clone https://github.com/konsumer/tplink-lightbulb.git
cd tplink-lightbulb
npm i

# now this is essentially the same as the regular CLI too
node ./src/cli.js wifi <IP_OF_BULB>

This will probably give you the same error, but you can edit src/cli.js to give some debugging. In this part just remove this line:

.then(r => r.netif.get_scaninfo.ap_list)

Then run node ./src/cli.js wifi <IP_OF_BULB> again and send me the output; it will hopefully be helpful for figuring out what is different on your setup. Removing that line tells it to print the whole chunk of output, regardless of the shape from the bulb.

jamedeus commented 3 years ago

Good catch on the quotes, hadn't thought to try that. I factory reset the bulb and tried with both quoted (using the copy I got from npm) but still got the same error.

After cloning the repo I got the same error with 1 additional line:

jamedeus@desktop:~/tplink-lightbulb$ node ./src/cli.js wifi 192.168.0.1
TypeError: Cannot read property 'ap_list' of undefined
    at /home/jamedeus/tplink-lightbulb/src/lib.js:82:39
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Object.handler (/home/jamedeus/tplink-lightbulb/src/cli.js:238:20)

After commenting out line 82 in src/lib.js (the line named in the stack trace) I get this:

jamedeus@desktop:~/tplink-lightbulb$ node ./src/cli.js wifi 192.168.0.1
{
  "netif": {
    "err_code": -2001,
    "err_msg": "module not support"
  }
}

I also tried join with the git copy, both with that line commented and without. Errors are identical to wifi <ip> in both cases.

jamedeus commented 3 years ago

I think it's this part: I am guessing that r.netif.get_scaninfo is not defined.

It looks like this is sending {"netif":{"get_scaninfo":{"refresh":1}}} which I can confirm is unsupported by this bulb - I get the same error when I try in the python client:

jamedeus@desktop:~/tp-link-python-client$ ./tplink_smartplug.py -t 192.168.0.1 -j '{"netif":{"get_scaninfo":{"refresh":1}}}'
Sent:      {"netif":{"get_scaninfo":{"refresh":1}}}
Received:  {"netif":{"err_code":-2001,"err_msg":"module not support"}}

After changing this line to 'smartlife.iot.common.softaponboarding': { wifi scan works correctly:

jamedeus@desktop:~/tplink-lightbulb$ node ./src/cli.js wifi 192.168.0.1
{
  "smartlife.iot.common.softaponboarding": {
    "get_scaninfo": {
      "ap_list": [
        {
< Removed due to length>

Seems like this part could be fixed by trying smartlife.iot.common.softaponboarding if netif fails, since there isn't a way to determine the target bulb (and its required syntax).
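
A minimal sketch of that fall-through idea, not the library's actual code. It assumes a hypothetical `send(ip, message)` function that resolves to the bulb's parsed JSON reply. The two module names and the `get_scaninfo` payload are the ones shown above; treating any reply without an `ap_list` array (such as the `-2001` "module not support" reply) as "try the next module" is an assumption.

```javascript
// Module names taken from this thread, tried in this order.
const SCAN_MODULES = ['netif', 'smartlife.iot.common.softaponboarding']

// `send` is a hypothetical transport: (ip, message) => Promise<reply object>.
async function scanWifi (send, ip) {
  let lastReply
  for (const moduleName of SCAN_MODULES) {
    const reply = await send(ip, { [moduleName]: { get_scaninfo: { refresh: 1 } } })
    lastReply = reply
    const apList = reply?.[moduleName]?.get_scaninfo?.ap_list
    if (Array.isArray(apList)) {
      return apList
    }
    // No ap_list (e.g. err_code -2001): fall through to the next module name.
  }
  throw new Error('No supported wifi scan module: ' + JSON.stringify(lastReply))
}
```

With this shape, a device that rejects `netif` costs one extra round trip before the second module name is tried.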

However, I'm still unable to join after making this change - but I get a new error:

jamedeus@desktop:~/tplink-lightbulb$ node ./src/cli.js join 192.168.0.1 <creds removed>
TypeError: wifi.find is not a function
    at Object.handler (/home/jamedeus/tplink-lightbulb/src/cli.js:252:27)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)

I then reverted my changes to lib.js and tried to join again. I still get the new TypeError: wifi.find is not a function error. Seems to indicate that the original error is resolved now that ap_list is defined.

I was able to connect after editing src/cli.js line 252 and arbitrarily declaring const chosen = argv.SSID, disabling the check to see if it exists in the wifi list:

jamedeus@desktop:~/tplink-lightbulb$ node ./src/cli.js join 192.168.0.1 <creds removed>
{
  _: [ 'join' ],
  '$0': 'src/cli.js',
  ip: '192.168.0.1',
<creds removed>
}
OK, joined jamnet.

So something seems to be going wrong when it parses for argv.SSID in the wifi list.

konsumer commented 3 years ago

Hmm, I probably need some checking for model to figure out the correct shape.

jamedeus commented 3 years ago

That would be much cleaner than my idea haha, I didn't realize I could get the model with tplight details <ip>.

Any idea what's going on with TypeError: wifi.find is not a function? Looks like it's just parsing an array for the SSID, I don't understand js enough to know why this would fail. This prevents it from connecting even after the other issue is fixed.

konsumer commented 3 years ago

find should be Array.find, so my guess is wifi is not an array (undefined maybe.) This does work for others, so it could just be the earlier issue cascading.
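
A small sketch of that failure mode, assuming `wifi` is whatever the scan step returned. The helper name `findNetwork` is hypothetical and not from the repo. Calling `.find` on a plain object (for example a raw error reply) produces "is not a function", whereas an undefined value would produce a "Cannot read property ... of undefined" style error instead, so checking `Array.isArray` first gives a clearer message either way.

```javascript
// Hypothetical helper: look up an SSID in a scan result, failing loudly
// when the scan result is not the array we expect.
function findNetwork (wifi, ssid) {
  if (!Array.isArray(wifi)) {
    throw new TypeError('Expected an array of networks, got: ' + JSON.stringify(wifi))
  }
  // Returns the matching entry, or undefined if the SSID is not in the list.
  return wifi.find(network => network.ssid === ssid)
}
```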

konsumer commented 3 years ago

I just added some tooling and I was super-annoyed with the old ES style (I like ES6 with builds for older node.)

If you want to run from git, now you have to npm run build before running src/tplight.js because it transpiles it for pkg (for making a standalone CLI.)

So, with my new tools setup and improvements all around, I wrote a script that works on my network to scan and I think I have it working for both style devices, in the CLI tool. Can you try the CLI tool to scan/setup wifi, and if it doesn't work, try modifying the bash script to get it working? Lines that start with # are comments, so just uncomment the part you need, and play around till it works, and we should be good to go. I put double-comments (##) on notes. Let me know what you discover (working message format for your bulb) and I will add it (as a fall-through, so both work.)

I am hoping the changes I made will just work for you, now, but we might need to play with it a bit to get it working.

Here is the script I used to test raw messages & try out each part. If the regular wifi and join commands (published as v1.7.1 on npm, and in releases) don't work, you might be able to use it to play around with things and figure it out.

test.sh.zip

konsumer commented 3 years ago

I do wish I had more info about how to know what kind of message a device wants (from firmware id or something). It would be cool to not have to fall through. You can see the library does this quite a bit, but it'd be neat to just know what kind of message to send before I send it.

konsumer commented 3 years ago

The python script uses all of what I am calling "new style", but I actually don't know which came first. The messages that are longer, like smartlife.iot.common.softaponboarding, might actually be newer. I noticed that what appear to be newer devices use the long format, so I probably have those switched in my thinking.

Maybe we could all collect some version-numbers, and what style the device supports, to save 1 failed operation (which retries with the other format) for some devices.

hw_ver on info call appears on both (although those that support long message-type give better info) and long messages seem to be "2.0". Shorter format appear to be "1.0". This is only on the devices I personally own, though.

It might be a simple:

const msg = (this.info.hw_ver === '2.0') ? newMessage : oldMessage

or similar

jamedeus commented 3 years ago

Wow looks like a pretty significant rewrite, great work!

After cloning the repo everything seems to be working fine! Here's everything I tried, devices were factory reset before each test:

I didn't even need to mess with the bash script, everything worked fine with node src/tplight.cjs <cmd> <ip>.

One very minor note: when scanning for networks the output formatting is different depending on if the device uses "old style" or "new style". Doesn't really matter since the user will likely only see one or the other, but probably an easy fix.

Device using netif:

jamedeus@desktop:~/tplink-lightbulb$ node src/tplight.cjs wifi 192.168.1.233
[
  { ssid: 'Solomon', key_type: 3 },
  { ssid: 'jamnet', key_type: 3 },
<removed due to length>
]

Device using smartlife.iot.common.softaponboarding:

jamedeus@desktop:~/tplink-lightbulb$ node src/tplight.cjs wifi 192.168.1.225
[
  {
    ssid: 'jamnet',
    key_type: 3,
    cipher_type: 2,
    bssid: '<removed>',
    channel: 5,
    rssi: -48
  },
  {
    ssid: 'Solomon',
    key_type: 3,
    cipher_type: 2,
    bssid: '<removed>',
    channel: 6,
    rssi: -65
  },
<removed due to length>
]

konsumer commented 3 years ago

Awesome!

One very minor note: when scanning for networks the output formatting is different depending on if the device uses "old style" or "new style"

Yep. I actually work around it by guessing some of the missing fields in the less-verbose output. I could map it so it looks the same, but I kinda like that it gives more info if the bulb supports it.
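
For anyone who does want uniform output, the mapping could look something like this sketch. The field names come from the two scan outputs above. The function name `normalizeNetwork` and the choice of `null` for fields a device did not report are assumptions for illustration, not what the library does.

```javascript
// Fields seen in the more verbose scan output in this thread.
const SCAN_FIELDS = ['ssid', 'key_type', 'cipher_type', 'bssid', 'channel', 'rssi']

// Hypothetical helper: give every scan entry the same keys,
// using null where the device did not report a value.
function normalizeNetwork (network) {
  const out = {}
  for (const field of SCAN_FIELDS) {
    out[field] = network[field] ?? null
  }
  return out
}
```

It would be applied per entry, e.g. `scanResult.map(normalizeNetwork)`, where `scanResult` stands for the array printed by the wifi command.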

jamedeus commented 3 years ago

Ah neat, yeah not a real issue just wanted to point it out in case it was unintentional.

jamedeus commented 3 years ago

hw_ver on info call appears on both (although those that support long message-type give better info) and long messages seem to be "2.0". Shorter format appear to be "1.0". This is only on the devices I personally own, though.

Unfortunately hw_ver doesn't seem to be a reliable indicator - all 3 of my devices (KL130 + 2 HS220s) return hw_ver: '2.0', but the bulb uses smartlife.iot.common.softaponboarding while both dimmers use netif.

I did notice that sw_ver matches for my dimmers, but is different for the bulb. Dimmers both have sw_ver: '1.0.3 Build 200326 Rel.082355' while bulb has sw_ver: '1.0.6 Build 200630 Rel.102631'. I've read about TP-Link pushing firmware updates that modified the API in the past so maybe that could be it? I've had the dimmers longer, and I firewall all these by MAC before installing them, so they have whatever firmware they came with.

konsumer commented 3 years ago

Ah, that sucks. Well, the fall-through method seems to work ok for wifi setup and others, so I think I will close this issue, but make a new issue for working out better message-format selection, in the future.

jamedeus commented 3 years ago

Here's the full details for all my devices in case it's helpful. Couple possibilities:

Thanks again!

konsumer commented 3 years ago

model and hwId both look like reliable indicators of which device it is.

Agreed. I'd prefer to not keep a big table of all the supported versions (it's more fragile, and requires a lot more maintenance.) I think we might be able to try some other stuff to figure it out, though. I am ok with the fall-through for now, but I bet it could be improved.

mic_type might be useful? Says IOT.SMARTPLUGSWITCH for a dimmer which is odd

I saw this on mine, too. super-weird.

Please continue tracking this issue at #61, as it's really a separate thing.