riptidewave93 / LEDE-MR33

Bringup for the Cisco Meraki MR33 Access Point on LEDE

NIC Pains #2

Closed riptidewave93 closed 6 years ago

riptidewave93 commented 6 years ago

Currently this repo is "on hold" until the networking situation I mentioned at https://forum.lede-project.org/t/ipq806x-target-single-nic-devices/7292 is resolved.

As for getting this "fixed", there are a few ways forward:

  1. Release as is, and users will suffer and complain about the ARP issues, thinking something is wrong.
  2. Try to port over the dirty hacks Meraki uses, but we would still be stuck with the problems of option 1. (I wonder if they do something in userspace to get around this? Feels like a VLAN tagging issue, maybe?)
  3. YOLO attempt to get the new NIC driver going, and see what happens.

The big issue with option 3 is that this new driver has not been merged into LEDE yet, even though that was the original intent at the time of development. This is troubling, as I prefer not to waste the little development time I currently have.

I also plan on shipping out a MR33 to @chunkeey to see if he can help figure out the best way forward.

riptidewave93 commented 6 years ago

Progress is being made in the staging branch by @chunkeey

kylegordon commented 6 years ago

Great to hear about it. Thank you to you both for all your hard work :-D

I think... we can watch from the sidelines here - https://github.com/lede-project/source/commits?author=chunkeey&since=2018-01-01T00:00:00Z

riptidewave93 commented 6 years ago

Changes to get the NIC mostly working are in master. Going to close this issue out. As for flashing/public release, documentation is being written ATM so keep your eyes open. No ETA.

riptidewave93 commented 6 years ago

Flashing info is live at https://drive.google.com/drive/folders/1jJa8LzYnY830v3nBZdOgAk0YQK6OdbSS

kylegordon commented 6 years ago

:tada:

That's my Saturday night all planned out then :-D

kylegordon commented 6 years ago

I think on Page 9 the --name=part.old portion of ubiupdatevol is superfluous. It seems to throw an error, and if I understand the UBI tools correctly, the volume name is set with the previous ubimkvol command.


```
ubiupdatevol: unrecognized option: name=part.old
```

chunkeey commented 6 years ago

@kylegordon Thank you. The --name=part.old is a copy-pasta error. I removed it and uploaded a fixed PDF.

kylegordon commented 6 years ago

NP :-) All booted and running. Thank you again!

maciejtarmas commented 6 years ago

You guys are awesome. I set up the MR33 as my main access point and will report if anything goes wrong, but so far it looks like it's working correctly.

Initially, after doing the sysupgrade, my MR33 came up, but after another reboot the PHY couldn't set up a link, even in failsafe mode. I hadn't even touched the config at that point. I was only able to log in through serial, but couldn't find anything wrong.

Thankfully going over the whole procedure again brought it back to life.

radio2 is the interface that Meraki uses for scanning for rogue APs, like on the MR18, am I right?

Looks like radio0 is the additional interface for scanning, radio1 is bgn and radio2 is the proper ac.

Oh, and if someone is using Windows for the procedure and is lazy enough to use IIS to serve the files to the MR33 for wgetting, make sure you add a MIME type for .itb files in the IIS config or you'll get a 404 error.
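For anyone who would rather not touch the IIS config at all, here is a minimal sketch of a throwaway file server with an explicit MIME type for .itb files. It uses Python's standard `http.server` module; the class name `ItbHandler`, the helper `make_server`, and the port 8000 are placeholders made up for this example, not part of the flashing guide. Python's server already falls back to `application/octet-stream` for unknown extensions, so the explicit entry mainly mirrors the IIS fix.

```python
import functools
import http.server


class ItbHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler with an explicit MIME type for .itb images."""

    # Copy the stock table so the base class itself is left untouched,
    # then add the one extension we care about.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".itb": "application/octet-stream",
    }


def make_server(directory, host="0.0.0.0", port=8000):
    """Build (but do not start) a server that serves files from `directory`.

    `host` and `port` are illustrative defaults, not values from the guide.
    """
    handler = functools.partial(ItbHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


# To actually serve files (blocks until interrupted):
#     make_server("/path/to/images").serve_forever()
```

With the server running, the files can be fetched from the MR33 with wget against the serving machine's address and the chosen port.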

mgclabs commented 6 years ago

Awesome, thank you for freeing the Meraki equipment!

I'll flash this in a month or so just to make sure things are stable.

chunkeey commented 6 years ago

@maciejtarmas you are right about the Ethernet in failsafe. In my case, it's related to my old switch, as it has problems with ramips and ar71xx too. A workaround: ifconfig eth0 down followed by ifconfig eth0 up should fix it... or (many) reboots.

@mgclabs and everybody else: The MR33 has been accepted upstream as well: https://github.com/openwrt/openwrt/commit/4943afd7818f56053231a5a7ae90e55da44f1f08

maciejtarmas commented 6 years ago

@chunkeey When the eth stopped working with my Dell laptop that I used for flashing, I hooked the MR33 to my HPE 1820 switch, but no luck. No reboot would make the eth set up a link.

Now after the second flash the problem is gone and the MR33 seems rock solid so far in basic access point configuration with radio1 and radio2 up, no matter how many times I reboot it. I have yet to try setting up some VLANs to test it further.

chunkeey commented 6 years ago

@maciejtarmas Thanks for the update. I did play around with it. The PHY has a dedicated reset GPIO (47); however, toggling it won't fix the problem in failsafe. This must be an issue with either essedma or the internal switch, which is a bummer since the OpenWrt maintainer will probably not merge any patches related to it, as he's working on a replacement for Qualcomm's Ethernet subsystem altogether.

> Looks like radio0 is the additional interface for scanning, radio1 is bgn and radio2 is the proper ac.

The order in which radio0-2 (phy0-2) are assigned is more or less random. There's no fixed load or enumeration order. Whichever device gets through its initialization first gets to be radio0. Currently, the PCIe device (QCA9887/8?) is winning the race, but there's no guarantee that this will always be the case.
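Since the phy numbers cannot be relied on, one way to tell the radios apart is by the hardware device behind each phy rather than by its index. The sketch below is only an illustration of that idea: it reads the standard Linux sysfs directory `/sys/class/ieee80211`, follows each phy's `device` link, and guesses whether the device sits on PCI from the name of the resolved directory. The function names `phy_to_device` and `bus_of` are made up for this example, and it assumes a system where Python is available, which a stock OpenWrt image usually is not.

```python
import os
import re

# A PCI device directory in sysfs is named like "0000:01:00.0".
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def phy_to_device(sysfs_root="/sys/class/ieee80211"):
    """Return {phy name: resolved path of the underlying device}."""
    mapping = {}
    if not os.path.isdir(sysfs_root):
        return mapping  # no wireless phys (or not a Linux system)
    for phy in sorted(os.listdir(sysfs_root)):
        device_link = os.path.join(sysfs_root, phy, "device")
        mapping[phy] = os.path.realpath(device_link)
    return mapping


def bus_of(device_path):
    """Rough guess: 'pci' if the device directory is a PCI address."""
    name = os.path.basename(device_path)
    return "pci" if PCI_ADDRESS.match(name) else "other"


def describe_phys(sysfs_root="/sys/class/ieee80211"):
    """Return printable lines such as 'phy0: pci (<device path>)'."""
    return [
        f"{phy}: {bus_of(path)} ({path})"
        for phy, path in phy_to_device(sysfs_root).items()
    ]
```

Calling `describe_phys()` on the device would show which phy ended up on the PCIe radio for that particular boot.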

kylegordon commented 6 years ago

@chunkeey with regard to the race in which devices come up and get assigned, am I right in thinking the bgnac radio is likely to be the one that Meraki used for their 'Air Marshal' feature (monitoring, detection, management, and so on)?

If it is, is it likely to have a less optimized antenna? Should we be favouring the use of the other two radios for general purpose usage?

imaginator commented 6 years ago

Following up on the vlans on eth0 issue - this still seems to be causing an issue. Or am I just being dumb (using latest snapshot): https://github.com/imaginator/home-network/blob/master/mr33.settings#L308-L327

This results in endless DHCP requests going out with the right VLAN tag, but the interface seems to ignore the tagged reply.

Has anyone else had this issue?

(other than that, what a great device. Thanks for your hard work @riptidewave93 )

chunkeey commented 6 years ago

If it helps, I can tell you what's going on. The MR33 has an internal switch between the CPU and the QCA8035 PHY, and this is what causes all the problems: the default configuration of the switch is to check incoming and outgoing traffic for VLAN tags and route the traffic to the right port. So any tagged frame can get redirected by the internal switch to the hardware equivalent of /dev/null.
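For readers unfamiliar with what the switch is inspecting, here is a small sketch of the 802.1Q tag itself: four bytes inserted after the two MAC addresses, consisting of the TPID 0x8100 followed by a 16-bit field whose low 12 bits are the VLAN ID. This only models the frame format. It says nothing about how the IPQ40xx switch makes its forwarding decision or how its registers are programmed, and the helper names `add_vlan_tag` and `vlan_id` are made up for this example.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame, vid, pcp=0):
    """Insert an 802.1Q tag after the destination and source MACs."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit (bit 12) left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def vlan_id(frame):
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    if len(frame) < 16:
        return None  # too short to carry a tag
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None
```

A DHCP reply coming back on a tagged VLAN carries exactly such a tag, which is the field the internal switch acts on before the frame ever reaches the CPU port.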

imaginator commented 6 years ago

@chunkeey thanks very much for the explanation.

So if I understand you correctly I should avoid running vlans on eth0? Or use the swconfig util? Is there a known working config (or just anything without vlans)?

chunkeey commented 6 years ago

Well, if you want to make this work at this time, you'll have to manually write unknown magic values to some of the switch hardware registers (and maybe also patch the essedma driver). From what I heard, the internal switch is supposedly a QCA8337N, but it's not 100% the same. You can find proper datasheets for the QCA8337N on the internet, but sadly I haven't found any detailed datasheet for the IPQ40XX and its internal switch (I'm looking for register descriptions/reference, pinouts, programming manuals, etc....). So if somebody can dig up a datasheet or two, that would be schweeet and also help a lot :1st_place_medal:. [No, playing with swconfig or any config options will not help right now.]

If this sounds too daring, you could also open and maintain a bug ticket over at bugs.openwrt.org, and maybe one on forum.lede-project.org too. Incidentally, Chris has started an "IPQ806x target & Single NIC devices" thread there, but it didn't generate much of a reaction :disappointed:.

Flole998 commented 6 years ago

Would this Patch solve the issue? It seems to work on the 4019, maybe it also works on this hardware.

chunkeey commented 6 years ago

> Would this Patch solve the issue? It seems to work on the 4019, maybe it also works on this hardware.

Sadly, I can't access the page. Chromium, Konqueror and Firefox are refusing to open the link:

Firefox:

An error occurred during a connection to gl.tossp.com:1443. Cannot communicate securely with peer: no common encryption algorithm(s). Error code: SSL_ERROR_NO_CYPHER_OVERLAP

Chromium:

This site can’t provide a secure connection. gl.tossp.com uses an unsupported protocol. ERR_SSL_VERSION_OR_CIPHER_MISMATCH. Unsupported protocol: The client and server don't support a common SSL protocol version or cipher suite.

But (and this is a guess): If this is another rip-off of the "ipq40xx essedma: disable default vlan tagging" then the answer would be no.

At this point, it would require someone from Qualcomm with a @codeaurora.org email address to send that patch (or possibly a better one!) to the OpenWrt mailing list (or as a GitHub PR). I tried multiple times to get that patch merged, but to no avail. I even went as far as integrating it tightly into the patch set that went on to become the "ipq40xx" target in OpenWrt. However, this was in vain, as John made a point of removing it again with the comment that he "has a fancy new driver ready to go"... However, "that driver" is still missing in action [after more than a year now].

(Note: The MR33 (or IPQ40XX) is not really the only device with issues. If someone wants to read some drama and comedy, then head on over to the "NEXX 3020 (MT7620) Wi-Fi issues" thread in the forums: https://forum.lede-project.org/t/nexx-3020-mt7620-wi-fi-issues/16008/2 )

(Note 2: Please don't PM me any Qualcomm NDA'd documents or "extracts" from them.)

Eliam077 commented 6 years ago

add ASUS RT-AC58U in IPQ40xx (a8cd67fa) · Commits · 砼砼 _ lede-k3 · GitLab.pdf

This is the patch reported by Flole998. Can it be useful? (Sorry, I don't know.)

Flole998 commented 6 years ago

The Link is working now again, unfortunately it seems like your assumption was correct and it is just that disable default vlan tagging thingy.

As the Patch is not helping with the issue, I honestly don't see the point of pushing it to the openwrt master. Or does it fix some other issues? Would the driver made by John solve this issue? If yes, it is probably a good idea to ask John about what the hell is going on here and we should figure out why it's a good idea to get something working that is broken right now instead of waiting for some magic to just happen.

Just to confirm that I am understanding it correctly: The switch has to be reconfigured to simply forward the packets to the CPU's port. So what we need is a driver that configures the switch to do that, or one that allows usage of the swconfig tool.

Why would it require someone from Qualcomm (and what does it have to do with @codeaurora.org) to send that patch? I think all that is necessary is a working image at some fork, that should make the maintainers want to merge it ASAP. Unless they want LEDE 2.0 and then merge again after some time after everyone noticed that it is a waste of resources to have 2 different Teams working on the exact same thing.

BTW: You explicitly asked for "Register descriptions/reference, pinouts, programming manuals, etc....", so don't be surprised that people start to send them to you. If you could give a short explanation of why that is not helpful, it would probably be appreciated by everyone who sent you something (I didn't, but I am curious why it's not helpful if you know the magic values you need to write). Or do you already know the magic values that need to be written, but they are covered by an NDA and therefore cannot be used?

ptpt52 commented 6 years ago

try this? https://github.com/ptpt52/openwrt-openwrt/blob/9584879093bb1379d5ad433105315026a6bda174/target/linux/ipq40xx/patches-4.14/713-0002-essedma-refine-txq-to-be-adaptive-of-cpus-and-netdev.patch

ptpt52 commented 6 years ago

> The Link is working now again, unfortunately it seems like your assumption was correct and it is just that disable default vlan tagging thingy.

it is not the problem related to 'disable default vlan tagging...'

ptpt52 commented 6 years ago

And the DTS needs to be modified: https://github.com/ptpt52/openwrt-openwrt/commit/2159bb7a72a900b34ac1711c5d82f8eea9d6a25d

Flole998 commented 6 years ago

Has anybody tried yet what @ptpt52 suggested? Does that fix the Issue?

Eliam077 commented 6 years ago

No, I didn't try because I don't know how to do it :)

Eliam077 commented 6 years ago

There is some news in the OpenWrt commit tree: there is work by John Crispin on ipq40xx today.

davidfrendo commented 6 years ago

Hi guys, I am having the same VLAN issues with my Meraki MR33, I am running the latest 18.06.1 firmware. Has there perhaps been a fix in the meantime which I may be missing?

Many thanks in advance.

Eliam077 commented 6 years ago

Sorry davidfrendo, we are still at the same point.

Flole998 commented 5 years ago

I highly doubt that this will ever work; nobody who is capable of fixing this seems to care about it.

maciejtarmas commented 5 years ago

If there’s anyone interested, I’d be willing to donate some money to get the MR33 VLAN issue fixed upstream.

Eliam077 commented 5 years ago

Me too, I'd be willing to donate something to get VLANs on my MR33...

Flole998 commented 5 years ago

Well, now John has released some sources, maybe this helps? @chunkeey @riptidewave93 Would integrating this help? I haven't looked through it yet, and I haven't checked whether it's working or what is not working (there seems to be some kind of issue somewhere, but without a description of what it is, searching for it is pointless).

maciejtarmas commented 5 years ago

Flole998 is talking about these 2 posts, I presume:

https://forum.openwrt.org/t/ipq40xx-target-single-nic-devices/7292/16

https://forum.openwrt.org/t/ipq40xx-target-single-nic-devices/7292/17

Flole998 commented 5 years ago

Yes exactly, I should have linked them. Thanks!

KP-IPI commented 3 years ago

Guessing this was never fixed? :'(

Eliam077 commented 3 years ago

Unfortunately no, sorry.

