kaloz / mwlwifi

mac80211 driver for the Marvell 88W8864 802.11ac chip

Performance issues with WRT3200ACM #118

Closed stuwilkins closed 7 years ago

stuwilkins commented 7 years ago

I have a WRT3200ACM and am seeing performance issues with its wireless. I know that is not much of a bug report, but I can run any debugging requested. This is linked to an OpenWrt thread here:

OpenWRT Forum

It appears that performance degrades over time and eventually a reboot is needed to get performance back; this seems to happen within 1 to 2 hours.

I have tried OpenWRT and LEDE images, currently running LEDE r2155.

yuhhaurlin commented 7 years ago

Yes. This problem is reproduced.

stuwilkins commented 7 years ago

Thanks @yuhhaurlin. Does your comment mean that the current master doesn't fix the problem? If it doesn't, any idea when a fix is likely?

yuhhaurlin commented 7 years ago

A fix is in progress.

Chadster766 commented 7 years ago

Thanks @yuhhaurlin I appreciate all the good work you do on this driver :+1:

stuwilkins commented 7 years ago

Yes, thanks for all your hard work @yuhhaurlin. If I can help with testing etc., let me know.

osxest commented 7 years ago

I'm another owner of a WRT3200ACM-EU (European model). I'm experiencing the same issue as @stuwilkins and will gladly help you all with tests. @yuhhaurlin, thank you so much for your hard work and please keep us informed on the progress!

wongsyrone commented 7 years ago

Could this be related to an unused variable?

fwcmd.c: In function 'mwl_fwcmd_get_fw_region_code_sc4':
fwcmd.c:2882:6: warning: unused variable 'status' [-Wunused-variable]
  int status;
      ^~~~~~

yuhhaurlin commented 7 years ago

Yes, it will be removed.

Chadster766 commented 7 years ago

How can I enable more driver debugging output to help troubleshoot the issue?

Chadster766 commented 7 years ago

I ran the same test but with yesterday's release of Linux 4.8.15.

[  5] 7198.00-7199.00 sec  8.19 MBytes  68.7 Mbits/sec
[  5] 7199.00-7200.00 sec  9.69 MBytes  81.3 Mbits/sec
[  5] 7200.00-7200.39 sec  3.84 MBytes  82.7 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-7200.39 sec  68.9 GBytes  82.2 Mbits/sec                  sender
[  5]   0.00-7200.39 sec  68.9 GBytes  82.2 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
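
For anyone repeating a long run like this, a small helper can make degradation over time easier to spot than scrolling through two hours of interval lines. This is only a sketch: it assumes the iperf3 text layout shown above, and `summarize_iperf` is a hypothetical name, not part of iperf3. The exact command used for the test above is not given in the thread.

```shell
# Summarize per-interval iperf3 lines: count, minimum, and maximum Mbits/sec.
# Summary lines ending in "sender"/"receiver" are skipped so that only the
# per-second intervals are compared.
summarize_iperf() {
    awk '
        /Mbits\/sec/ && !/sender|receiver/ {
            # The rate is the field immediately before the "Mbits/sec" unit.
            for (i = 1; i <= NF; i++) if ($i == "Mbits/sec") rate = $(i - 1) + 0
            if (n == 0 || rate < min) min = rate
            if (n == 0 || rate > max) max = rate
            n++
        }
        END { if (n) printf "intervals=%d min=%.1f max=%.1f\n", n, min, max }
    ' "$@"
}

# Example (the log file name is an assumption):
#   summarize_iperf iperf.log
```

A minimum that falls far below the average late in the run would suggest the slowdown described in this issue; a flat result like the one above would not.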

It seems the Marvell crypto driver commit resolved the issue.

commit bdb5ed2040f54bc4b2cdaf2f313588daee694b82
Author: Romain Perier <romain.perier@free-electrons.com>
Date:   Mon Dec 5 09:56:39 2016 +0100

    crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash

    commit 9e5f7a149e00d211177f6de8be427ebc72a1c363 upstream.

    mv_cesa_hash_std_step() copies the creq->state into the SRAM at each
    step, but this is only required on the first one. By doing that, we
    overwrite the engine state, and get erroneous results when the crypto
    request is split in several chunks to fit in the internal SRAM.

    This commit changes the function to copy the state only on the first
    step.

    Fixes: commit 2786cee8e50b ("crypto: marvell - Move SRAM I/O op...")
    Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bfef274e4dae76cdee275b5985c85758e346e825
Author: tim <tim.c.chen@linux.intel.com>
Date:   Mon Dec 5 11:46:31 2016 -0800

    crypto: mcryptd - Check mcryptd algorithm compatibility

    commit 48a992727d82cb7db076fa15d372178743b1f4cd upstream.

    Algorithms not compatible with mcryptd could be spawned by mcryptd
    with a direct crypto_alloc_tfm invocation using a "mcryptd(alg)" name
    construct.  This causes mcryptd to crash the kernel if an arbitrary
    "alg" is incompatible and not intended to be used with mcryptd.  It is
    an issue if AF_ALG tries to spawn mcryptd(alg) to expose it externally.
    But such algorithms must be used internally and not be exposed.

    We added a check to enforce that only internal algorithms are allowed
    with mcryptd at the time mcryptd is spawning an algorithm.

    Link: http://marc.info/?l=linux-crypto-vger&m=148063683310477&w=2
    Reported-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 86bea59218a05031deeac415544eb48c1556456b
Author: Horia Geantă <horia.geanta@nxp.com>
Date:   Mon Dec 5 11:06:58 2016 +0200

    crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel

    commit 39eaf759466f4e3fbeaa39075512f4f345dffdc8 upstream.

    Start with a clean slate before dealing with bit 16 (pointer size)
    of Master Configuration Register.
    This fixes the case of AArch64 boot loader + AArch32 kernel, when
    the boot loader might set MCFGR[PS] and kernel would fail to clear it.

    Reported-by: Alison Wang <alison.wang@nxp.com>
    Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
    Reviewed-By: Alison Wang <Alison.wang@nxp.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 666531ca650e67f111af70789a48b0d169772b78
Author: Romain Perier <romain.perier@free-electrons.com>
Date:   Mon Dec 5 09:56:38 2016 +0100

    crypto: marvell - Don't copy hash operation twice into the SRAM

    commit 68c7f8c1c4e9b06e6b153fa3e9e0cda2ef5aaed8 upstream.

    No need to copy the template of an hash operation twice into the SRAM
    from the step function.

    Fixes: commit 85030c5168f1 ("crypto: marvell - Add support for chai...")
    Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

yuhhaurlin commented 7 years ago

It takes time to fix this problem.

Chadster766 commented 7 years ago

I don't think this is related to the driver, because the wireless never disconnected; I suspect another component in the system instead.

Is it happening during WAN IP address renewal?

wongsuo commented 7 years ago

My 2.4 GHz signal is not stable. 5 GHz is incredibly unstable.

yuhhaurlin commented 7 years ago

Yes, the WRT3200ACM has a problem.

luinix commented 7 years ago

Any status update on this issue?

Thanks!

yuhhaurlin commented 7 years ago

Sorry, enhancement is still ongoing. Thanks.

Chadster766 commented 7 years ago

@yuhhaurlin I suspect a new firmware might be available. Can you check with your Marvell contact?

yuhhaurlin commented 7 years ago

No. I work for Marvell. Thanks.

Chadster766 commented 7 years ago

Ok thanks, I didn't realize you worked for Marvell :smile: that's awesome!

luinix commented 7 years ago

Every minute this issue is unresolved, someone fails to see a cat picture on the Internet.

Please, think of the kittens.

fwohlfarth commented 7 years ago

I have the same issue with the WRT3200ACM. Wireless works for 2 hours, then there is no connection to the internet or to the local area network. After restarting the router it works again for 2 hours, then the same problem returns. I have installed the latest OpenWrt/LEDE.

kubrickfr commented 7 years ago

How long before the WRT3200ACM is fixed? We're paying customers; this is not just any open-source project made by people in their spare time. We are entitled to better communication, clarity and consideration. I'm tired of reading "yes we're aware that the WRT3200ACM has problems, hang on tight".

yuhhaurlin commented 7 years ago

Sorry. We are working on it.

duh-nm commented 7 years ago

For what it's worth to other people: to keep me from having to intervene when it happens, a cron job running this script will reboot your router when the issue begins. I run it every two minutes myself. Modify the log location depending on where yours is held, of course. My 5 GHz radio does it once in a while, but my 2.4 GHz radio does it all of the time. They both seem to throw the message I'm checking the log for, at least on my router. Yours may differ depending on the build being used.

#!/bin/ash
# Reboot the router if the last 10 log lines contain the firmware command timeout.
# Adjust the log path to wherever your build writes its system log.
if tail -n 10 /var/log/system.log | grep -q "MEMAddrAccess timed out"; then
    reboot
fi
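
A slightly more reusable variant of the same check, plus the cron wiring, could look like the sketch below. The function name `log_has_timeout`, the script path `/root/wifi-watchdog.sh`, and the default of 10 lines are assumptions for illustration only; the two-minute interval is the one mentioned above.

```shell
# Return 0 if the last N lines (default 10) of a log file contain the
# firmware command timeout message, non-zero otherwise.
log_has_timeout() {
    log_file=$1
    lines=${2:-10}
    tail -n "$lines" "$log_file" 2>/dev/null | grep -q "MEMAddrAccess timed out"
}

# In the watchdog script itself, the check would be wired up as:
#   log_has_timeout /var/log/system.log && reboot
#
# Hypothetical crontab entry to run it every two minutes:
#   */2 * * * * /root/wifi-watchdog.sh
```

Keeping the check in a function makes it possible to try it against a saved log first, before letting it trigger a reboot.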

luinix commented 7 years ago

@yuhhaurlin I understand you are working on it, but could you at least give us an ETA? Have you guys already found the issue and you are coding the solution? Or are you still researching, trying to catch the bug? Is there anything we can do to help you?

Thanks!

yuhhaurlin commented 7 years ago

mwlwifi needs to be modified to use the same 88W8964 firmware as the one used by the stock firmware. It will take time to complete this.

luinix commented 7 years ago

I extracted what seems to be the right firmware from the Linksys official firmware, and I created a pull request with it:

https://github.com/kaloz/mwlwifi/pull/138

I cannot test it now. Can someone volunteer to give it a try?

@yuhhaurlin

yuhhaurlin commented 7 years ago

Current mwlwifi can't work with this firmware.

luinix commented 7 years ago

Ok, I see.

nitroshift commented 7 years ago

@yuhhaurlin

I would suggest a new API for the 88W8964, since it's harder to get both firmwares working under the same API.

nitroshift

gufus commented 7 years ago

nitroshift wrote (Monday, January 23, 2017 11:44 PM):

"@yuhhaurlin I would suggest a new API for the 88W8964 since it's harder to get both firmwares working under same API."

I agree


lux209 commented 7 years ago

Hi yuhhaurlin, first, thank you for working on this driver problem! Do you have any schedule for the fix? No need for a precise date, but it would be nice to know whether it will be more like 2 weeks or 2 months. Also, is there anything we can do to help?

Thanks

NotoriousPyro commented 7 years ago

It'll be fixed when it's fixed. There can be no timescales on fixes, because the developer does not always know what the fix is or what is causing the issue.

fwohlfarth commented 7 years ago

Hope there is a solution soon...

kubrickfr commented 7 years ago

@NotoriousPyro A totally valid point if you're not talking to your customers. In the meantime, many of us have a router sitting on a shelf and we're totally screwed with it because we can't use it or get a refund.

duh-nm commented 7 years ago

You bought it with Linksys software, does that have issues? Or did you buy it from openwrt/lede themselves? No one made you use lede/openwrt and this driver and firmware. If you need to use the router without this issue, I suggest going back to the software you actually paid for. You paid Linksys, not these guys, is my assumption, I haven't seen them selling this router.

kubrickfr commented 7 years ago

Yes, I bought it with Linksys software and the claim (their claim) that it was compatible with OpenWRT. Am I missing something here?

And no, sorry, the Linksys stock firmware doesn't fulfil my requirements and I only bought it because it was OpenWRT compatible, it's not that I don't want to use it, but I just can't (the main reason if you want to know being that the stock firmware doesn't support VLANs).

duh-nm commented 7 years ago

Their claim also includes the statement that they don't support you using OpenWrt/LEDE and you're on your own. Unless you are donating to these developers, you didn't pay them a cent. Linksys has all your money, and they made the claim. My point being, the money excuse is not going to help your case when that's between you and Linksys.

NotoriousPyro commented 7 years ago

Just because it's bugged doesn't mean the claim that it works on openwrt is wrong.

It just means it doesn't work correctly yet. It WORKS, but not perfectly.

yuhhaurlin commented 7 years ago

Re-architecting the driver and making it work with the same firmware used by the stock firmware are ongoing. Sorry for the delay.

NotoriousPyro commented 7 years ago

You're disappointed at open source? Did you pay for your support for open source? If not, then nothing is owed to you.

Open source is meant to allow you to fix the issues yourself or change it how you see fit. It doesn't entitle you to free support or outrage because it doesn't work.

All of those solutions, Freenas, Xen, etc. They all require you to do some work to get them working. If you can't figure it out, it's not the fault of open source. I've always had good experiences of open source even if it doesn't always work.

thagabe commented 7 years ago

@NotoriousPyro You do realize mwlwifi is a repo paid for and managed by Marvell, right? I'm not saying that delays and issues don't happen, but like any enterprise or capitalistic venture you are accountable to your consumers. If we, the consumers, paid a premium for the device to be "open-source ready", the natural expectation is that it will indeed work with open-source software. Your point is still valid in that the router is, sort of, open-source compatible rather than ready, but that does not give Marvell, and especially Linksys, a "get out of jail" card for pushing out a product without adequate work on the components it is touting as "ready". Right now Marvell MUST understand that to release such a product, most, if not all, talks about the changes have to start prior to the release, and the work on the open-source code must be merged and staged before the release. Even developers for OpenWrt/LEDE have expressed their dissatisfaction with this business practice of releasing a product and working on the components after said release. Intel has always been the prime example of how a company should handle open source in a timely and open manner (well, until they began to push for more and more closed-source blobs).

Source: I'm a LEDE contributor (not developer), and before rango I had to wait 1 year for wifi with the OG WRT1900ACv1 (something that was said to not happen again).

cilix-lab commented 7 years ago

I understand a lot of us bought WRT3200ACMs and hoped they would magically work with open source, as could be understood from Linksys's statement about "open source ready". If one gets mad because of these issues, one should get mad at Linksys's false statement instead of at the people working to solve them.

Anyway, I believe all this chat about disappointments, claims, etc, should be left for elsewhere, since I think the point of commenting on an issue should be giving feedback, relevant information and/or helping solve the problem.

That said, I appreciate the work being put into this issue. As a workaround, I'm keeping my WRT3200ACM doing all the routing with only my guest wifi on (almost no traffic), and my home wifi runs on my old AC router as a bridge. That way, I get all the cool stuff I love about OpenWRT/LEDE and the awesome processing power of the WRT3200ACM, and it keeps working perfectly without the need for constant reboots. Yeah... I'm missing out on lots of the great features I bought this router for, but at least I get to use some of them till fixes become available.

Again, thanks to everyone working on this!

thagabe commented 7 years ago

@cilix-lab I agree, there is no point in filling this issue with opinions. I shall do that too: use my new WRT3200ACM for routing and my WRT1900ACv1 as the AP only. Bye bye ASUS AP, you will be missed.

davidc502 commented 7 years ago

Issue of degrading performance observed with the WRT3200ACM running LEDE r3063-f2e6e11:

root@lede:~# uname -a
Linux lede 4.4.42 #0 SMP Sat Jan 21 22:05:32 2017 armv7l GNU/Linux

In the system and kernel logs, over and over.

Wed Feb 1 00:35:50 2017 kern.err kernel: [ 365.770004] ieee80211 phy0: cmd 0x801d=MEMAddrAccess timed out
Wed Feb 1 00:35:50 2017 kern.err kernel: [ 365.775864] ieee80211 phy0: return code: 0x001d
Wed Feb 1 00:35:50 2017 kern.err kernel: [ 365.780421] ieee80211 phy0: timeout: 0x001d
Wed Feb 1 00:35:50 2017 kern.err kernel: [ 365.784619] ieee80211 phy0: failed execution
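
To see which radio is affected and how often, the timeouts can be tallied per phy. This is a rough sketch that assumes the log line format shown above; `count_cmd_timeouts` is a hypothetical helper name.

```shell
# Print "<phy> <count>" for every radio that logged a MEMAddrAccess timeout.
# Reads log files given as arguments, or standard input if none are given.
count_cmd_timeouts() {
    awk '
        /MEMAddrAccess timed out/ {
            for (i = 1; i <= NF; i++) {
                # The radio name appears as a field such as "phy0:".
                if ($i ~ /^phy[0-9]+:$/) {
                    name = $i
                    sub(/:$/, "", name)
                    hits[name]++
                }
            }
        }
        END { for (p in hits) print p, hits[p] }
    ' "$@" | sort
}

# Example on OpenWrt/LEDE, where logread prints the system log:
#   logread | count_cmd_timeouts
```

Comparing the counts for each phy against the time clients started failing to connect may help show whether the timeouts precede or follow the outage.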

However, during this time, many of the stats in LuCI were blank (e.g. Memory, Sessions, DHCP leases). So, I wonder if something else besides the driver is having an issue.

EDIT: I should add, it wasn't until wifi clients couldn't connect that I started seeing those logs.

NotoriousPyro commented 7 years ago

Changing the WiFi settings inside LuCI and applying them repeatedly (within 10-20 secs of the last save) quickly reproduces this issue and can cause a high load average of 2.00-4.00.

It appears the only fix is a reboot, and that is only temporary.
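
For anyone who wants to try reproducing this from a shell instead of clicking through LuCI, the sketch below re-applies the wireless configuration several times and prints the 1-minute load average after each round. It assumes that running OpenWrt/LEDE's `wifi` helper is roughly equivalent to applying settings in LuCI, which the thread does not confirm. `stress_wifi_reload` and `RELOAD_CMD` are hypothetical names, and the default round count and delay are guesses based on the timing described above.

```shell
# Re-apply the wireless configuration in quick succession and report the
# 1-minute load average after each round.
#   $1 = number of rounds (default 5)
#   $2 = seconds to wait after each reload (default 15)
# RELOAD_CMD can override the reload command (default: wifi).
stress_wifi_reload() {
    rounds=${1:-5}
    delay=${2:-15}
    i=1
    while [ "$i" -le "$rounds" ]; do
        ${RELOAD_CMD:-wifi} || return 1
        sleep "$delay"
        # The first field of /proc/loadavg is the 1-minute load average.
        printf 'round %d load1=%s\n' "$i" "$(cut -d' ' -f1 /proc/loadavg)"
        i=$((i + 1))
    done
}

# Example: five reloads, 15 seconds apart
#   stress_wifi_reload 5 15
```

Since this deliberately provokes the problem, it is only suitable for a router you are prepared to reboot afterwards.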

tapper82 commented 7 years ago

Hi, please fix this, thanks. To those saying that we should use the Linksys stock firmware: please be aware that anyone who has to use a screen reader because of sight issues cannot use the Linksys stock firmware, as the layout and the coding of the interface make it impossible! That's the main reason why I started using OpenWrt and Gargoyle.

ghost commented 7 years ago

Can you give us any vague ETA at all (days/weeks/months/years)?

I don't even require insane WLAN speeds. Just fix the crashing first and tune for performance later.

This thing has been on the market for almost 6 months now and I can only use it as an expensive paperweight.

I'm quite fed up with waiting and almost ready to sell this thing again at the end of the month.

NotoriousPyro commented 7 years ago

Using this router without the WiFi and with a separate WiFi access point works fine for now.

lydasia commented 7 years ago

/centuries