lwfinger / rtl8723bu

Driver for RTL8723BU

rtl8723bu blocking interrupts for long time? #9

Closed tkrill closed 8 years ago

tkrill commented 8 years ago

We are currently using the rtl8723bu in a small embedded project and the device driver seems to work nicely, thanks!

However, it seems the driver is blocking interrupts for a fairly long time, starving the rest of the system. We see serial port overruns etc. even when the system otherwise has no noticeable load. If we take down the wlan interface, everything works.

My question is whether it would be possible to identify the path that blocks interrupts and make it more fine-grained?
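For a rough sense of how long interrupts have to be held off before a serial port overruns, the arithmetic can be sketched as below. This is only a simplified model, not something taken from the thread: it assumes a 16550-style UART with a receive FIFO and an interrupt trigger level, and the function names `char_time_us` and `overrun_budget_us` are made up for illustration.

```c
#include <assert.h>

/* Time to receive one character, in microseconds.
 * bits_per_char counts start, data, parity and stop bits (10 for 8N1). */
static double char_time_us(unsigned baud, unsigned bits_per_char)
{
    return 1e6 * (double)bits_per_char / (double)baud;
}

/* Simplified latency budget (an assumed model, not measured data):
 * the receive interrupt fires once `trigger` bytes are queued. After that,
 * (fifo_depth - trigger) more bytes still fit in the FIFO, and one further
 * byte can be assembled in the shift register before data is lost. */
static double overrun_budget_us(unsigned baud, unsigned bits_per_char,
                                unsigned fifo_depth, unsigned trigger)
{
    assert(trigger >= 1 && trigger <= fifo_depth);
    unsigned slack_chars = fifo_depth - trigger + 1;
    return slack_chars * char_time_us(baud, bits_per_char);
}
```

With assumed values of 115200 baud, 8N1, a 16-byte FIFO and a trigger level of 8, `overrun_budget_us(115200, 10, 16, 8)` comes out at roughly 781 µs. Without a FIFO (`fifo_depth` and `trigger` both 1) the budget shrinks to a single character time, about 87 µs.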

lwfinger commented 8 years ago

Sorry that it has taken me so long to answer. Are you using the driver in station, or AP mode?

tkrill commented 8 years ago

Hi lwfinger, no problem :) We are running in station mode only.

lwfinger commented 8 years ago

In that case, your problem must be a spinlock delaying execution. It took a bit of analysis to verify, but the only place where interrupts are disabled for a station is when the computer is resuming operations following a suspend.

I have enabled locking statistics in the hope that I can detect where the kernel is spending time waiting.

tkrill commented 8 years ago

lwfinger, a big thank you for looking into this!

One question: when you say "computer is resuming operations following a suspend", are you talking about the computer returning from PM standby, or something directly related to the driver?

lwfinger commented 8 years ago

Returning from standby or hibernation.

tkrill commented 8 years ago

Ok, this is an embedded system and we don't hibernate during normal operation, even though we have PM enabled in the kernel.

I took some time to skim the source code and examined the code segments that use enter/exit_critical. Without following every path, I could not find any obvious culprit. There were of course a lot of long segments holding the spinlock, but nothing that looked suspicious.

The thing that makes this strange is that the system does not appear to be under any load when the problem appears :|

lwfinger commented 8 years ago

We do have a known locking dependency that gives a lockdep warning on kernels that have lock checking enabled. Unfortunately, the Realtek code does not contain a clear description of the data being protected by the various mutexes and spinlocks.

It is possible that you are seeing a "perfect storm" of events that I do not see with my faster, multi-processor system.

lwfinger commented 8 years ago

I got a start on untangling the locking problems for this driver. Please pull the latest and see if it helps. Similar changes for rtl8723bs appear to have helped a lot.

tkrill commented 8 years ago

lwfinger, thank you for taking the time on this! Will test this asap.

We have done some timing tests in the meantime, and they also indicate that it might not be the driver alone that accounts for our problems. I toggled a GPIO pin on enter/exit critical, which in turn uses spin_lock_irqsave.

This, however, showed no extreme locking periods. So your guess of a perfect storm, or another issue in parallel with this one, could well be right.
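A software variant of the GPIO measurement is to timestamp the lock on entry and exit and keep the worst case. The sketch below is a userspace model under stated assumptions, not the driver's code: `struct timed_lock`, `enter_critical` and `exit_critical` are hypothetical names, and a C11 `atomic_flag` spinlock with `clock_gettime` stands in for what would be `spin_lock_irqsave` and a kernel clock such as `ktime_get_ns` inside the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical lock wrapper: a spinlock plus hold-time bookkeeping. */
struct timed_lock {
    atomic_flag busy;        /* the spinlock itself */
    uint64_t    entered_ns;  /* timestamp taken when the lock was acquired */
    uint64_t    max_held_ns; /* longest hold time seen so far */
    uint64_t    count;       /* number of completed critical sections */
};

#define TIMED_LOCK_INIT { ATOMIC_FLAG_INIT, 0, 0, 0 }

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Acquire the lock, then record when the critical section started. */
static void enter_critical(struct timed_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ; /* spin until the current holder releases the lock */
    l->entered_ns = now_ns();
}

/* Update the statistics while still holding the lock, then release it. */
static void exit_critical(struct timed_lock *l)
{
    uint64_t held = now_ns() - l->entered_ns;
    if (held > l->max_held_ns)
        l->max_held_ns = held;
    l->count++;
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

A lock declared as `struct timed_lock l = TIMED_LOCK_INIT;` can then be inspected after a test run: `l.max_held_ns` is the longest time the lock was held. Note that this records hold time only. Time that other contexts spend spinning while they wait for the lock is not captured and would need a second timestamp taken before the spin loop.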

lwfinger commented 8 years ago

The problems that I fixed were in the enter/exit critical_bh routines. Interrupts are not disabled there, but the code could spin for a long time, particularly on systems with slow CPUs, and even more so if there is only one CPU. The same kind of fix helps a BayTrail tablet with an RTL8723BS wireless chip.

tkrill commented 8 years ago

Ok, I briefly tried the latest git version and unfortunately it does not solve our problems. I will keep investigating this.

lwfinger commented 8 years ago

I realized that my current approach cannot work. I have another idea, but I have no idea how long it will take to have an implementation. I'll let you know when there is new code for testing.

tkrill commented 8 years ago

The cause of this does not seem to be in the driver, so let's close this for now.

lwfinger commented 8 years ago

Let me know if you later find that this driver is at fault.