ik1xpv / ExtIO_sddc

ExtIO_sddc.dll - BreadBoard RF103 / HDSDR
Other

Use Ringbuffer for Input&output Buffer #157

Closed: howard0su closed this 3 years ago

howard0su commented 3 years ago

Performance is not good after this change. Maybe we need to change the output to a ring buffer as well; I'm not sure. Please review.
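For context, a single-producer/single-consumer ring buffer lets the USB input thread hand blocks to the processing thread without taking a lock. The following is a minimal C++17 sketch. It assumes exactly one producer thread and one consumer thread. `SpscRing`, `push` and `pop` are hypothetical names, not the classes in this PR:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer (not the ExtIO_sddc class).
// One slot is always left empty so that "full" and "empty" can be told apart.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false if the buffer is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the buffer is empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write (advanced by producer)
    std::atomic<std::size_t> tail_{0};  // next slot to read (advanced by consumer)
};
```

Leaving one slot empty is the simplest way to distinguish a full buffer from an empty one. The cost is one element of capacity.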

howard0su commented 3 years ago

I am thinking we may want to use callbacks (IOCompletionPort) in the USB stack.

howard0su commented 3 years ago

Unfortunately, it is slower than before on my laptop, which has an I7-8665. I need to do more testing, and I am also waiting for the fix for dynamic extio_len.

On Tue, Jan 19, 2021 at 5:30 PM, Oscar Steila <notifications@github.com> wrote:

@ik1xpv approved this pull request.

Looks fine to me. I tested the code on an I7-3770, and it runs a few percent faster.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ik1xpv/ExtIO_sddc/pull/157#pullrequestreview-571054100, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3GRFHYMQGXFM3OIQWXTDS2VGNBANCNFSM4WCOO5UQ .

howard0su commented 3 years ago

Please review again. It no longer uses a dynamic ext_blocklen, so it should work with the current HDSDR. I need help validating the performance: on my laptop I cannot see much difference, because 32M does not work well there anyway.

ik1xpv commented 3 years ago

I made some tests on an I7-3770 (my laptop still has a temperature problem :-)). I tested with a 64M ADC clock, USB 12k audio, and the LO set to a frequency that is not an exact bin multiple, to activate the shift. Both builds are compiled in release mode.

| IF sample rate | 32M | 16M | 8M | 4M | 2M | 1M | 0.5M |
|---|---|---|---|---|---|---|---|
| this branch #4f54221, CPU% | 27 | 17 | 14 | 13 | 12 | 12 | 12 |
| ver 1.1.0, CPU% | 16 | 12 | 10 | 9 | 8 | - | - |

The old version still seems faster :-(

ik1xpv commented 3 years ago

I made a test on my laptop at a sample rate of 8M. v1.1.0 is faster: 23-24% vs. 29-30% for this branch.

howard0su commented 3 years ago

No intention to commit this, since it is a performance regression.

ik1xpv commented 3 years ago

I made a comparison of the Windows build compiled by the GitHub Action versus master. I'm using the old I7-3770 :-) [image: justarun]

howard0su commented 3 years ago

Thank you for testing. The result is actually expected. The goal of this PR is to decouple input and output so that I can add multi-channel support in a cleaner way.

Since we have more threads here, I added some busy loops to coordinate between them. The spin length is defined by the following constant:

const int spin_count = 1000000;

The number may need adjusting to reduce CPU usage. As long as we don't see an overall performance slowdown, it is fine to use a bit more CPU.
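A common way to bound the cost of such a busy loop is to spin for at most `spin_count` iterations on an atomic counter and then fall back to a blocking wait. The following is a minimal C++ sketch of that spin-then-block pattern. `BlockSignal`, `notify`, `wait` and `pending` are hypothetical names; only `spin_count` comes from this comment:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

const int spin_count = 1000000;  // value quoted above; later reduced to 100 in this thread

// Hypothetical "block ready" signal: the consumer spins briefly, then sleeps.
class BlockSignal {
public:
    // Producer: mark one more block as available and wake a sleeping consumer.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mtx_);  // pairs with the predicate check in wait()
            available_.fetch_add(1, std::memory_order_release);
        }
        cv_.notify_one();
    }

    // Consumer: take one block, spinning up to spin_count times before blocking.
    void wait() {
        for (int i = 0; i < spin_count; ++i) {
            if (try_take()) return;  // fast path: no mutex, no kernel call
        }
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return try_take(); });  // slow path: sleep until notified
    }

    int pending() const { return available_.load(std::memory_order_acquire); }

private:
    // Atomically decrement the counter if it is positive.
    bool try_take() {
        int n = available_.load(std::memory_order_acquire);
        while (n > 0) {
            if (available_.compare_exchange_weak(n, n - 1, std::memory_order_acq_rel))
                return true;
        }
        return false;
    }

    std::atomic<int> available_{0};
    std::mutex mtx_;
    std::condition_variable cv_;
};
```

The producer increments the counter while holding the mutex. This ensures that a consumer which has just given up spinning cannot miss the wakeup. A smaller `spin_count` trades a little latency for less wasted CPU.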

On Mon, Apr 26, 2021 at 12:24 AM Oscar Steila @.***> wrote:

I made a comparison of the action compiled windows vs the master. I'm using the old I7-3770 :-) [image: justarun] https://user-images.githubusercontent.com/9883800/116001110-3c2ab580-a5f3-11eb-8719-479d76dcb6c9.jpg

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/ik1xpv/ExtIO_sddc/pull/157#issuecomment-826350716, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3GRHXXIQVZIBDUY6MYK3TKQ64DANCNFSM4WCOO5UQ .

-- -Howard

howard0su commented 3 years ago

Hi Oscar,

I added one more change that reduces spin_count to 100, which seems to help CPU usage a bit. It would be great if you could test the 64M bandwidth to check for any regression. The focus here is turning the current implementation into a pipeline, so that I can add more processing stages without complicating the code too much.
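To illustrate the pipeline shape, here is a minimal C++17 sketch in which each stage runs on its own thread and stages are linked by blocking queues. `Channel`, `spawn_stage` and `run_pipeline` are hypothetical names. The toy lambdas only stand in for the real processing:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue connecting two pipeline stages.
template <typename T>
class Channel {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item arrives; returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Run one stage on its own thread: read from `in`, apply `f`, write to `out`.
template <typename In, typename Out, typename F>
std::thread spawn_stage(Channel<In>& in, Channel<Out>& out, F f) {
    return std::thread([&in, &out, f] {
        while (auto item = in.pop()) out.push(f(*item));
        out.close();  // propagate end-of-stream to the next stage
    });
}

// Toy three-stage pipeline; each lambda is a stand-in for real DSP work.
std::vector<int> run_pipeline(const std::vector<int>& blocks) {
    Channel<int> c0, c1, c2, c3;
    std::thread s1 = spawn_stage(c0, c1, [](int x) { return x * 2; });
    std::thread s2 = spawn_stage(c1, c2, [](int x) { return x + 1; });
    std::thread s3 = spawn_stage(c2, c3, [](int x) { return x * 10; });
    for (int b : blocks) c0.push(b);
    c0.close();
    std::vector<int> out;
    while (auto r = c3.pop()) out.push_back(*r);
    s1.join(); s2.join(); s3.join();
    return out;
}
```

Each stage reads from one FIFO and writes to the next, so block order is preserved. Closing the first queue shuts the stages down one after another. Adding a stage then means adding one queue and one `spawn_stage` call rather than touching the other stages.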

As the next step, I plan to add more channels, with each channel's iFFT processed by its own thread. I also plan to support sample rates down to 48 kHz with some software decimation after the current FFT approach; the decimation will run in another thread as well. This requires the whole process to be structured as a pipeline.
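For the software decimation step, the simplest possible sketch averages each group of M samples and emits one output per group (a boxcar low-pass followed by down-sampling). For example, M = 4 would take 192 kHz down to 48 kHz. This is only an illustration. A boxcar is a weak anti-aliasing filter, and a real implementation would more likely use a proper FIR or a half-band chain. `BoxcarDecimator` is a hypothetical name:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Sample = std::complex<float>;

// Hypothetical integer-factor decimator: average each group of M consecutive
// samples (boxcar low-pass) and emit one output per group ("integrate and dump").
// The accumulator is kept across calls, so block boundaries do not matter.
class BoxcarDecimator {
public:
    explicit BoxcarDecimator(std::size_t factor) : m_(factor) { assert(factor > 0); }

    std::vector<Sample> process(const std::vector<Sample>& in) {
        std::vector<Sample> out;
        out.reserve(in.size() / m_ + 1);
        for (const Sample& s : in) {
            acc_ += s;
            if (++count_ == m_) {
                out.push_back(acc_ / static_cast<float>(m_));  // group average
                acc_ = Sample(0.0f, 0.0f);
                count_ = 0;
            }
        }
        return out;
    }

private:
    std::size_t m_;          // decimation factor M
    std::size_t count_ = 0;  // samples accumulated in the current group
    Sample acc_{0.0f, 0.0f};
};
```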

-- -Howard

ik1xpv commented 3 years ago

Yes. CMake 198 is faster than CMake 196, but it still seems a little slower than master (CMake 194). [image: justarun2]

I will continue testing tomorrow. Thanks for the new code architecture!

ik1xpv commented 3 years ago

I made a comparison using Open Hardware Monitor to trace the CPU load, with Wi-Fi disabled. [image: justarun3] I repeated the test several times, and CMake 198 looks a little better than master (CMake 194). The time window used was 8 minutes.

howard0su commented 3 years ago

Please also focus on performance in addition to CPU usage. This version still cannot play 64M well on my laptop. If there is no objection, I will commit this version first and then start breaking the current r2iq into 3 stages:

Stage 1: convert samples into frequency-domain samples.
Stage 2: shift the frequency-domain samples to the right LO, apply the filter, decimate, and do the iFFT.
Stage 3: do the fine tuning.
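Stages 2 and 3 effectively split the LO offset into a whole number of FFT bins (a coarse shift) plus a small residual (the fine tuning). The following is a minimal sketch of that split and of a phase-continuous fine-tuning mixer. It assumes a bin width of sample rate / FFT size. `split_lo`, `FineTuner` and the example numbers in the test are hypothetical, not the r2iq code:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Hypothetical split of an LO offset into whole bins (stage 2) plus a residual (stage 3).
struct LoSplit {
    long bins;        // whole FFT bins to shift in the frequency domain
    double residual;  // remaining offset in Hz, |residual| <= bin_width / 2
};

LoSplit split_lo(double lo_hz, double sample_rate_hz, std::size_t fft_size) {
    const double bin_width = sample_rate_hz / static_cast<double>(fft_size);
    const long k = std::lround(lo_hz / bin_width);
    return {k, lo_hz - static_cast<double>(k) * bin_width};
}

// Fine tuning: mix the residual frequency down to 0 Hz with a numerically
// controlled oscillator running at the (decimated) output sample rate.
// The phase is kept between calls, so consecutive blocks join without
// phase discontinuities.
class FineTuner {
public:
    FineTuner(double residual_hz, double sample_rate_hz)
        : step_(-2.0 * kPi * residual_hz / sample_rate_hz) {}

    void process(std::vector<std::complex<double>>& block) {
        for (auto& s : block) {
            s *= std::polar(1.0, phase_);
            phase_ = std::remainder(phase_ + step_, 2.0 * kPi);  // keep phase in [-pi, pi]
        }
    }

private:
    double step_;         // phase increment per sample (radians)
    double phase_ = 0.0;  // current oscillator phase
};
```

With a 64 MHz rate and a 2048-point FFT, for example, a bin is 31.25 kHz. An LO offset of 1.01 MHz would then be 32 whole bins plus a 10 kHz residual. When the LO is an exact bin multiple, the residual is zero and the fine-tuning stage has nothing to do.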

ik1xpv commented 3 years ago

Howard, I played a bit with CMake 198, and it seems equal to or better than CMake 194. Here is the reception of a tone from my 20 MHz reference: [image: 20MHz_DIG] It looks fine, with no phase discontinuity. I also made a comparison test on my laptop (ADC clock 64M, IF 32M): [image: LaptopRun5] The two releases have very similar performance. Feel free to merge and test the new architecture. Thanks :-)

howard0su commented 3 years ago

Thank you for all your testing.

My next PR will be even bigger in terms of changes.