ik1xpv / ExtIO_sddc

ExtIO_sddc.dll - BreadBoard RF103 / HDSDR
Other
72 stars 26 forks source link

Use SIMD to convert adc samples to float #148

Closed howard0su closed 3 years ago

howard0su commented 3 years ago

Since AVX2 is supported in all CPUs with x64 in additional to USB3.0 enabled PC are reasonable newer, and it has the feature I am looking for. (coverting). I prefer we keep AVX2 as the minimum support hardware. It should be fine from my view. we can add fallback later if user says opposite.

howard0su commented 3 years ago

not ready for merge. I need refector out the DSP from RadioHandler.

hayguen commented 3 years ago
howard0su commented 3 years ago
* please don't touch fftPerBuf .. as i see you don't need it in your new/changed code .. and i'm going to localize it per decimation

sure. I am using fftPerBuf in order to grantee there is no packet left out after SIMD. If fftPerBuf % 8 == 0, we should be fine. So I static assert it to make sure we can catch the bug. Maybe we can change to assert if you want to localize it.

* minor: 'duplicate form from frame halfFft samples' : what?

Oh, typo.

* is there (already been) a race condition, when multiple threads got woken up?: copying already float converted data from 'endloop'. should that copy and convert 16 bit data again?

Yes, there is race condition. If the last thread didn't finish the converting, we will get garbage data. I can fix it later.

* conversion itself with MIPP looks a bit complicated. but would need a deeper look at that

I referenced its example code. I am not expert here. I checked the assembly code which make senses to me.

howard0su commented 3 years ago

I closed the race condition. may have some perf hit as we need to convert those samples twice. halfft # samples per frame.

hayguen commented 3 years ago

may have some perf hit as we need to convert those samples twice. halfft # samples per frame.

conversion itself should be free - whilst touching/copying the data. there may even be a slight improvement, cause 16bit input is smaller

howard0su commented 3 years ago

@Hayati Ayguen h_ayguen@web.de Can you help review the RAND code? It seems working but it creates some noise. I have no idea what I am doing wrong.

On Sun, Jan 10, 2021 at 9:26 PM hayati ayguen notifications@github.com wrote:

may have some perf hit as we need to convert those samples twice. halfft # samples per frame.

conversion itself should be free - whilst touching/copying the data. there may even be a slight improvement, cause 16bit input is smaller

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ik1xpv/ExtIO_sddc/pull/148#issuecomment-757476075, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3GRH3S2S4U6OOBHCOZJ3SZGTHLANCNFSM4V4HSBDA .

-- -Howard

hayguen commented 3 years ago

will like review it, but will take some time .. can you fix the conflict first?

hayguen commented 3 years ago

you already catched/fixed the bug?

hayguen commented 3 years ago

think, we should also directly normalize the converted samples to +/-1. thus, multiply with 1.0f/32768.0f - also on the fly. decimation and it's normalization would/should be independent of input data's int16_t format

howard0su commented 3 years ago

you already catched/fixed the bug?

yes. It is because of unaligned memory access. When copying from previous sample memory, it is unaligned memory address.

howard0su commented 3 years ago

think, we should also directly normalize the converted samples to +/-1.

thus, multiply with 1.0f/32768.0f - also on the fly.

decimation and it's normalization would/should be independent of input data's int16_t format

I tried that before. Current the GainScale considered this scale as well. However if we apply the scale here, any additional benefit? Since today's approach can save one multiply op.

howard0su commented 3 years ago

when enabling RAND, the SIMD version is slower than normal version. I need to dig into details. Worse case, I will revert that part back.

I would prefer checkin this first to have unittest enabled. Then I plan to introduce ringbuffer to completely close race condition on input and output.

hayguen commented 3 years ago

Multiply: you might be right

Rand with simd: I would check, if the compiler did remove the if's, checking the template parameters ..!? If constexpr might help.. else, a deeper look is necessary .. Werner just reported some measurement results .. indicating, the rand should be activated by default ... But also either!? But check later one after fixing flaw in ddc

howard0su commented 3 years ago

yes, I saw the measurement on RAND and Dither. Dither is easy as it is just a setting in GPIO. RAND, need some help on SIMD converting. worse case, leave it as it is today.

On Mon, Jan 11, 2021 at 8:38 PM hayati ayguen notifications@github.com wrote:

Multiply: you might be right

Rand with simd: I would check, if the compiler did remove the if's, checking the template parameters ..!? If constexpr might help.. else, a deeper look is necessary .. Werner just reported some measurement results .. indicating, the rand should be activated by default ... But also either!? But check later one after fixing flaw in ddc

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ik1xpv/ExtIO_sddc/pull/148#issuecomment-757925380, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3GRHZTX2A4YCX4DG6RC3SZLWLBANCNFSM4V4HSBDA .

-- -Howard

howard0su commented 3 years ago

Close this PR and replaced with #152

The test infra will submit via a separate PR.