marcel303 / framework

A creative coding library.
https://centuryofthecat.nl/

Duplicated copies of file SSE2NEON.h #2

Closed jserv closed 2 years ago

jserv commented 4 years ago

There are two copies of the file SSE2NEON.h at the following paths:

Their contents are identical.

I am curious about how SSE2NEON is used in this project. Meanwhile, SSE2NEON is being actively developed at https://github.com/DLTcollab/sse2neon If SSE2NEON is used in this project, please consider migrating to the newer SSE2NEON, which brings more SSE intrinsics, performance enhancements, and fixes.

marcel303 commented 4 years ago

Hi Jim,

Thank you for reaching out.

This repository of mine contains many prototypes, creative experiments/apps and library code. I think I just copied the file from one place to another without thinking too much about consolidating it into one place. I will consider adding a submodule reference to its origin.

I was missing _MM_TRANSPOSE4_PS when I was porting some of my SSE code to ARM using SSE2NEON. I see it has been implemented now. Nice! I actually went ahead and researched how to do a 4x4 floating-point transpose using the NEON interleaved/transpose load intrinsics, but realized it wasn't so obvious. In the end I found an implementation, which I added here: https://github.com/marcel303/framework/blob/5c589ed6bd7c7bc8dc6f3c7ecde495bfee660db3/1stparty/binaural/neon-transpose.h#L40 Source: http://tessy.org/wiki/index.php?NEON%A4%C732bit%A4%CE%C5%BE%C3%D6 This version is very similar, if not identical, to yours!
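For illustration, the usual two-step structure of a 4x4 float transpose can be sketched in portable C++. This is a minimal sketch, not the code in neon-transpose.h or in SSE2NEON: `Vec4` and the helper names are hypothetical stand-ins for a 4-lane SIMD register and for SSE-style unpack/move operations, written as scalar code so it runs on any platform.

```cpp
#include <array>

// Hypothetical 4-lane float vector, standing in for __m128 / float32x4_t.
using Vec4 = std::array<float, 4>;

// Interleave the low halves of a and b: {a0, b0, a1, b1}.
static Vec4 unpack_lo(const Vec4& a, const Vec4& b) { return {a[0], b[0], a[1], b[1]}; }

// Interleave the high halves of a and b: {a2, b2, a3, b3}.
static Vec4 unpack_hi(const Vec4& a, const Vec4& b) { return {a[2], b[2], a[3], b[3]}; }

// Low half of a followed by low half of b: {a0, a1, b0, b1}.
static Vec4 join_lo(const Vec4& a, const Vec4& b) { return {a[0], a[1], b[0], b[1]}; }

// High half of a followed by high half of b: {a2, a3, b2, b3}.
static Vec4 join_hi(const Vec4& a, const Vec4& b) { return {a[2], a[3], b[2], b[3]}; }

// Transpose four rows in place: first interleave pairs of rows,
// then recombine the 2-element halves into columns.
void transpose4x4(Vec4& r0, Vec4& r1, Vec4& r2, Vec4& r3) {
    const Vec4 t0 = unpack_lo(r0, r1); // r0[0] r1[0] r0[1] r1[1]
    const Vec4 t1 = unpack_lo(r2, r3); // r2[0] r3[0] r2[1] r3[1]
    const Vec4 t2 = unpack_hi(r0, r1); // r0[2] r1[2] r0[3] r1[3]
    const Vec4 t3 = unpack_hi(r2, r3); // r2[2] r3[2] r2[3] r3[3]
    r0 = join_lo(t0, t1); // column 0
    r1 = join_hi(t0, t1); // column 1
    r2 = join_lo(t2, t3); // column 2
    r3 = join_hi(t2, t3); // column 3
}
```

On real hardware each helper would map to one or two shuffle instructions; the scalar version only shows the data movement.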

As for use cases: I mostly use it for the binauralization library I wrote. See: https://github.com/marcel303/framework/tree/master/1stparty/binaural

It performs FFTs on four audio buffers in parallel (using SSE/NEON), and performs convolution with the left/right binaural HRTFs in parallel. I wanted it to run fast on ARM hardware, since that is what the Oculus Quest uses internally.
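As a rough illustration of convolving through the frequency domain, the sketch below transforms a block and a filter, multiplies them bin by bin, and transforms back. It is a sketch under stated assumptions and not the library's implementation: it uses a naive O(n^2) DFT instead of an FFT, processes one channel without SIMD, and computes a circular convolution. The function names `dft` and `convolve_circular` are hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive DFT (assumption: stands in for a real FFT).
// With inverse = true the kernel sign flips and the result is scaled by 1/n.
std::vector<Complex> dft(const std::vector<Complex>& in, bool inverse) {
    const std::size_t n = in.size();
    const double pi = 3.14159265358979323846;
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * double(k * t) / double(n);
            sum += in[t] * std::polar(1.0, angle);
        }
        out[k] = inverse ? sum / double(n) : sum;
    }
    return out;
}

// Circular convolution of two equal-length real sequences:
// transform both, multiply per frequency bin, transform back.
// Precondition: signal.size() == filter.size().
std::vector<double> convolve_circular(const std::vector<double>& signal,
                                      const std::vector<double>& filter) {
    const std::vector<Complex> x(signal.begin(), signal.end());
    const std::vector<Complex> h(filter.begin(), filter.end());
    std::vector<Complex> spectrum = dft(x, false);
    const std::vector<Complex> response = dft(h, false);
    for (std::size_t k = 0; k < spectrum.size(); ++k) {
        spectrum[k] *= response[k];
    }
    const std::vector<Complex> y = dft(spectrum, true);
    std::vector<double> result(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        result[i] = y[i].real();
    }
    return result;
}
```

For a binaural output one would call this once with a left-ear filter and once with a right-ear filter; a streaming implementation would additionally need zero padding and overlap handling to obtain a linear convolution.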

The second use case is for some water/wave simulation that runs at audio rate, for some weird but convincingly physical sounding audio synthesis.

Cheers, Marcel

ZalgoSoft commented 2 years ago

Fully unusable pile of useless code. Killed 6 hours trying to even run just a single demo.

marcel303 commented 2 years ago

Hi @ZalgoSoft, which platform / OS did you try to build for?

ZalgoSoft commented 2 years ago

Windows 10 64-bit, VS2019 Community. I'm looking for a flexible node flow engine which will support and process a high data flow, about 10 MSPS. For now I run ImNodes, which is part of ImGUI, and some time ago I discovered the DirectShow code library, which is acceptable for me. What I really need is a good real-time data flow manager/orchestrator/arbiter with circular buffers etc. I wrote my own lightweight data flow framework similar to DirectShow and JACK, but I lack the ability to add nodes, heh.
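As an illustration of the kind of circular buffer mentioned above, here is a minimal single-producer / single-consumer ring buffer sketch. It is not taken from this framework, DirectShow, or JACK; the class name `RingBuffer` and its interface are hypothetical, and it assumes exactly one writer thread and one reader thread.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical lock-free ring buffer for one producer and one consumer.
// One slot is kept empty so that "full" and "empty" can be told apart.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity)
        : data_(capacity + 1), head_(0), tail_(0) {}

    // Producer side: returns false when the buffer is full.
    bool push(float value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % data_.size();
        if (next == tail_.load(std::memory_order_acquire)) {
            return false; // full
        }
        data_[head] = value;
        head_.store(next, std::memory_order_release); // publish the sample
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool pop(float& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false; // empty
        }
        out = data_[tail];
        tail_.store((tail + 1) % data_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<float> data_;
    std::atomic<std::size_t> head_; // next write position
    std::atomic<std::size_t> tail_; // next read position
};
```

At rates around 10 MSPS one would likely move whole blocks of samples per call rather than single values, but the head/tail bookkeeping stays the same.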

marcel303 commented 2 years ago

Hi @ZalgoSoft,

The first problem you probably encountered is that some of the libraries depend on 32-bit statically compiled binaries (e.g. FFmpeg/avcodec), and no 64-bit version exists. I tried to change the generate script to tell CMake to produce a project file targeting x86. However, it seems this is not the only issue, as I found a weird compile error when trying to compile some code which uses a std::unordered_map.

I will research some more when I have time, as I do want to tackle this problem.

For what it's worth, everything works well using VS2017. It seems VS2019 switched to 64-bit by default, and some compiler behaviour also changed.

marcel303 commented 2 years ago

@ZalgoSoft By the way, are you looking to do the work on the GPU or the CPU at 10 MS/s?

marcel303 commented 2 years ago

@ZalgoSoft I've updated the build, generate and archive scripts to explicitly tell CMake to generate project files for a 32-bit target. This fixes most of the issues. I've also addressed a few compile errors that only seem to happen with VS2019+. You should be able to build & run most of the apps and demos with these changes, including the graph system. I'm still trying to resolve a compile issue with one of the third party dependencies. I'm not sure why, but std::unordered_map gives a compile error on a simple map using std::string's when compiling ImGuiColorTextEdit.

ZalgoSoft commented 2 years ago

@marcel303 thank you a lot, I will try your code soon. All I need is the audio processing / dataflow part of your framework. I am trying to build a data flow which takes advantage of GPGPU processing of audio or radio signals. So I chose OpenCL as the most versatile solution. Before this I did successful computations on CUDA, but today's requirements call for a more generic solution. The question is about a 1-10 MSPS flow of byte/complex/float samples: conversions, enhancement, filtering, and visual rendering of the waterfall and spectrum of the signal.