philburk / hmsl

Hierarchical Music Specification Language, Forth tools for experimental music from the 1980's
Apache License 2.0

Porting HMSL to an FPGA J1. #167

Closed PythonLinks closed 2 months ago

PythonLinks commented 12 months ago

I am delighted to hear about this library.

I am working on an FPGA for music synthesis using the $35 Pico Ice board. https://pico-ice.tinyvision.ai/

The advantage of this FPGA is that it has 8 multipliers, allowing for far more sine waves, which means more instruments with richer sounds.

I invite you to join the Pico-Ice Forth Music synthesis channel on Discord. https://discord.gg/RcpkRWSgbn

philburk commented 11 months ago

Thanks for the link to pico-ice. Sounds like a cool device. How does it sound?

For synthesis, how does the performance compare with using SIMD on a Raspberry Pi 4?

PythonLinks commented 11 months ago

The ICE40 has eight 16 × 16-bit multipliers. It should be able to run 4 Forth cores.
I do not know anything about the RPi 4, but AMY says that they can run 64 sine wave generators on ESP MCUs but only 2 on the RP2040, which has problematic multiplier support. So an RP2040 on a board with an FPGA is most interesting: you get the C libraries and the application-specific abilities of an FPGA.

PythonLinks commented 11 months ago

Sorry I missed this question.

> How does it sound?

So far I have only 1 Forth core running. The next release will have it connected to an I2S audio output, at which point we can start generating sounds.

philburk commented 11 months ago

> they can run 64 sine wave generators on ESP MCUs but only 2 on the RP2040

The RPi4 has a quad-core Cortex-A72 that can run at up to 1.8 GHz. That should be able to generate hundreds, if not thousands, of sine waves at 44100 Hz on a single core. If I get time I will run some benchmarks for CORDIC, wavetable and sinf() implementations.

PythonLinks commented 11 months ago

That sounds great, until you read the fine print.

> BCM2711 provides a 1MB system L2 cache, which is used primarily by the GPU. Accesses to memory are routed either via or around the L2 cache depending on the address range being used.

meaning that for program memory, the data access is quite slow, going over SPI out to flash or RAM. Plus, a C compiler will inline code, bloating the memory. So then what good is a gigahertz CPU? No thank you. But thanks for letting me know about the apparent competition.

philburk commented 11 months ago

> That sounds great, until you read the fine print.

The BCM2711 has a 32KB L1 D cache per core. That's big enough to fit a sine wave table and a lot of synth state.

I wrote a benchmark in C that compared three sine wave generation techniques on an RPi4. I then increased the number of sine oscillators until I hit 90% CPU. That forces the CPU clock to maximum. I set the oscillators to random frequencies, mixed the result and played it using PortAudio. So I know it is playing OK in real-time.

Number of sine oscillators sustained by each method, with differing compiler optimization flags:

| Method           | -O3  | -Ofast |
|------------------|------|--------|
| sinf()           | 736  | 747    |
| lookup table     | 2053 | 2074   |
| Taylor expansion | 1181 | 2564   |

I think these numbers could be increased if you used SIMD intrinsics or multiple cores. Also note the new RPi 5 is supposed to be 2X faster than an RPi4.

One advantage of using a regular CPU is that you can work in float instead of fixed point. Also you have a full processor to do MIDI control, etc.

PythonLinks commented 11 months ago

What an awesome result. You get my highest compliment: you shifted my thinking. Even with a many-core 50 MHz machine, it is very hard to compete on compute cycles with a 2 GHz machine. You would need 40 cores.

On the other hand, memory is really the bottleneck in processors. Your demo works great in a tiny amount of RAM. But start switching a real application in and out, and GHz machines run into trouble. Worse than that, to keep the pipeline full, they inline subroutines, and that takes up even more memory, requiring more stalls for memory fetches.

I think with Forth and an FPGA, I can really shrink the memory, so that a ton of stuff fits in on-chip RAM: 1 Mbit on the ICE40, 6 Mbit on other boards.

Anyhow, I do appreciate your analysis. Like I said, it shifted my thinking.

philburk commented 2 months ago

Thanks for the insights.