Open zoff99 opened 4 years ago
I believe that these functions are already highly optimized. Why do you think they can be improved?
The default malloc alignment ought to be sensible, but feel free to align it further (256 bytes should be more than enough).
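For anyone who wants to try the alignment suggestion, here is a minimal sketch using POSIX `posix_memalign`. The helper names `alloc_aligned` and `is_aligned` are made up for illustration and are not part of the attached source; whether extra alignment helps memcpy on the Pi 4 at all is something to measure, not assume.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes on an `alignment`-byte boundary.
 * `alignment` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: 1 if `p` lies on an `alignment`-byte boundary, else 0. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Calling `alloc_aligned(256, size)` for both the source and destination buffers gives the 256-byte alignment mentioned above.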
force_turbo=1
to config.txt.
@JamesH65 because it seems too slow. rpi4 should be able to memcpy some GB/s
Are you sure that your timer functions are accurate? I'm not familiar with __utimer_start, and Google seems to think it is an Android function. It might be worth trying the approaches from this page: https://stackoverflow.com/questions/6749621/how-to-create-a-high-resolution-timer-in-linux-to-measure-program-performance
__utimer_start is simply a wrapper around gettimeofday, which is a POSIX function.
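As a point of comparison, a timer wrapper can also be built on `clock_gettime(CLOCK_MONOTONIC)`, which is not affected by wall-clock adjustments. This is only a sketch: `timer_now_us` and `timer_elapsed_us` are hypothetical names and this is not the `__utimer_start` code from the attached source.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in microseconds. */
static uint64_t timer_now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical helper: microseconds elapsed since `start_us`. */
static uint64_t timer_elapsed_us(uint64_t start_us)
{
    return timer_now_us() - start_us;
}
```

Take `timer_now_us()` before the copy and pass that value to `timer_elapsed_us()` afterwards.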
I think it's important to establish the clock speeds, and any memory bandwidth taken up by the display, etc.
the function is included in the source attached here, it's just a wrapper. with a timespan of 30ms gettimeofday is accurate enough.
can somebody try the attached source on their rpi and just post the result?
Tried it, increased the number of copies to 10, and it completed in 34ms, so 3.4ms per copy. With 20 it was 60ms, so 3ms per copy. That's at 600MHz on a Pi4.
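Averaging over several copies, as done above, can be sketched like this. The function name `mean_copy_ms`, the buffer handling and the compiler barrier (a GCC/Clang extension) are assumptions for illustration, not the attached test program.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical benchmark: copy `size` bytes `iterations` times and return
 * the mean time per copy in milliseconds, or -1.0 on bad arguments or
 * allocation failure. */
static double mean_copy_ms(size_t size, int iterations)
{
    if (size == 0 || iterations <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    double start = now_ms();
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier (GCC/Clang extension) so the copies are kept. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    double elapsed = now_ms() - start;

    free(src);
    free(dst);
    return elapsed / iterations;
}
```

With 10 iterations taking 34ms in total this returns 3.4, matching the per-copy figure quoted above.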
With performance governor enabled (i.e. with the ARMs at 1.5GHz) I get a fairly consistent 3.3GB/s bandwidth. The powersave governor (600MHz) drops performance to 1.46GB/s. On-demand will yield a result somewhere between the two.
Which is about what I was seeing (approx 2.5GBits/s)
GB/s is Giga Bytes per second.
N.B. My results were with the screen blanked - an active display will eat into that bandwidth (1080p 60-70 drops the bandwidth to about 3.1GB/s).
Apologies, I've used the wrong units - that should read 2.5GBytes/s.
N.B.2. My figures are for reading and writing, so you could naively double the results. In practice, read and write speeds are different, so two separate figures are more useful.
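To make the units and the "naive doubling" explicit, here is a small sketch of the arithmetic. The function names are hypothetical, and 1 GB is taken as 10^9 bytes, which is an assumption about how the figures above were computed.

```c
/* Copy rate in GB/s (1 GB = 1e9 bytes, an assumption): bytes copied per second. */
static double copy_rate_gbps(double bytes_copied, double seconds)
{
    return bytes_copied / seconds / 1e9;
}

/* Naive total memory traffic in GB/s: every copied byte is read once and
 * written once, so the traffic is twice the copy rate. */
static double bus_traffic_gbps(double bytes_copied, double seconds)
{
    return 2.0 * copy_rate_gbps(bytes_copied, seconds);
}
```

Under this naive model a 3.3GB/s copy rate corresponds to 6.6GB/s of combined read and write traffic; separate read-only and write-only tests would be needed to split that properly.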
thanks guys for your results. i will try to change to the performance governor
$ sudo sh -c "echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor"
should do it.
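To confirm from inside the test program that the governor change took effect, the same sysfs file can be read back. A minimal sketch, with `read_governor` being a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of `path` (for example
 * /sys/devices/system/cpu/cpufreq/policy0/scaling_governor) into `buf`,
 * without the trailing newline. Returns 0 on success, -1 on failure. */
static int read_governor(const char *path, char *buf, size_t buflen)
{
    if (buf == NULL || buflen == 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)buflen, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

After the command above, reading the policy0 path should give the string performance.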
A year later, since this has not been closed: I found this issue while researching why memory bandwidth is resulting in bad WebGL/OpenGL performance for large windows, i.e. performance scales down dramatically as the window grows, even though a 1080p swap should only need about 0.5 GB/s and the memory bandwidth should be 4-5GB/s.
Maybe you could boot your Pi in console only and rerun the test?
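The 0.5 GB/s estimate for a 1080p swap can be checked with a quick calculation. The sketch below assumes 4 bytes per pixel and 60 frames per second; neither value is stated in the comment, and the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes per second needed to write one full frame `fps` times a second.
 * The pixel size and refresh rate are caller-supplied assumptions. */
static uint64_t framebuffer_bytes_per_second(uint32_t width, uint32_t height,
                                             uint32_t bytes_per_pixel,
                                             uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

For 1920 x 1080 at 4 bytes per pixel and 60 fps this gives 497,664,000 bytes/s, i.e. roughly 0.5 GB/s per full-screen pass. Compositing or reading the frame back adds further passes on top of that.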
Describe the bug: memcpy takes a long time (see example program). Can I do something to speed this up? Alignment?
To reproduce: make a && ./a
Expected behaviour: hopefully faster
Actual behaviour: takes up to 30ms on a rpi4
System: Copy and paste the results of the raspinfo command into this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
cat /etc/rpi-issue: Raspberry Pi reference 2020-02-14 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, e577677b623b577f2a0ec7cfaffc3c27da005da3, stage2
vcgencmd version:
uname -a:
Additional context:
save as a.c