mn416 / QPULib

Language and compiler for the Raspberry Pi GPU

Suggestions for transferring big chunks of memory between CPU and QPU? #80

Closed lyogavin closed 4 years ago

lyogavin commented 4 years ago

We achieved a 5x+ speedup with QPULib for the main calculation part!

But we have to keep transferring data back and forth between the CPU and the QPU, which consumes half of the total time.

Following the example code, the only way we found to do this is an element-by-element loop, for{*(shared_array_ptr) = a;}, which seems inefficient.

I think this is a fundamental operation that is worth additional optimization. Any suggestions on how to do it efficiently? Like what they do here: https://github.com/nineties/py-videocore/blob/f2a0ef174a936f7a6e11a9e24f34fb555acb84c7/videocore/assembler.py#L692

lyogavin commented 4 years ago

Looks like we can memcpy directly through the arm_base pointer of the mmap'd memory. That is about 5x faster than copying in a loop.