viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

Add ARM Mali to built-in database #231

Open whaleford opened 7 years ago

whaleford commented 7 years ago

Currently, ARM Mali GPUs are not present in the built-in database, so the fallback parameters, which are likely unoptimized, are selected when running on such GPUs. May I request the addition of parameters optimized for the Mali T880 MP12? If that's not possible, I'd like to know how these parameters are generated so I can try to develop them myself.

I've checked previous issues and git history to try and figure out how these were generated and found the following:

#23 #24 #26 - These mention the kernel generator, which, if I understand correctly, uses the parameters in the built-in database to generate kernels optimized for the current GPU. However, I could not find any mention of how the parameters themselves were generated. The closest I can see is the commit message in d2ef9b2 fixing #26, where it's mentioned that in this particular case they were generated by manual tweaking. There is, however, mention of an optimizer, which I've been unable to find in the documentation but would like to know more about.

#27 - I think this mentions what used to be the parameter generator, a.k.a. the autotuner? As mentioned there, it's been superseded by the kernel generator + device database and thus removed, but upon downloading old versions of ViennaCL I found files for an autotuner in version 1.5.2. I guess I could try to use this old version, using the autotuner examples as a guide. Hoping to find an autotuner for the latest version, though, I searched some more and found commit 237f3d9, which says that the autotuner had been migrated to Python in an external repo. I've been unable to find that repo so far but would also like to know more about it.

(Do let me know if I got anything wrong--I may have several years of experience as a systems/embedded software engineer but I'm new to the world of BLAS libraries.)

karlrupp commented 7 years ago

Thanks, @wilf0rd, for the suggestion. Indeed, there are currently no dedicated profiles for Mali GPUs. We evaluated single-board computers in the summer of 2015, with the result that the default parameters were just fine (though at a rather low performance point overall). A bunch of new hardware has come out since, so an update to all this would indeed be appropriate.

I assume your Mali T880 MP12 is part of a Samsung Galaxy S7?

whaleford commented 7 years ago

That's right, it's part of a Galaxy S7.

Anyway, I've tried tweaking the values in fallback.hpp and have managed to speed up AlexNet by around 30%. I'll probably try tweaking matrix_product_template.hpp next to better suit the characteristics of the Mali T880. As I'm doing this manually, an autotuner would be greatly appreciated.
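
For illustration only, the manual tweak-and-measure loop described above can be automated with a brute-force search. The sketch below is not ViennaCL's autotuner and touches none of its files; `blocked_matmul`, `autotune`, and the single `tile` parameter are hypothetical stand-ins for a real kernel and its tuning parameters, and the timing is done on the CPU in pure Python just to show the idea.

```python
import time

def blocked_matmul(a, b, n, tile):
    """Tiled n x n matrix product on nested lists; `tile` is the tunable parameter."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c

def time_once(kernel, params):
    """Return the wall-clock seconds taken by one call of kernel(params)."""
    start = time.perf_counter()
    kernel(params)
    return time.perf_counter() - start

def autotune(kernel, candidates, repeats=3):
    """Try every candidate, keep the best-of-`repeats` time, return the fastest."""
    best_params, best_time = None, float("inf")
    for params in candidates:
        elapsed = min(time_once(kernel, params) for _ in range(repeats))
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time

# Hypothetical search space: a handful of tile sizes for a small test matrix.
n = 48
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
tile_sizes = [4, 8, 16, 48]

best_tile, best_seconds = autotune(lambda t: blocked_matmul(a, b, n, t), tile_sizes)
print(f"best tile size: {best_tile} ({best_seconds:.4f} s)")
```

On a real device the kernel call would launch the generated OpenCL kernel instead, and the search space would cover every parameter in the device profile rather than a single tile size, so exhaustive search may need to be replaced by something smarter.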

psyhtest commented 7 years ago

@wilf0rd Out of interest, have you tried CLBlast? We've got pretty good speedups (~3x) by tuning CLBlast on several Mali devices. (We can share the results if you are interested.)

whaleford commented 7 years ago

@psyhtest I've only just managed to build CLBlast now after wrestling with toolchain files, but now that I've done so I'm quite interested in CLBlast. May I see your results?