Closed anowell closed 7 years ago
Hi, Thanks for all the help, I've got the copy up. https://algorithmia.com/algorithms/zza/EnhanceResolution I already have some extensions in the works!
A couple of questions regarding optimizing compilation, though: What minimum hardware is this generally running on (single/multicore, SSE2/AVX/AVX2 availability)? Is it possible to set compiler flags? Are any already set in the RUSTFLAGS env variable?
Cheers!
Sweet! I've effectively de-listed my implementation from the marketplace (still exists, but soon it will stop showing up in listings).
We don't currently allow setting compiler flags yourself, but your questions reminded me that while we did set some CFLAGS/CXXFLAGS, I neglected to set RUSTFLAGS, which, if I understand correctly, defaults to the pentium4 arch for x86_64, getting you SSE2. I can easily set --target-cpu to core2 (matching our CFLAGS -march), which adds SSE3. The public marketplace does currently run on multicore Haswell/Broadwell CPUs, so I'll check with the team on the feasibility of bumping that to haswell (getting you AVX/AVX2).
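For anyone following along, here is a sketch of what that target-cpu bump would look like on the build side. The exact invocation is my own assumption (plain cargo, since rusty_sr is Rust), not something the platform documents:

```shell
# Sketch (assumption, not platform-documented): build with a newer target CPU.
# core2 adds SSE3 over the x86_64 default (pentium4-level SSE2);
# haswell adds AVX/AVX2 and FMA on top of that.
RUSTFLAGS="-C target-cpu=haswell" cargo build --release

# To see which target features a given -C target-cpu enables:
rustc --print cfg -C target-cpu=haswell | grep target_feature
```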
Question: what sort of improvements do you see with SSE2/AVX/AVX2? Using my i5 Broadwell laptop and a 1MP photo with rusty_sr, I can't see any noticeable difference in perf between the default pentium4, haswell, or native (broadwell) target-cpu. Perhaps the perf benefits really only affect training?
Thanks for the marketplace hardware/config info!
Going from target-cpu=snb to no flags I'm seeing a ~50% reduction in matrixmultiply throughput (AVX down to SSE2), but only a ~15% reduction in rusty_sr throughput. Going from AVX to AVX2 is theoretically another doubling in matmult performance, but the LLVM codegen for FMA instructions has been a bit flaky.
There is some inverse Amdahl's law going on here, which I think is partially down to the lack of adaptive lowering from convolution to matmult, which currently produces very long, thin (inefficient) matrices. Anyway, I can measure a difference, but it's not going to be huge for rusty_sr until I fix a bunch of other stuff.
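Since the size of the measured gap depends on which instruction set actually gets used, a quick runtime check of what the host CPU exposes can help when comparing runs. This is a generic sketch using the standard library's feature-detection macro, not code from rusty_sr:

```rust
// Generic sketch (not from rusty_sr): report the SIMD levels the host exposes.
// `is_x86_feature_detected!` is part of the Rust standard library.
#[cfg(target_arch = "x86_64")]
fn main() {
    println!("sse2: {}", is_x86_feature_detected!("sse2"));
    println!("sse3: {}", is_x86_feature_detected!("sse3"));
    println!("avx:  {}", is_x86_feature_detected!("avx"));
    println!("avx2: {}", is_x86_feature_detected!("avx2"));
    println!("fma:  {}", is_x86_feature_detected!("fma"));
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {
    println!("not an x86_64 host");
}
```

Note this reports what the hardware supports at runtime, which is independent of what the compiler was allowed to emit via -C target-cpu.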
So I finally managed to get this live on Algorithmia:
If you're interested in owning/maintaining it, I'm happy to help make that happen. Our process of transferring algorithms needs a bit of work, but for now, I have a mirror of the repo on github, so it should be possible to:
git clone https://git.algorithmia.com/git/USERNAME/ALGONAME.git
(and feel free to just close this if you're not interested in owning/maintaining it on Algorithmia)