Open tq-z opened 2 years ago
Dear @tq-z,
Well-spotted. This is a bug. I'll fix ASAP and release a new version (@stevenweaver -- please update our dependancies).
Best, Sergei
Thank you for fixing this so quickly, @spond and @stevenweaver!
Dear @niemasd,
My pleasure. I am glad @tq-z found the issue. The manual unrolling was actually to allow some degree of CPU parallelism in this "chokepoint" loop with unavoidable data dependencies. In my hands, achieves 20-40% speedups.
Best, Sergei
Wow, 20-40% speedup is exciting! I look forward to trying it out!
Thanks, -Niema
Hi,
I noticed the v1.08 unrolls the comparison loops manually here: https://github.com/veg/tn93/blob/55565df55a9215bb1d6939f3d971988a7de12138/src/tn93_shared.cc#L584-L605. The continue at line 592 may make the comparison at line 597-604 skipped. Is it a new feature or bug?
Thanks, Tianqi