Closed lizthegrey closed 1 year ago
pinging again @pierrec and @greatroar for thoughts. this is quite bad IMO.
Total: 1174.89s 1174.89s (flat, cum) 10.73%
31 . . // func decodeBlock(dst, src, dict []byte) int
32 . . TEXT ·decodeBlock(SB), NOFRAME+NOSPLIT, $0-80
[...]
183 . . copyMatchTry8:
184 . . // Copy doublewords if both len and offset are at least eight.
185 . . // A 16-at-a-time loop doesn't provide a further speedup.
186 30.24s 30.24s CMP $8, len
187 2.34s 2.34s CCMP HS, offset, $8, $0
188 250ms 250ms BLO copyMatchLoop1
189 . .
190 4.98s 4.98s AND $7, len, lenRem
191 9.83s 9.83s SUB $8, len
192 . . copyMatchLoop8:
193 3.85s 3.85s MOVD.P 8(match), tmp1
194 62.76s 62.76s MOVD.P tmp1, 8(dst)
195 2.55s 2.55s SUBS $8, len
196 1.93s 1.93s BPL copyMatchLoop8
197 . .
198 910ms 910ms MOVD (match)(len), tmp2 // match+len == match+lenRem-8.
199 24.24s 24.24s ADD lenRem, dst
200 60ms 60ms MOVD $0, len
201 440ms 440ms MOVD tmp2, -8(dst)
202 550ms 550ms B copyMatchDone
203 . .
204 . . copyMatchLoop1:
205 . . // Byte-at-a-time copy for small offsets.
206 132.84s 132.84s MOVBU.P 1(match), tmp2
207 177.05s 177.05s MOVB.P tmp2, 1(dst)
208 8.42s 8.42s SUBS $1, len
209 5.27s 5.27s BNE copyMatchLoop1
210 . .
211 . . copyMatchDone:
212 18.42s 18.42s CMP src, srcend
213 12.36s 12.36s BNE loop
Hi all, I know this is closed but some work over the holidays lead me to dig into this more and I wanted to get y'all thoughts on #215, directly relevant to this sort of unrolling. Thanks!
https://github.com/pierrec/lz4/commit/1cbdd8169b0510935efd9164dbbea4dcda897fb6 appears to have removed the unrolled 4x copy, meaning the finishing copy after the remainder is less than 8 bytes long may do up to 7x unaligned individual byte copies, rather than the maximum of 3x individual unaligned byte copies from before. Profiling shows that we're seeing 10% of all time spent in the lz4 library being spent on the one line at https://github.com/pierrec/lz4/blob/v4/internal/lz4block/decode_arm64.s#L206
I suspect we need to put that aligned 4x unrolled access back, do you concur, @greatroar?