Closed: KatyushaScarlet closed this pull request 3 months ago.
Reason
This pull request bumps kcp-go's templexxx/xorsimd dependency so that kcp-go compiles and runs directly on the linux/loong64 platform.
Go has officially supported loong64 since Go 1.19. About six months ago, templexxx/cpu added support for the loong64 architecture, and last week templexxx/xorsimd upgraded its cpu package dependency to v0.1.1 to pick up that support.
Although Go does not yet support SIMD on loong64 (the LSX/LASX extensions), this pull request lets kcp-go run its basic functionality on the loong64 platform.
Tested on Loongson 3A5000M, 3A6000, and 3C5000 (without SIMD).
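Without SIMD, XOR over byte slices has to be done with ordinary integer instructions. The sketch below shows what such a portable path can look like: a word-at-a-time loop followed by a byte-wise tail. `xorBytes` is a hypothetical helper written for illustration; it is not xorsimd's actual implementation or API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBytes is a hypothetical portable XOR: it writes a[i]^b[i] into dst for
// every i below the shortest slice length and returns that length.
// It uses only plain 64-bit integer operations, so it needs no SIMD and
// builds for any GOARCH, including loong64.
func xorBytes(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	if len(dst) < n {
		n = len(dst)
	}
	i := 0
	// Word-at-a-time loop: 8 bytes per iteration.
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:])
		y := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x^y)
	}
	// Byte-at-a-time tail for the remaining 0 to 7 bytes.
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte("kcp-go on loong64")
	key := make([]byte, len(a))
	for i := range key {
		key[i] = 0x5a
	}
	enc := make([]byte, len(a))
	n := xorBytes(enc, a, key)
	// XOR is its own inverse, so XORing with the same key recovers a.
	back := make([]byte, n)
	xorBytes(back, enc, key)
	fmt.Println(n, string(back))
}
```

On Go 1.20 and later, the standard library's `crypto/subtle.XORBytes` provides a similar portable primitive.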
Test result
2024/08/15 20:26:29 beginning tests, encryption:salsa20, fec:10/3
goos: linux
goarch: loong64
pkg: github.com/xtaci/kcp-go/v5
cpu: Loongson-3A5000M @ 2000.00MHz
BenchmarkSM4-4 15994 75053 ns/op 39.97 MB/s 0 B/op 0 allocs/op
BenchmarkAES128-4 26928 44547 ns/op 67.34 MB/s 0 B/op 0 allocs/op
BenchmarkAES192-4 23096 51940 ns/op 57.76 MB/s 0 B/op 0 allocs/op
BenchmarkAES256-4 20258 59444 ns/op 50.47 MB/s 0 B/op 0 allocs/op
BenchmarkTEA-4 28941 41438 ns/op 72.40 MB/s 0 B/op 0 allocs/op
BenchmarkXOR-4 89575 12948 ns/op 231.70 MB/s 0 B/op 0 allocs/op
BenchmarkBlowfish-4 16771 70104 ns/op 42.79 MB/s 0 B/op 0 allocs/op
BenchmarkNone-4 2834565 424.1 ns/op 7073.48 MB/s 0 B/op 0 allocs/op
BenchmarkCast5-4 18360 65963 ns/op 45.48 MB/s 0 B/op 0 allocs/op
Benchmark3DES-4 3482 340804 ns/op 8.80 MB/s 0 B/op 0 allocs/op
BenchmarkTwofish-4 5736 207029 ns/op 14.49 MB/s 0 B/op 0 allocs/op
BenchmarkXTEA-4 12475 90635 ns/op 33.10 MB/s 0 B/op 0 allocs/op
BenchmarkSalsa20-4 61389 19560 ns/op 153.37 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32-4 745849 1608 ns/op 636.64 MB/s
BenchmarkCsprngSystem-4 1000000 1081 ns/op 14.80 MB/s
BenchmarkCsprngMD5-4 2733994 438.8 ns/op 36.47 MB/s
BenchmarkCsprngSHA1-4 1460905 822.2 ns/op 24.32 MB/s
BenchmarkCsprngNonceMD5-4 2645143 457.6 ns/op 34.97 MB/s
BenchmarkCsprngNonceAES128-4 4987132 240.3 ns/op 66.58 MB/s
BenchmarkFECDecode-4 280056 4274 ns/op 350.95 MB/s 143 B/op 1 allocs/op
BenchmarkFECEncode-4 122692 9786 ns/op 153.28 MB/s 0 B/op 0 allocs/op
BenchmarkFlush-4 160636 7467 ns/op 0 B/op 0 allocs/op
BenchmarkEchoSpeed4K-4 2462 474912 ns/op 8.62 MB/s 5968 B/op 155 allocs/op
BenchmarkEchoSpeed64K-4 298 3945359 ns/op 16.61 MB/s 86934 B/op 1918 allocs/op
BenchmarkEchoSpeed512K-4 39 33693713 ns/op 15.56 MB/s 764641 B/op 14962 allocs/op
BenchmarkEchoSpeed1M-4 16 65204302 ns/op 16.08 MB/s 1729531 B/op 30118 allocs/op
BenchmarkSinkSpeed4K-4 32742 37404 ns/op 109.51 MB/s 2149 B/op 48 allocs/op
BenchmarkSinkSpeed64K-4 1672 661730 ns/op 99.04 MB/s 31827 B/op 721 allocs/op
BenchmarkSinkSpeed256K-4 253 5009177 ns/op 104.67 MB/s 249385 B/op 5673 allocs/op
BenchmarkSinkSpeed1M-4 100 10434855 ns/op 100.49 MB/s 550046 B/op 11231 allocs/op
PASS
ok github.com/xtaci/kcp-go/v5 45.682s
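The MB/s column is derived rather than measured separately: Go's testing package computes it from the byte count a benchmark registers with `b.SetBytes`, as bytes × N / 1e6 / seconds, which per operation reduces to MB/s = bytes / (ns/op) × 1000. Working backwards from the numbers above gives the implied payload per operation (roughly 3000 bytes for the cipher benchmarks, 1024 for CRC32, 1500 for FECEncode). `impliedBytesPerOp` is a hypothetical helper for this back-calculation, not part of kcp-go or the testing package, and the payload sizes are inferred from the printed values, not read from the kcp-go source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// impliedBytesPerOp (hypothetical helper) recovers the approximate byte count
// behind a benchmark's MB/s figure, using MB/s = bytes / (ns/op) * 1000.
// The result is approximate because ns/op and MB/s are printed rounded.
func impliedBytesPerOp(line string) (float64, error) {
	fields := strings.Fields(line)
	var nsPerOp, mbPerSec float64
	var haveNs, haveMB bool
	// Each metric value immediately precedes its unit label.
	for i := 1; i < len(fields); i++ {
		v, err := strconv.ParseFloat(fields[i-1], 64)
		if err != nil {
			continue
		}
		switch fields[i] {
		case "ns/op":
			nsPerOp, haveNs = v, true
		case "MB/s":
			mbPerSec, haveMB = v, true
		}
	}
	if !haveNs || !haveMB {
		return 0, fmt.Errorf("line lacks ns/op or MB/s: %q", line)
	}
	return mbPerSec * nsPerOp / 1000, nil
}

func main() {
	lines := []string{
		"BenchmarkXOR-4 89575 12948 ns/op 231.70 MB/s 0 B/op 0 allocs/op",
		"BenchmarkCRC32-4 745849 1608 ns/op 636.64 MB/s",
		"BenchmarkFECEncode-4 122692 9786 ns/op 153.28 MB/s 0 B/op 0 allocs/op",
	}
	for _, l := range lines {
		n, err := impliedBytesPerOp(l)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: ~%.0f bytes/op\n", strings.Fields(l)[0], n)
	}
}
```

Benchmarks that never call `b.SetBytes`, such as BenchmarkFlush, print no MB/s, and the helper returns an error for them.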
2024/08/15 18:28:46 beginning tests, encryption:salsa20, fec:10/3
goos: linux
goarch: loong64
pkg: github.com/xtaci/kcp-go/v5
cpu: Loongson-3A6000-HV @ 2500.00MHz
BenchmarkSM4-8 21512 55799 ns/op 53.76 MB/s 0 B/op 0 allocs/op
BenchmarkAES128-8 43020 27940 ns/op 107.37 MB/s 0 B/op 0 allocs/op
BenchmarkAES192-8 36678 32887 ns/op 91.22 MB/s 0 B/op 0 allocs/op
BenchmarkAES256-8 32206 37500 ns/op 80.00 MB/s 0 B/op 0 allocs/op
BenchmarkTEA-8 57771 20773 ns/op 144.42 MB/s 0 B/op 0 allocs/op
BenchmarkXOR-8 165253 7254 ns/op 413.54 MB/s 0 B/op 0 allocs/op
BenchmarkBlowfish-8 28609 41943 ns/op 71.53 MB/s 0 B/op 0 allocs/op
BenchmarkNone-8 3741582 320.9 ns/op 9349.48 MB/s 0 B/op 0 allocs/op
BenchmarkCast5-8 25393 47226 ns/op 63.52 MB/s 0 B/op 0 allocs/op
Benchmark3DES-8 5823 204815 ns/op 14.65 MB/s 0 B/op 0 allocs/op
BenchmarkTwofish-8 7682 156096 ns/op 19.22 MB/s 0 B/op 0 allocs/op
BenchmarkXTEA-8 16908 70940 ns/op 42.29 MB/s 0 B/op 0 allocs/op
BenchmarkSalsa20-8 97887 12251 ns/op 244.87 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32-8 1306702 918.3 ns/op 1115.10 MB/s
BenchmarkCsprngSystem-8 1701952 697.5 ns/op 22.94 MB/s
BenchmarkCsprngMD5-8 3892052 308.3 ns/op 51.89 MB/s
BenchmarkCsprngSHA1-8 2236068 537.8 ns/op 37.19 MB/s
BenchmarkCsprngNonceMD5-8 3749058 320.2 ns/op 49.97 MB/s
BenchmarkCsprngNonceAES128-8 7970253 150.2 ns/op 106.53 MB/s
BenchmarkFECDecode-8 406660 2878 ns/op 521.14 MB/s 142 B/op 1 allocs/op
BenchmarkFECEncode-8 200253 5978 ns/op 250.91 MB/s 0 B/op 0 allocs/op
BenchmarkFlush-8 238231 5051 ns/op 0 B/op 0 allocs/op
BenchmarkEchoSpeed4K-8 4293 278480 ns/op 14.71 MB/s 5827 B/op 151 allocs/op
BenchmarkEchoSpeed64K-8 500 2388665 ns/op 27.44 MB/s 91838 B/op 1995 allocs/op
BenchmarkEchoSpeed512K-8 42 30345375 ns/op 17.28 MB/s 801540 B/op 15955 allocs/op
BenchmarkEchoSpeed1M-8 22 53752814 ns/op 19.51 MB/s 1668689 B/op 31758 allocs/op
BenchmarkSinkSpeed4K-8 39192 30965 ns/op 132.28 MB/s 2061 B/op 46 allocs/op
BenchmarkSinkSpeed64K-8 1983 540378 ns/op 121.28 MB/s 29868 B/op 688 allocs/op
BenchmarkSinkSpeed256K-8 246 4316135 ns/op 121.47 MB/s 245894 B/op 5467 allocs/op
BenchmarkSinkSpeed1M-8 135 8492565 ns/op 123.47 MB/s 499558 B/op 10795 allocs/op
PASS
ok github.com/xtaci/kcp-go/v5 48.693s
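Comparing two machines comes down to dividing ns/op for the same benchmark. The sketch below uses a few values copied from the 3A5000M and 3A6000 outputs above; `pair` and `speedup` are illustrative names, not part of kcp-go. The ratios range from about 1.8x on XOR to about 1.2x on SinkSpeed1M. Keep in mind that the runs also differ in nominal clock (2000 vs 2500 MHz) and GOMAXPROCS (-4 vs -8), so the ratios mix architectural and configuration effects.

```go
package main

import "fmt"

// pair holds ns/op for one benchmark on two machines (values from the runs above).
type pair struct {
	name   string
	baseNs float64 // Loongson-3A5000M run
	newNs  float64 // Loongson-3A6000-HV run
}

// speedup returns how many times faster the new run is; > 1 means faster.
func speedup(baseNs, newNs float64) float64 {
	return baseNs / newNs
}

func main() {
	pairs := []pair{
		{"XOR", 12948, 7254},
		{"AES128", 44547, 27940},
		{"Salsa20", 19560, 12251},
		{"FECEncode", 9786, 5978},
		{"SinkSpeed1M", 10434855, 8492565},
	}
	for _, p := range pairs {
		fmt.Printf("%-12s %.2fx\n", p.name, speedup(p.baseNs, p.newNs))
	}
}
```

For more than a handful of benchmarks, the benchstat tool (golang.org/x/perf/cmd/benchstat) automates this comparison and adds significance testing when each benchmark is run several times with `-count`.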
2024/08/15 18:32:23 beginning tests, encryption:salsa20, fec:10/3
goos: linux
goarch: loong64
pkg: github.com/xtaci/kcp-go/v5
cpu: Loongson-3C5000 @ 2200.00MHz
BenchmarkSM4-16 17607 68083 ns/op 44.06 MB/s 0 B/op 0 allocs/op
BenchmarkAES128-16 29734 40523 ns/op 74.03 MB/s 0 B/op 0 allocs/op
BenchmarkAES192-16 25473 47251 ns/op 63.49 MB/s 0 B/op 0 allocs/op
BenchmarkAES256-16 22282 53946 ns/op 55.61 MB/s 0 B/op 0 allocs/op
BenchmarkTEA-16 31908 35692 ns/op 84.05 MB/s 0 B/op 0 allocs/op
BenchmarkXOR-16 102637 11688 ns/op 256.67 MB/s 0 B/op 0 allocs/op
BenchmarkBlowfish-16 18894 63515 ns/op 47.23 MB/s 0 B/op 0 allocs/op
BenchmarkNone-16 3121153 384.5 ns/op 7802.48 MB/s 0 B/op 0 allocs/op
BenchmarkCast5-16 20019 59698 ns/op 50.25 MB/s 0 B/op 0 allocs/op
Benchmark3DES-16 3832 282964 ns/op 10.60 MB/s 0 B/op 0 allocs/op
BenchmarkTwofish-16 6334 188647 ns/op 15.90 MB/s 0 B/op 0 allocs/op
BenchmarkXTEA-16 13635 87983 ns/op 34.10 MB/s 0 B/op 0 allocs/op
BenchmarkSalsa20-16 67532 17779 ns/op 168.74 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32-16 819600 1462 ns/op 700.46 MB/s
BenchmarkCsprngSystem-16 1239736 954.7 ns/op 16.76 MB/s
BenchmarkCsprngMD5-16 3010318 398.9 ns/op 40.11 MB/s
BenchmarkCsprngSHA1-16 1607418 747.2 ns/op 26.77 MB/s
BenchmarkCsprngNonceMD5-16 2904398 423.4 ns/op 37.79 MB/s
BenchmarkCsprngNonceAES128-16 5501032 217.8 ns/op 73.47 MB/s
BenchmarkFECDecode-16 308180 3908 ns/op 383.83 MB/s 143 B/op 1 allocs/op
BenchmarkFECEncode-16 134910 8880 ns/op 168.93 MB/s 0 B/op 0 allocs/op
BenchmarkFlush-16 176893 6781 ns/op 0 B/op 0 allocs/op
BenchmarkEchoSpeed4K-16 2584 444047 ns/op 9.22 MB/s 6018 B/op 157 allocs/op
BenchmarkEchoSpeed64K-16 366 3279983 ns/op 19.98 MB/s 95296 B/op 2058 allocs/op
BenchmarkEchoSpeed512K-16 40 29910766 ns/op 17.53 MB/s 809910 B/op 16080 allocs/op
BenchmarkEchoSpeed1M-16 16 62569922 ns/op 16.76 MB/s 1754069 B/op 31635 allocs/op
BenchmarkSinkSpeed4K-16 26877 38259 ns/op 107.06 MB/s 2190 B/op 51 allocs/op
BenchmarkSinkSpeed64K-16 1762 610495 ns/op 107.35 MB/s 32351 B/op 764 allocs/op
BenchmarkSinkSpeed256K-16 268 4393501 ns/op 119.33 MB/s 265984 B/op 6040 allocs/op
BenchmarkSinkSpeed1M-16 122 8842404 ns/op 118.58 MB/s 550016 B/op 12051 allocs/op
PASS
ok github.com/xtaci/kcp-go/v5 47.766s
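The three machines run at different nominal clocks (2000, 2200, and 2500 MHz per their `cpu:` lines), so dividing MB/s by GHz gives a rough per-clock view. On Salsa20, the 3A5000M and 3C5000 both come out at about 76.7 MB/s per GHz, while the 3A6000 reaches about 98. This is a back-of-the-envelope normalization of the reported numbers, not a measurement: the nominal clock may differ from the frequency the cores actually ran at. `cpuRun` and `mbPerGHz` are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// cpuRun is one machine's nominal clock and its reported MB/s for a benchmark.
type cpuRun struct {
	cpu  string
	mhz  float64 // nominal clock from the "cpu:" line
	mbps float64 // MB/s printed by go test
}

// mbPerGHz normalizes throughput by the nominal clock frequency.
func mbPerGHz(r cpuRun) float64 {
	return r.mbps / (r.mhz / 1000)
}

func main() {
	// BenchmarkSalsa20 results from the three runs above.
	salsa20 := []cpuRun{
		{"Loongson-3A5000M", 2000, 153.37},
		{"Loongson-3C5000", 2200, 168.74},
		{"Loongson-3A6000-HV", 2500, 244.87},
	}
	for _, r := range salsa20 {
		fmt.Printf("%-20s %7.2f MB/s %6.2f MB/s per GHz\n", r.cpu, r.mbps, mbPerGHz(r))
	}
}
```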