Using Multi Threads for Aggregating Signatures seems working faster than the regular Aggregate function on IoT devices

Coresummer commented 4 years ago

There seems to be a way to **make signature aggregation faster** on low-spec, multi-core devices such as the Raspberry Pi by using the concurrency built into Go itself (goroutines).
The idea is simple: instead of using a for{} loop like the basic Aggregate() does, the function takes a Go channel as its input and calls C.blsSignatureAdd() from goroutines. So the algorithm would be:

While there are two or more signatures in the channel, we take two of them, call C.blsSignatureAdd(sig1.v, sig2.v) in a goroutine, and push the result (sig1) back into the channel once it is done. Because the ratio of multi-threading overhead to the cost of C.blsSignatureAdd() differs between platforms, this method works faster on low-spec, multi-core CPUs. I'll provide an example implementation in a pull request, together with measurements for both Aggregate() and AggregateMT() (MT stands for multi-threading) running on a Raspberry Pi 4. You are welcome to have a look and test it yourself.
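
The scheme described above can be sketched in plain Go. This is only an illustration under stated assumptions, not the code from the pull request: `sig`, `add`, and `aggregateMT` are hypothetical names, and `add` is an ordinary integer addition standing in for the cgo call C.blsSignatureAdd() so that the example runs without the BLS library.

```go
package main

import "fmt"

// sig is a hypothetical stand-in for a BLS signature (assumption:
// the real type wraps a C struct and is combined via cgo).
type sig uint64

// add stands in for C.blsSignatureAdd(); the real call is far more
// expensive than an integer addition.
func add(a, b sig) sig { return a + b }

// aggregateMT reduces sigs to a single value by repeatedly taking two
// values from a channel, adding them in a goroutine, and pushing the
// partial result back into the same channel.
func aggregateMT(sigs []sig) sig {
	n := len(sigs)
	if n == 0 {
		return 0
	}
	// Buffered to n so that no send ever blocks.
	ch := make(chan sig, n)
	for _, s := range sigs {
		ch <- s
	}
	// Reducing n values to one takes exactly n-1 pairwise additions.
	for i := 0; i < n-1; i++ {
		a, b := <-ch, <-ch // blocks until two values are available
		go func() { ch <- add(a, b) }()
	}
	return <-ch // the last remaining value is the aggregate
}

func main() {
	fmt.Println(aggregateMT([]sig{1, 2, 3, 4, 5}))
}
```

Every addition in this sketch pays for one goroutine launch, two channel receives, and one send, so it can only win when a single addition is expensive relative to that overhead and there are idle cores to run the goroutines.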

Regards

herumi commented 4 years ago

I'm sorry for the very long wait.

I merged your pull request and modified it a little at https://github.com/herumi/bls-go-binary/tree/Coresummer-dev . I moved your code to bls_mt.go. I ran some benchmarks, and they show that AggregateMT is slower than the original Aggregate. Am I using it incorrectly?

On Core i7-8700 @ 3.2GHz

% go test -bench Aggregate ./bls
sec:910bdbfa28e5c9b393a167c4ecc15a3a02362f53d3fd479e50ee9d5edcb33343
pub:8902ee34d9c96b16b496f3d607b3fd6723ff475a1f2e7eb1e6e2bc6efd5f56026280598b369d51e678a27323bf0bb4052c7c8a7affd6f576784d2bc952c257b6273ff20b8d59014d6983f41eb09e21ce4b5dd346dec240041c97c0867288a592
0. sign(abc)=3ac19c1b397fd2c1deb095906e1d7b233c1b1f420807d748e9a85834c1ecb49e82c0edd37f9bcffad0b160c0a7f63f85
1. sign(def)=c31f042ec82e72102c18516f7a001fb9d3164c07adbba66e14bf0d9add690d23e48dcff13a8fd2abe581d03c6fd7b494
2. sign(123)=43e6a0b5531f0caa4735783b4b44c35f0fdae3a0085f3432101df2b25fec45daada40eabc29f7e8cf672fc3de1d02901
goos: linux
goarch: amd64
pkg: github.com/herumi/bls-go-binary/bls
BenchmarkAggregate-12           1000000000               0.000555 ns/op
BenchmarkAggregateMT-12         1000000000               0.000787 ns/op
PASS
ok      github.com/herumi/bls-go-binary/bls     2.721s

On Xeon Platinum 8280 @ 2.7GHz

% go test -bench Aggregate ./bls
sec:5fe7f8ae3e1cd51b1e9191d4415e759a6fa898355eec473d2a5f4958e184bd5f
pub:b5914f4b3b8f9f6b4c47d42fe21f853607d7053c9698795f184ef5cb5515731b38f5cf30b19507472169fbdf2c65c715b6b50a5669e04de95b27aa740c6386d242df3026208aa69a01350ffeaa52151d8e76596a07a231f3c3ffe561c80a150a
0. sign(abc)=a473e6dcfa6760473175b1a6f36e7cafa439089aafe39ff4e2f153e17bf088c675c2aa3866b8a828297cf5eb20d1f608
1. sign(def)=8d57f4e49955168c89747b3c46c07a8810981f3bd8b960518e8c1000d7f7f13ffcab035bbc41b1a600ff99afeb7c438c
2. sign(123)=47bd4a572bb566d1adabef91e8b01d09c3e24068767af323264df01254966cc9b6046ccf8383b3af903a95c476f08993
goos: linux
goarch: amd64
pkg: github.com/herumi/bls-go-binary/bls
BenchmarkAggregate-112          1000000000               0.000655 ns/op
BenchmarkAggregateMT-112        1000000000               0.00150 ns/op
PASS
ok      github.com/herumi/bls-go-binary/bls     3.502s

Coresummer commented 4 years ago

Thank you very much for the benchmark, and sorry for my late reply. I tested on an i9-9900, and AggregateMT is slower than the regular Aggregate function there too. On the Raspberry Pi 4, however, when there are more than 1000 signatures, AggregateMT gave me a faster result than the regular one. I'm still trying to figure out why, but I have no clue yet. If you don't mind, could you try the same when you have time?
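
One thing that may be worth checking when comparing numbers: results such as 0.000555 ns/op over 1000000000 iterations are what Go typically reports when a benchmark body does not repeat its work b.N times, so the figures above may reflect a single call rather than a per-operation average. This is a guess, since the benchmark code on the branch is not shown here. A minimal sketch of a benchmark that does loop over b.N follows. `addSig`, `aggregateSerial`, and `benchmarkAggregate` are hypothetical names, and `addSig` is an integer addition standing in for C.blsSignatureAdd() so the example runs without the BLS library.

```go
package main

import (
	"fmt"
	"testing"
)

// addSig is a hypothetical stand-in for C.blsSignatureAdd().
func addSig(a, b uint64) uint64 { return a + b }

// aggregateSerial mirrors the loop-based aggregation: fold every
// value into one accumulator.
func aggregateSerial(sigs []uint64) uint64 {
	var acc uint64
	for _, s := range sigs {
		acc = addSig(acc, s)
	}
	return acc
}

// sink keeps the compiler from discarding the benchmarked call.
var sink uint64

// benchmarkAggregate runs the aggregation b.N times, so that ns/op
// is the average cost of one full aggregation.
func benchmarkAggregate(b *testing.B, sigs []uint64) {
	b.ResetTimer() // exclude any setup done before this point
	for i := 0; i < b.N; i++ {
		sink = aggregateSerial(sigs)
	}
}

func main() {
	// Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()

	sigs := make([]uint64, 1000)
	for i := range sigs {
		sigs[i] = uint64(i + 1)
	}
	res := testing.Benchmark(func(b *testing.B) {
		benchmarkAggregate(b, sigs)
	})
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

In a regular _test.go file the same loop would sit inside a `func BenchmarkXxx(b *testing.B)` function and be run with `go test -bench`.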

herumi / bls-go-binary

Using Multi Threads for Aggregating Signatures seems working faster than the regular Aggregate function on IoT devices #3