About an 11 times improvement.
With this, we've reached the point where the polynomial operations can't be optimized much further without a very different approach. Sampling and packing now take up a good chunk of the processing time.
poly_S3_mul could be implemented using the alternative approach mentioned in poly_s3_mul.py, but that would bring only a modest speedup while adding quite a bit of code (and a few weeks of work). It's used exclusively in decryption, which is already the fastest operation, and would likely speed it up by 5-10%.
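
For context, here is a minimal sketch of the kind of operation poly_S3_mul performs, assuming it follows the usual NTRU definition (a product of two polynomials in Z_3[x]/(Phi_n), with Phi_n = 1 + x + ... + x^(n-1)). The function name `poly_s3_mul_sketch` and the list-based representation are hypothetical and chosen for illustration; this is neither the optimized implementation nor the alternative approach from poly_s3_mul.py.

```python
def poly_s3_mul_sketch(a, b):
    """Hypothetical reference: multiply a and b in Z_3[x]/(Phi_n).

    a and b are equal-length coefficient lists (lowest degree first), with
    n = len(a) and Phi_n = 1 + x + ... + x^(n-1). This definition is an
    assumption made for illustration, not taken from the code discussed here.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("inputs must have the same length")

    # Schoolbook product, wrapping exponents around modulo x^n - 1.
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] += ai * bj

    # Reduce modulo Phi_n by subtracting c[n-1] * Phi_n, which clears the top
    # coefficient, then bring every coefficient into {0, 1, 2}.
    top = c[n - 1]
    return [(ci - top) % 3 for ci in c]
```

With n = 3, `poly_s3_mul_sketch([0, 1, 0], [0, 1, 0])` returns `[2, 2, 0]`, since x^2 is congruent to -1 - x modulo 1 + x + x^2. The quadratic loop is only meant to pin down the arithmetic; any faster variant would need to produce the same coefficients.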