There's a ton of good stuff between SSE2 and AVX, including pshufb (SSSE3), which swizzles 16 bytes in an arbitrary way, and pcmpXstrX (SSE4.2), the Swiss Army knife of byte processing. Also, while it isn't really SIMD, the crc32 instruction (which computes CRC-32C) is in SSE4.2 for some reason. Not sure if that is in or out of scope for this project, but worth considering.
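To make that concrete, here is a rough sketch in C of what two of these look like through the standard x86 intrinsics: a 16-byte reversal with pshufb and a byte-at-a-time CRC-32C with crc32. The helper names (`reverse16`, `crc32c`, `demo`) are made up for illustration and aren't from any project; the `target` attributes assume GCC or Clang on x86, and the code assumes the CPU it runs on actually supports SSSE3 and SSE4.2.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse 16 bytes with a single pshufb (SSSE3). */
__attribute__((target("ssse3")))
static void reverse16(const uint8_t in[16], uint8_t out[16])
{
    /* Control byte i names the source byte that lands in output lane i. */
    const __m128i ctrl = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                       7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, ctrl));
}

/* Hypothetical helper: CRC-32C using the SSE4.2 crc32 instruction.
 * The instruction only does the polynomial update, so the usual
 * initial value and final inversion are applied by hand. */
__attribute__((target("sse4.2")))
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, p[i]);
    return ~crc;
}

/* Small self-check exercising both helpers. */
static void demo(void)
{
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++)
        in[i] = (uint8_t)i;
    reverse16(in, out);
    for (int i = 0; i < 16; i++)
        assert(out[i] == 15 - i);

    /* 0xE3069283 is the standard CRC-32C check value for "123456789". */
    assert(crc32c("123456789", 9) == 0xE3069283u);
}
```

A real implementation would probably feed crc32 8 bytes at a time with `_mm_crc32_u64` and check CPU features at runtime before calling either helper; this only shows the shape of the intrinsics.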