This takes care of D_DrawSpans8, D_DrawSpans16, D_DrawSpans16T, D_DrawZSpans and Turbulent8. I had to make some changes to the C sources too:
D_WarpScreen had an assembly version, but I found that introducing a couple of local variables was enough to get the compiler to produce the same output, so I did that instead.
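To illustrate the kind of change meant here, a minimal sketch follows. It is not the actual D_WarpScreen code: the names warp_rowptr, warp_column and warp_copy are hypothetical stand-ins, and the table sizes are made up. The point is only that a value read from a global inside a loop that also stores bytes may have to be reloaded on every iteration, while a local copy can stay in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the warp lookup tables (not the real names). */
enum { WARP_W = 4, WARP_H = 2 };
const uint8_t *warp_rowptr[WARP_H];
int            warp_column[WARP_W];

/* Copy one warped frame into dest, one row of WARP_W pixels at a time. */
void warp_copy(uint8_t *dest, size_t pitch)
{
    for (int v = 0; v < WARP_H; v++) {
        /* Local copies: byte stores through 'out' may alias the global
           table, so without these the row pointer could be reloaded
           from memory for every pixel. */
        const uint8_t *row = warp_rowptr[v];
        uint8_t *out = dest + (size_t)v * pitch;

        for (int u = 0; u < WARP_W; u++)
            out[u] = row[warp_column[u]];
    }
}
```

Whether a given compiler actually needs this help depends on its version and optimization flags, so comparing the generated assembly is the only reliable check.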
For Turbulent8 it wasn't enough to replace D_DrawTurbulent8Span and co., as the assembly version relies heavily on inlining for speed and I wanted to avoid the memcpy mess with scanList. I ended up splitting it into two parts (one translucent and one normal), so the assembly code doesn't have to reach into client_state_t for the time, which is different between Hexen II and HexenWorld.
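A rough sketch of the idea of keeping the time lookup on the C side, under stated assumptions: the two struct layouts, the function names and the speed/cycle constants below are all hypothetical and only illustrate why a routine that receives the time as an argument does not care where the field lives.

```c
/* Hypothetical client structs: 'time' sits at a different offset in each.
   These are NOT the real Hexen II / HexenWorld layouts. */
typedef struct { int flags; double time; } client_state_a_t;
typedef struct { int flags; int extra[4]; double time; } client_state_b_t;

/* Stand-in for the shared low-level part: it gets the time as a plain
   argument and never touches either struct. The constants 20 and 128
   are illustrative values for speed and cycle length. */
int turb_phase(float time)
{
    return (int)(time * 20.0f) & (128 - 1);
}

/* Per-game C wrappers do the struct access, then call the shared part. */
int turb_phase_a(const client_state_a_t *cl)
{
    return turb_phase((float)cl->time);
}

int turb_phase_b(const client_state_b_t *cl)
{
    return turb_phase((float)cl->time);
}
```

With this shape, only the small C wrappers need to know each game's client_state_t.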