Explain the Xoroshiro MT construction?

https://github.com/pitdicker/small-rngs/blob/d74e117d83fec27e52bf5dd8122f822b09d52916/src/xoroshiro_mt.rs#L37

Two things I don't understand about this construction:

1. The shift and rotate constants do not match those of the Xoroshiro128+ PRNG. Why are they different?
2. You have used a 64-bit PRNG to implement `next_u32`, then called it twice for `next_u64`. Obviously discarding more bits can improve quality at the cost of performance, but why not just implement `next_u64` directly? That would be much faster, right?
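For comparison, here is a minimal sketch of plain xoroshiro128+ with the (55, 14, 36) constants from the original 2016 reference C code (the 2018 revision uses (24, 16, 37) instead). The type name, the `from_state` constructor, and the choice of the high 32 bits for `next_u32` are my own assumptions for illustration, not code from this repo.

```rust
/// Illustrative xoroshiro128+ (hypothetical type, not the crate's XoroshiroMt).
pub struct Xoroshiro128Plus {
    s0: u64,
    s1: u64,
}

impl Xoroshiro128Plus {
    /// Build from raw state words; the state must not be all zero.
    pub fn from_state(s0: u64, s1: u64) -> Self {
        assert!(s0 != 0 || s1 != 0, "state must not be all zero");
        Xoroshiro128Plus { s0, s1 }
    }

    /// One step of the generator, returning the full 64-bit output.
    pub fn next_u64(&mut self) -> u64 {
        let s0 = self.s0;
        let mut s1 = self.s1;
        // The "+" scrambler: output is the sum of the two state words.
        let result = s0.wrapping_add(s1);

        // State update with the 2016 constants a = 55, b = 14, c = 36.
        s1 ^= s0;
        self.s0 = s0.rotate_left(55) ^ s1 ^ (s1 << 14);
        self.s1 = s1.rotate_left(36);
        result
    }

    /// Assumption: take the high half, since the low bits of "+" are weakest.
    pub fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}
```

With this layout, `next_u64` costs one state update, whereas building `next_u64` from two `next_u32` calls costs two updates per 64-bit output.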

Did this design come from O'Neill's work? I don't see an implementation on her site.

Also worth noting is `Xoroshiro128**`, which is similar but uses the same construction critiqued here to "multiply" the result. I'm not really sure what to make of that, other than that this "fast multiplication" has worse avalanche than a regular 64-bit multiplication.
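To make the "fast multiplication" concrete: the reference xoroshiro128** output function is `rotl(s0 * 5, 7) * 9`, and multiplying by 5 or 9 reduces to one shift plus one add. Below is a rough sketch of that scrambler next to a single full-width multiply, with a crude avalanche measure (mean number of output bits that change when one input bit is flipped). The function names, the odd multiplier used for comparison, and the measurement itself are my own assumptions, not anything from the repo or the reference code.

```rust
/// The xoroshiro128** output scrambler applied to state word s0.
pub fn starstar_scramble(s0: u64) -> u64 {
    s0.wrapping_mul(5).rotate_left(7).wrapping_mul(9)
}

/// Same function written as shifts and adds: x*5 = (x<<2)+x, x*9 = (x<<3)+x.
pub fn shift_add_scramble(s0: u64) -> u64 {
    let x5 = (s0 << 2).wrapping_add(s0);
    let r = x5.rotate_left(7);
    (r << 3).wrapping_add(r)
}

/// A "regular" 64-bit multiply by an arbitrary odd constant (assumption:
/// the golden-ratio constant, chosen only for illustration).
pub fn full_multiply(s0: u64) -> u64 {
    s0.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Mean number of output bits that flip when a single input bit is flipped,
/// averaged over all 64 bit positions and all sample inputs.
pub fn mean_flipped_bits(f: fn(u64) -> u64, inputs: &[u64]) -> f64 {
    let mut total: u64 = 0;
    for &x in inputs {
        let base = f(x);
        for bit in 0..64 {
            total += (base ^ f(x ^ (1u64 << bit))).count_ones() as u64;
        }
    }
    total as f64 / (inputs.len() as f64 * 64.0)
}
```

An ideal mixer would average close to 32 flipped bits; calling `mean_flipped_bits(starstar_scramble, &samples)` and `mean_flipped_bits(full_multiply, &samples)` on the same samples gives a quick way to compare the two.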
