individual residua for multi-shift solver

I hadn't noticed this issue until recently, but it seems I've been assigned to work on it, so I'll make a few points.

First off, QudaInvertParam already has member variables tol and tol_offset, where tol is a double and tol_offset is an array of doubles, one for each offset. As far as I can see, tol_offset is completely disregarded at the moment.

In Balint's example, the system with the smallest shift needs to be solved less accurately than the larger-shift systems. So one could, for example, run multi-shift until the tolerance for the lowest shift had been reached and then polish the higher-shift results individually until they reached their respective target residuals. On the other hand, with the HISQ action, you usually want to solve the low-shift systems to a higher accuracy. The high-shift results don't incorporate the quark-mass dependence of the Naik term, and they need to be polished with the correct operator.

One way of making the multi-shift solver more flexible would be to use tol to specify a base tolerance for the standard multi-shift inverter and tol_offset to specify a set of residuals for the polishing step. This would be a trivial change to the mixed-precision multi-shift solver. Of course, more flexibility can mean more confusion. Any comments?
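To make the proposal concrete, here is a minimal sketch of the two-stage scheme. It is not QUDA code. `ToyInvertParam`, `apply`, `cg` and `solve_two_stage` are hypothetical names, the operator is a toy 1D Laplacian plus a small mass term, and the multi-shift stage is replaced by independent CG solves to the base tolerance, purely as a stand-in. Only the roles of tol and tol_offset follow the suggestion above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the relevant QudaInvertParam members.
struct ToyInvertParam {
  double tol;                      // base tolerance for the (multi-shift) first stage
  std::vector<double> offset;      // the shifts sigma_i
  std::vector<double> tol_offset;  // per-shift target residual for the polishing stage
};

static double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// y = (A + sigma) x, with A a toy SPD operator: 1D Laplacian
// (Dirichlet boundaries) plus a small mass term of 0.01.
static Vec apply(const Vec &x, double sigma) {
  const std::size_t n = x.size();
  Vec y(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double left = (i > 0) ? x[i - 1] : 0.0;
    const double right = (i + 1 < n) ? x[i + 1] : 0.0;
    y[i] = (2.01 + sigma) * x[i] - left - right;
  }
  return y;
}

// True relative residual |b - (A + sigma) x| / |b|.
static double rel_residual(const Vec &x, const Vec &b, double sigma) {
  const Vec ax = apply(x, sigma);
  double r2 = 0.0;
  for (std::size_t i = 0; i < b.size(); ++i) r2 += (b[i] - ax[i]) * (b[i] - ax[i]);
  return std::sqrt(r2 / dot(b, b));
}

// Plain CG started from the current x; stops once the iterated
// relative residual drops to tol. Returns the iteration count.
static int cg(Vec &x, const Vec &b, double sigma, double tol, int max_iter = 10000) {
  Vec r = apply(x, sigma);
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i];
  Vec p = r;
  double rr = dot(r, r);
  const double stop = tol * tol * dot(b, b);
  int k = 0;
  while (rr > stop && k < max_iter) {
    const Vec ap = apply(p, sigma);
    const double alpha = rr / dot(p, ap);
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * ap[i];
    }
    const double rr_new = dot(r, r);
    const double beta = rr_new / rr;
    for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
    rr = rr_new;
    ++k;
  }
  return k;
}

static std::vector<Vec> solve_two_stage(const ToyInvertParam &param, const Vec &b) {
  assert(param.tol_offset.size() == param.offset.size());
  std::vector<Vec> x(param.offset.size(), Vec(b.size(), 0.0));
  // Stage 1 (stand-in for the multi-shift solve): every shift to the base tolerance.
  for (std::size_t i = 0; i < x.size(); ++i) cg(x[i], b, param.offset[i], param.tol);
  // Stage 2: polish only the shifts whose individual target is tighter than tol.
  for (std::size_t i = 0; i < x.size(); ++i)
    if (param.tol_offset[i] < param.tol) cg(x[i], b, param.offset[i], param.tol_offset[i]);
  return x;
}
```

In the HISQ-like case one would set, say, tol = 1e-5 and tol_offset = {1e-10, 1e-8, 1e-5} for increasing shifts, so only the two lowest shifts are polished. In a real implementation the polishing step is also where the correct per-shift operator could be applied, which this toy does not model.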

