lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Tunecache does not save CUDA version #140

Closed: mathiaswagner closed this issue 10 years ago

mathiaswagner commented 10 years ago

It seems like the tune cache does not save the CUDA version used to compile QUDA. Since the register usage of kernels is likely to change between CUDA versions, using a tune cache from QUDA compiled with a different CUDA version might not give optimal results.

It might be good to add the CUDA / nvcc version to the QUDA_HASH.
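A minimal sketch of the idea (hypothetical code, not QUDA's actual implementation): cuda.h defines the integer macro CUDA_VERSION, so the toolkit version seen at compile time could simply be folded into the build hash string. The helper name and hash format below are made up for illustration.

```cpp
// Hypothetical sketch: append the compile-time CUDA version to the build hash.
// cuda.h defines CUDA_VERSION as an integer, e.g. 6050 for CUDA 6.5.
#include <cuda.h>
#include <sstream>
#include <string>

// Illustrative helper; QUDA's real hash string is assembled elsewhere.
static std::string hashWithCudaVersion(const std::string &base_hash)
{
  std::ostringstream os;
  os << base_hash << ",cuda_version=" << CUDA_VERSION;
  return os.str();
}
```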

maddyscientist commented 10 years ago

Good idea Mathias. Can you add this to the quda-0.7 branch?

mathiaswagner commented 10 years ago

Yes, I will try to get that implemented. Any preference for how to add it to QUDA_HASH: via nvcc -V plus grep, or by using __CUDA_API_VERSION?

rbabich commented 10 years ago

There might be a better way, but at one point I had this in make.inc.example (which predated make.inc.in):

```make
CUDA_VERSION = $(shell awk '/\#define CUDA_VERSION/{print $$3}' $(CUDA_INSTALL_PATH)/include/cuda.h)
```

mathiaswagner commented 10 years ago

Sounds like a good solution.

While looking at the tunecache.tsv I also noticed that the GPU which was used for tuning is not stored. I am not sure whether I get this right, but if I tune on a K20, will those values also be used on a K20X or K40, even though they have a different number of SM(X)s? Also, I have never tried building QUDA for multiple architectures (i.e., sm_20 and sm_35): what happens to the tunecache in that case?

mathiaswagner commented 10 years ago

Added CUDA_VERSION in commit 6f2b22c42f726b9045351c0427f99d74959e2eb0. Currently QUDA only warns: 'WARNING: Cache file /tmp/mwquda/tunecache.tsv does not match current QUDA build'. Maybe it would be better to retune by default?

maddyscientist commented 10 years ago

I think retuning by default is good, since it is guaranteed to work, and a warning could be missed, leading to spurious bug reports being filed.

rbabich commented 10 years ago

You're probably right.

I think the original motivation for defaulting to a warning was to allow maximum control (e.g., letting the user choose whether to use a mismatched tunecache or manually blow it away). I doubt anyone has ever actually wanted to do that, though, so let's make it more idiot-proof.

rbabich commented 10 years ago

On second thought, I'd vote to turn the warning into an error and not overwrite an existing tunecache.
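A rough sketch of that behaviour (hypothetical code, not the actual QUDA source; errorQuda() and warningQuda() are QUDA's usual reporting macros, and the variable and function names here are made up):

```cpp
// Hypothetical sketch: abort on a hash mismatch instead of warning,
// and leave the existing tunecache file untouched.
#include <string>
#include <util_quda.h>  // declares errorQuda() / warningQuda()

void checkTuneCacheHash(const std::string &cache_hash,  // hash read from tunecache.tsv
                        const std::string &quda_hash,   // hash of the current build
                        const std::string &cache_path)  // full path to the cache file
{
  if (cache_hash != quda_hash) {
    errorQuda("Cache file %s does not match current QUDA build", cache_path.c_str());
  }
}
```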

maddyscientist commented 10 years ago

Fine by me.

mathiaswagner commented 10 years ago

What should happen if we turn the warning into an error? Should the code just continue to run (which is essentially the same as the warning), or should it abort? If the code aborts, the user has to manually delete or move the old tunecache, and I am not sure whether that is better in terms of user confusion. We could also move the old tunecache to a backup like tunecache.n.tsv, or, maybe even better, add some more information to the tunecache name? I will change it to a warning for now, but I am really not sure whether this is what we want for the users.

rbabich commented 10 years ago

As a user, I'd much rather get a fatal error than have my tunecache clobbered, even if a backup is made.

The typical case where a user might want to run on multiple architectures is on a cluster, e.g., at JLab, where both Fermis and Keplers are installed. The standard way to do this is to have a separate run script for each queue/architecture that specifies the correct executable and QUDA_RESOURCE_PATH. If I screw up and forget to change the QUDA_RESOURCE_PATH in my new Kepler script that I copied-and-pasted from Fermi, I definitely do not want my Kepler jobs to all start trying to clobber my Fermi tunecache, which might be in use by already-submitted Fermi jobs.
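For reference, a minimal sketch of the mechanism those run scripts rely on (assuming the usual pattern of reading the environment variable; the fallback value is purely illustrative):

```cpp
// Hypothetical sketch: the tunecache location is taken from the
// QUDA_RESOURCE_PATH environment variable set in each run script.
#include <cstdlib>
#include <string>

static std::string resourcePath()
{
  const char *p = std::getenv("QUDA_RESOURCE_PATH");
  return p ? std::string(p) : std::string(".");  // fallback shown for illustration only
}
```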

Even in the case where re-tuning is required because I've recompiled with a new CUDA version, I'd rather get an error, but that one's probably more debatable.

I don't think an error will cause confusion if we give a good error message, e.g.,

```
ERROR: Cache file /tmp/mwquda/tunecache.tsv does not match current QUDA build.
Please delete this file or set the QUDA_RESOURCE_PATH environment variable to point to a new path.
```

mathiaswagner commented 10 years ago

Agreed.

Although, just as a convenience feature for the user, we might also add the sm to the tunecache filename, like tunecache_sm35.tsv. That probably would not hurt anyone and would also work if different QUDA resource paths are used.
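A rough sketch of that convenience (hypothetical code; cudaGetDeviceProperties() is the standard CUDA runtime query, and the filename scheme is just the suggestion above):

```cpp
// Hypothetical sketch: tag the tunecache filename with the device's
// compute capability, e.g. tunecache_sm35.tsv on a K20/K40.
#include <cuda_runtime.h>
#include <cstdio>
#include <string>

static std::string tuneCacheFileName(const std::string &resource_path, int device)
{
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, device);  // query compute capability (major/minor)
  char name[32];
  std::snprintf(name, sizeof(name), "tunecache_sm%d%d.tsv", prop.major, prop.minor);
  return resource_path + "/" + name;
}
```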

I have no idea how much it actually affects performance, but right now QUDA only cares about the architecture of the GPU and not the GPU itself. A tunecache.tsv generated on a K40 will also be used on a K20. I generated a fresh tunecache on both a K40 and a K20, and they contain different launch configurations for almost every kernel. But that is probably negligible.

maddyscientist commented 10 years ago

For large problems, the effect is likely negligible. It will matter more for strongly scaled problems. Regardless, the number of clusters that have a heterogeneous mix of GPUs is probably small.