sanjif-shanmugavelu opened 1 month ago
I'm going to add an option for specifying an output file, since I'm pretty confident that @sanjif-shanmugavelu and @chrisculver aren't implementing that. ;)
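A minimal sketch of what such a flag could look like with argparse; the option name `--output` and the CLI wiring are assumptions, not the actual torchdetscan interface:

```python
import argparse

# Hypothetical sketch of the planned output-file flag; the real option
# name and parser setup in torchdetscan may differ.
parser = argparse.ArgumentParser(prog="torchdetscan")
parser.add_argument("--output", "-o", help="file to write benchmark results to")

args = parser.parse_args(["--output", "results.csv"])
print(args.output)  # -> results.csv
```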
I've added code on the feature branch to suppress the following SciPy warning:
/Users/may/Projects/Ada/minnervva/torchdetscan/venv/lib/python3.11/site-packages/scipy/stats/_axis_nan_policy.py:573: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
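For reference, one plausible way to suppress exactly this warning; the match-on-message approach is an assumption, and the branch's actual filter may differ:

```python
import warnings

# Sketch: silence only this specific SciPy RuntimeWarning by matching
# the start of its message, rather than suppressing all warnings.
warnings.filterwarnings(
    "ignore",
    message="Precision loss occurred in moment calculation",
    category=RuntimeWarning,
)
```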
Added a `kernel` column to the output dataframe, since if the user specifies a filename there's a chance it won't map to a kernel name. Also added a `timestamp` column to capture when a benchmark was run; I've found that useful for zeroing in on a specific dataset and for comparing runs.
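A sketch of what those two columns might look like on a results dataframe; everything here apart from the `kernel` and `timestamp` column names is made up for illustration:

```python
import pandas as pd

# Hypothetical results dataframe; the real layout in torchdetscan may differ.
results = pd.DataFrame({"mean": [0.12, 0.13], "std": [0.01, 0.02]})

results["kernel"] = "median"  # which kernel produced these rows
results["timestamp"] = pd.Timestamp.now().isoformat()  # when the benchmark ran
```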
I've also dropped in `tqdm` to give some visual feedback on benchmarks, since a few could take a while.
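Roughly what the tqdm wrapping looks like; the kernel list and the benchmark body here are placeholders:

```python
import time
from tqdm import tqdm

# Sketch of wrapping the benchmark loop in tqdm for per-kernel progress.
kernels = ["bmm", "median", "kthvalue"]
for kernel in tqdm(kernels, desc="benchmarks"):
    time.sleep(0.1)  # stand-in for running the actual benchmark
```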
@sanjif-shanmugavelu, is there a benchmark you'd like me to work on? I'd start on the above list, but there's the risk I'd be duplicating your efforts.
Per our conversation on Slack, I'll work on implementing a benchmark for `median`.
I have pushed a version of support for a `median` benchmark to the feature branch. However, I recommend that @sanjif-shanmugavelu give it a look, as it's very basic and doesn't exercise all possible hyperparameters, such as `keepdim`.
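For context, a minimal determinism check for `median` along the lines described, assuming repeated runs are compared; this is a sketch, not the code on the feature branch:

```python
import torch

# torch.median() with indices output is listed as nondeterministic on
# CUDA when there are ties; values should still agree across runs.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randint(0, 4, (1024, 1024), device=device).float()  # many ties

values0, indices0 = torch.median(x, dim=1, keepdim=True)
for _ in range(10):
    values, indices = torch.median(x, dim=1, keepdim=True)
    assert torch.equal(values, values0)      # medians themselves should agree
    if not torch.equal(indices, indices0):   # the tie-broken indices may not
        print("nondeterministic index output detected")
        break
```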
My first contribution to the list: `bmm`. I'm keeping it separate for now and will open a PR with other kernels as well.
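For anyone reproducing it, the nondeterministic case being exercised is roughly the following (shapes arbitrary; this is not the merged benchmark code):

```python
import torch

# Sketch of the documented nondeterministic case for bmm: a sparse-dense
# batched matrix multiply on CUDA.
if torch.cuda.is_available():
    dense = torch.randn(8, 64, 64, device="cuda")
    sparse = torch.randn(8, 64, 64, device="cuda").to_sparse()
    out = torch.bmm(sparse, dense)  # listed as nondeterministic on CUDA
```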
Just merged my contributions; `git pull -r` might be required.
List of Non-Deterministic Operations in PyTorch
The following operations in PyTorch exhibit non-deterministic behavior according to the `torch.use_deterministic_algorithms` documentation. We should ensure the testing tool supports runtime tests on the operations below. Note that the list is scraped from the PyTorch 2.4 stable release; we ideally want to support all ops from `1.6.0` to `2.3` to ensure compatibility with the scanner. A minimal runtime-check sketch follows the list below.

TODO: Add Tests for Non-Deterministic Operations
- `torch.nn.Conv1d` when called on a CUDA tensor
- `torch.nn.Conv2d` when called on a CUDA tensor
- `torch.nn.Conv3d` when called on a CUDA tensor
- `torch.nn.ConvTranspose1d` when called on a CUDA tensor
- `torch.nn.ConvTranspose2d` when called on a CUDA tensor
- `torch.nn.ConvTranspose3d` when called on a CUDA tensor
- `torch.nn.ReplicationPad2d` when attempting to differentiate a CUDA tensor
- `torch.bmm()` when called on sparse-dense CUDA tensors (Mathieu, done)
- `torch.Tensor.__getitem__()` when attempting to differentiate a CPU tensor and the index is a list of tensors
- `torch.Tensor.index_put()` with `accumulate=False`
- `torch.Tensor.index_put()` with `accumulate=True` when called on a CPU tensor
- `torch.Tensor.put_()` with `accumulate=True` when called on a CPU tensor
- `torch.Tensor.scatter_add_()` when called on a CUDA tensor
- `torch.gather()` when called on a CUDA tensor that requires grad
- `torch.index_add()` when called on a CUDA tensor
- `torch.index_select()` when attempting to differentiate a CUDA tensor
- `torch.repeat_interleave()` when attempting to differentiate a CUDA tensor
- `torch.Tensor.index_copy()` when called on a CPU or CUDA tensor
- `torch.Tensor.scatter()` when `src` type is Tensor and called on a CUDA tensor
- `torch.Tensor.scatter_reduce()` when `reduce='sum'` or `reduce='mean'` and called on a CUDA tensor
- `torch.nn.AvgPool3d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.AdaptiveAvgPool2d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.AdaptiveAvgPool3d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.MaxPool3d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.AdaptiveMaxPool2d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.FractionalMaxPool2d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.FractionalMaxPool3d` when attempting to differentiate a CUDA tensor @chrisculver
- `torch.nn.MaxUnpool1d` @chrisculver
- `torch.nn.MaxUnpool2d` @chrisculver
- `torch.nn.MaxUnpool3d` @chrisculver
- `torch.nn.functional.interpolate()` when attempting to differentiate a CUDA tensor and one of the following modes is used:
  - `linear`
  - `bilinear`
  - `bicubic`
  - `trilinear`
- `torch.nn.ReflectionPad1d` when attempting to differentiate a CUDA tensor
- `torch.nn.ReflectionPad2d` when attempting to differentiate a CUDA tensor
- `torch.nn.ReflectionPad3d` when attempting to differentiate a CUDA tensor
- `torch.nn.ReplicationPad1d` when attempting to differentiate a CUDA tensor
- `torch.nn.ReplicationPad3d` when attempting to differentiate a CUDA tensor
- `torch.nn.NLLLoss` when called on a CUDA tensor
- `torch.nn.CTCLoss` when attempting to differentiate a CUDA tensor
- `torch.nn.EmbeddingBag` when attempting to differentiate a CUDA tensor when `mode='max'` @sanjif-shanmugavelu
- `torch.Tensor.put_()` when `accumulate=False` (@mtaillefumier)
- `torch.Tensor.put_()` when `accumulate=True` and called on a CUDA tensor (@mtaillefumier)
- `torch.histc()` when called on a CUDA tensor (@mtaillefumier)
- `torch.bincount()` when called on a CUDA tensor and a weights tensor is given (@mtaillefumier)
- `torch.kthvalue()` when called on a CUDA tensor @sanjif-shanmugavelu
- `torch.median()` with indices output when called on a CUDA tensor
- `torch.nn.functional.grid_sample()` when attempting to differentiate a CUDA tensor @sanjif-shanmugavelu
- `torch.cumsum()` when called on a CUDA tensor when dtype is floating point or complex (@mtaillefumier)
- `torch.Tensor.scatter_reduce()` when `reduce='prod'` and called on a CUDA tensor
- `torch.Tensor.resize_()` when called with a quantized tensor @sanjif-shanmugavelu
- `torch.nn` backwards benchmarks @sanjif-shanmugavelu
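As promised above, a minimal sketch of the runtime check these tests build on: with `torch.use_deterministic_algorithms(True)`, the listed ops either fall back to a deterministic implementation or raise a `RuntimeError` (shown here with `torch.kthvalue()`). This is illustrative only, not the actual torchdetscan harness:

```python
import torch

# Sketch: enforce deterministic algorithms, then call a listed op.
# Ops without a deterministic implementation raise RuntimeError, which
# a runtime test can treat as confirmation of nondeterminism.
torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    x = torch.randn(1024, device="cuda")
    try:
        torch.kthvalue(x, k=3)  # listed above as nondeterministic on CUDA
    except RuntimeError as err:
        print(f"flagged as nondeterministic: {err}")
```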