intel / ideep

Intel® Optimization for Chainer*, a Chainer module providing a NumPy-like API and DNN acceleration using MKL-DNN.
MIT License

ideep submodule update required for PyTorch 2.5 release #334

Closed: snadampal closed this issue 3 days ago

snadampal commented 2 weeks ago

Hi @yanbing-j, could you please update the ideep submodule commit for PyTorch 2.5? The branch cut is tomorrow. Currently ideep is at https://github.com/intel/ideep/commit/383e9238c1c118c1ff72662f9ed7c8610150325d; we would like it to be at https://github.com/intel/ideep/commit/8e21a722a534f39b17228194fe25c3f57b70c7bf or the latest commit.
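
Purely as illustration (not part of the original request), a minimal Python sketch of how one might check which ideep commit a PyTorch checkout currently pins, assuming the submodule lives at `third_party/ideep` as in upstream PyTorch:

```python
# Hypothetical helper, not part of the request: report the commit hash the
# ideep submodule is checked out at, assuming a PyTorch source tree with the
# submodule at third_party/ideep.
import subprocess

def ideep_pin(pytorch_root: str = ".") -> str:
    """Return the commit hash of the checked-out ideep submodule."""
    result = subprocess.run(
        ["git", "-C", f"{pytorch_root}/third_party/ideep", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(ideep_pin())  # e.g. 383e9238... before the bump, 8e21a722... after
```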

The context for the request: the Arm team has upstreamed the dynamic quantization changes, and all the PRs (torch, ideep, oneDNN) have been merged, but without this ideep submodule update the feature will not work. Since the change is isolated to the matmul operator and the quantization path alone, I hope the testing effort is minimal.
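
For context on the path being exercised, here is a minimal, hedged sketch of dynamic quantization using the public `torch.ao.quantization.quantize_dynamic` API on plain `nn.Linear` layers (the module shapes are illustrative, not the upstreamed changes themselves); on aarch64 the resulting int8 matmuls are the kind of workload routed through ideep/oneDNN, and through Arm Compute Library kernels when available:

```python
# Minimal sketch of the PyTorch dynamic quantization path: Linear layers are
# rewritten to quantize activations dynamically and run int8 matmuls.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

# Dynamically quantize only the Linear layers to int8.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128, 768)
with torch.inference_mode():
    out = qmodel(x)
print(out.shape)
```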

yanbing-j commented 2 weeks ago

Hi @snadampal ,

In the current PyTorch 2.5 release window, we have already merged the oneDNN 3.5.3 upgrade, and the commits https://github.com/intel/ideep/commit/8e21a722a534f39b17228194fe25c3f57b70c7bf and https://github.com/intel/ideep/commit/41d636c2bbcea6bff0faf97cdb65a48cdde987af land after that upgrade. After https://github.com/pytorch/pytorch/pull/131620 was merged, we were actually waiting for ARM test results from your side. Because the upgrade passes all the public CIs and Meta internal CIs, we are not being strict with you about those test results.

The test requirement is the same for the commits https://github.com/intel/ideep/commit/8e21a722a534f39b17228194fe25c3f57b70c7bf and https://github.com/intel/ideep/commit/41d636c2bbcea6bff0faf97cdb65a48cdde987af. They are already merged in the ideep_pytorch branch, and a single test PR covering both commits can be submitted in PyTorch (the test PR can be submitted from your side, or you can ask us to do it, though not on such a tight schedule). All the public CIs, plus the ARM tests on your side, need to pass.

This time the schedule is tight, so I can only submit the PR https://github.com/pytorch/pytorch/pull/134897 and run the public CIs quickly. I am not sure whether there will be any regression or failure, or whether it can make the 2.5 branch cut. Can you confirm that all the ARM tests pass with this PR? Thanks!

cc @Guobing-Chen

snadampal commented 2 weeks ago

Hi @yanbing-j, thanks for the details, and I appreciate the effort of raising this PR! I will add the ARM test status on the PR.

snadampal commented 2 weeks ago

Looks like the deadline for PR merging is Sep 6th, so we have some time to finish this testing and get it merged.

M2: All PRs landed in PyTorch repo / Feature Submission Closed (9/6/24) - DEADLINE

fadara01 commented 1 week ago

Our acceptance tests currently do not include dynamic quantization. Given that this change only affects the dynamic quantization path and does not change the version of oneDNN, running the full acceptance tests is not required.

I manually verified this PR and can confirm that, as expected, oneDNN calls Arm Compute Library's optimized low-precision (lowp) GEMM kernels. On 16 Neoverse-V1 cores, the speedup for bert-large is as follows:

| Context length | bert-large speedup with this PR |
|---|---|
| 8 | 20.5x |
| 16 | 26.9x |
| 32 | 31.1x |
| 64 | 40.7x |
| 128 | 53.1x |
| 256 | 50.0x |
| 512 | 27.2x |
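
A minimal sketch (not the actual bert-large harness used for the numbers above; the layer shapes and iteration counts are illustrative) of how one could estimate such a speedup ratio locally by timing an fp32 model against its dynamically quantized counterpart across context lengths:

```python
# Hypothetical micro-benchmark: compare fp32 vs dynamically quantized int8
# latency for a Linear-heavy model at several sequence (context) lengths.
import time
import torch
import torch.nn as nn

def bench(model, x, iters=50):
    # Warm up, then time repeated forward passes and return mean latency.
    with torch.inference_mode():
        for _ in range(5):
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - start) / iters

fp32 = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).eval()
int8 = torch.ao.quantization.quantize_dynamic(fp32, {nn.Linear}, dtype=torch.qint8)

for seq_len in (8, 16, 32, 64, 128, 256, 512):
    x = torch.randn(1, seq_len, 1024)
    print(f"context {seq_len:>3}: speedup {bench(fp32, x) / bench(int8, x):.2f}x")
```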

I think the manual tests above and the CI tests are enough for us to merge this PR.

cc: @snadampal @milpuz01 @yanbing-j

snadampal commented 3 days ago

Hi @yanbing-j, I appreciate your effort to accommodate this request! The ideep version looks good now, so I'm closing this.