Open yhna940 opened 1 week ago
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I was afraid that something like this would happen: Since CI did not correctly install the old PyTorch version, there seems to be some code that indeed fails with the CI because it requires a more recent version. So for this fix to pass, we would also have to fix all the failing tests (which may or may not be easy).
However, I wonder if we really want to support PyTorch 1.10.0, which is from Oct 2021. When I talked to PyTorch devs, their opinion was that supporting the last 4 versions (not counting patch releases) is reasonable, so that would mean setting the min version to 2.0.1. Probably we want to be a bit more generous here but the last change to the min version was one year ago, so I think we can bump it. Depending on the chosen version, this means way may not have to fix any tests after all.
Let's wait for Zach's return to decide this.
Thank you for the review!
I understand the concern regarding the PyTorch version support. Since Zach's input is pending, I'm happy to wait for his review.
However, I would like to emphasize that this PR specifically addresses the issue of the pytorch-version
not being correctly referenced in the CI configuration. It ensures that the specified minimum version is actually installed.
Given that this PR is focused on fixing the reference issue, I suggest merging it with the installation set to torch==2.X
(the minimum version that allows the CI to pass). This way, we can ensure the CI pipeline is correctly referencing the pytorch-version
.
Subsequently, huggingface maintainers can decide on the appropriate minimum PyTorch version policy internally. They can also address any failing tests and gradually adjust the CI minimum version as needed. Modifying the test cases within this PR would make it significantly larger and more complex, so I propose handling those changes separately.
Looking forward to your thoughts and Zach's input.
if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==2.x; fi
Given that this PR is focused on fixing the reference issue, I suggest merging it with the installation set to
torch==2.X
(the minimum version that allows the CI to pass). This way, we can ensure the CI pipeline is correctly referencing thepytorch-version
.
This could be a good compromise and would be a strict improvement on the current status. Do you know what that version would be? If not, we could just try 2.0. Regardless though, this PR would need to wait for Zach's approval to be merged, so a bit of waiting is inevitable.
Modifying the test cases within this PR would make it significantly larger and more complex, so I propose handling those changes separately.
I agree.
Do you know what that version would be? If not, we could just try 2.0. Regardless though, this PR would need to wait for Zach's approval to be merged, so a bit of waiting is inevitable.
It seems that CI only works on the latest version of torch. I've tried v2.2.2 and v2.1.2 with the make test
command and got the following error messages.
FAILED tests/test_grad_sync.py::SyncScheduler::test_gradient_sync_gpu_multi - RuntimeError: 'accelerate launch --num_processes=8 --monitor_interval=0.1 /workspaces/accelerate/src/accelerate/test_utils/scripts/test_sync.py' failed with returncode 1
FAILED tests/test_multigpu.py::MultiDeviceTester::test_multi_device_merge_fsdp_weights - RuntimeError: 'accelerate launch --num_processes=8 --monitor_interval=0.1 /workspaces/accelerate/src/accelerate/test_utils/scripts/test_merge_weights.py' failed with returncode 1
FAILED tests/test_multigpu.py::MultiDeviceTester::test_pippy - RuntimeError: 'accelerate launch --multi_gpu --num_processes=8 /workspaces/accelerate/src/accelerate/test_utils/scripts/external_deps/test_pippy.py' failed with returncode 1
FAILED tests/test_utils.py::UtilsTester::test_dynamo - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
Thanks for your patience while I was out on holiday! Our general rule of thumb with Accelerate to keep things stable is the last 2 years of PyTorch releases, which for us right now is 1.12.0.
We should definitely fix whatever is broken, thank you for pointing out that minimum
isn't doing it's job!
If you feel up to working on fixing all these issues, or tackling a few of them, that'd be most welcome. The other solution is to merge this, since we know this "fixes" the CI, and I can get to getting these passing ASAP.
Which are you more open to contribute to? π€
Thank you for getting back to me, and I hope you had a wonderful holiday!
For this PR, I suggest merging it as it corrects the issue with the pytorch-version
reference in the CI workflow. Once this is addressed, we can move forward with fixing any broken CI issues separately.
I'm open to contributing further by addressing specific failing tests or issues that come up.
Looking forward to your thoughts on this plan :)
For this PR, I suggest merging it as it corrects the issue with the
pytorch-version
reference in the CI workflow. Once this is addressed, we can move forward with fixing any broken CI issues separately.
We can't really merge the PR as is, since it would mean that CI is broken until all the issues with older PyTorch versions have been fixed. Instead, we should make this change, right?
- if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==1.10.0; fi
+ if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==2.3.1; fi
This way, all the tests pass and we're basically testing the same thing as we already do in accelerate. When PyTorch 2.4 is released, we also make sure that 2.3 keeps on working. Next, we can slowly try to move the version down to 1.12 while fixing the issues that come up.
What does this PR do?
This PR corrects the CI workflow to ensure that the minimum specified PyTorch version (1.10.0) is installed when the
matrix.pytorch-version
is set tominimum
. Previously, the installation condition incorrectly referencedmatrix.test-kind
, leading to the installation of the latest PyTorch version instead.Changes Made
matrix.test-kind
tomatrix.pytorch-version
.Before
After
Motivation and Context
This change ensures that the CI workflow correctly installs PyTorch 1.10.0 when the
matrix.pytorch-version
is set tominimum
. This avoids any inconsistencies and ensures that tests are run against the intended version of PyTorch.Related Issues
N/A
Before submitting