pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Update requirements.txt #323

Closed qiziAI closed 1 month ago

qiziAI commented 1 month ago

The torch requirement in requirements.txt conflicts with the install.

facebook-github-bot commented 1 month ago

Hi @qiziAI!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

awgu commented 1 month ago

@qiziAI Thanks for the PR! Could you provide some more details of the conflict for our understanding?

qiziAI commented 1 month ago

> @qiziAI Thanks for the PR! Could you provide some more details of the conflict for our understanding?

Yes. If I install torch 2.3.0 with pip and then run pip install -r requirements.txt, the torch version stays at 2.3.0. It is not replaced by a 2.2.0 dev build or any other dev version >= 2.2.0, because pip considers the requirement already met:

Requirement already satisfied: torch>=2.2.0.dev in ./venv2/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (2.3.0)
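To illustrate why pip reports the requirement as satisfied, here is a minimal sketch that evaluates the same specifier with the `packaging` library (assumed to be installed; it is not part of the standard library). The helper name `satisfies` is made up for this example, and `prereleases=True` is passed explicitly so that dev builds are considered too.

```python
from packaging.specifiers import SpecifierSet

# The specifier from line 1 of requirements.txt, as shown in the pip log above.
SPEC = SpecifierSet(">=2.2.0.dev")


def satisfies(version: str) -> bool:
    """Return True if `version` meets the torch requirement (hypothetical helper)."""
    # prereleases=True lets dev/nightly versions count as candidates.
    return SPEC.contains(version, prereleases=True)


if __name__ == "__main__":
    for candidate in ("2.1.2", "2.3.0", "2.4.0.dev20240501"):
        print(candidate, satisfies(candidate))
```

A stable 2.3.0 sorts above 2.2.0.dev, so the lower bound alone says nothing about whether a given private helper exists in the installed build.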

This causes the following error when executing the run_llama_train.sh script:

ImportError: cannot import name '_copy_state_dict' from 'torch.distributed._state_dict_utils'

Hope these details help.

LucasLLC commented 1 month ago

Shouldn't this be updated to torch==2.3.1, or whichever version actually includes _copy_state_dict?

qiziAI commented 1 month ago

> Shouldn't this be updated to torch==2.3.1 or the correct version which includes _copy_state_dict?

Indeed. The purpose of this PR is to point out that the line 'torch>=2.2.0.dev' in requirements.txt does not guarantee that the installed torch version contains the '_copy_state_dict' function you mentioned. That's the issue.

tianyu-l commented 1 month ago

@qiziAI Thanks for pointing this out! Since the newly added import of "_copy_state_dict" is not used by default, we don't necessarily need to require the most recent PyTorch. This is fixed in #333.
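One common way to keep an optional import from breaking module load is to defer it until the feature is actually used. The sketch below illustrates that general pattern only; it is not necessarily what #333 does, and `lazy_attr` and `get_copy_state_dict` are hypothetical names introduced for this example.

```python
import importlib
from typing import Any, Callable


def lazy_attr(module_name: str, attr: str) -> Callable[[], Any]:
    """Return a zero-argument loader that imports `attr` on first call."""
    cache: dict = {}

    def load() -> Any:
        if "value" not in cache:
            try:
                module = importlib.import_module(module_name)
                cache["value"] = getattr(module, attr)
            except (ImportError, AttributeError) as exc:
                # Fail only when the optional feature is requested.
                raise RuntimeError(
                    f"{module_name}.{attr} is unavailable in the installed version"
                ) from exc
        return cache["value"]

    return load


# Creating the loader does not import torch, so this line is safe on any version.
get_copy_state_dict = lazy_attr(
    "torch.distributed._state_dict_utils", "_copy_state_dict"
)
```

With this pattern, a default training run never triggers the import, and a user who opts into the feature on an older torch gets an explicit error at the point of use.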

qiziAI commented 1 month ago

@tianyu-l Great! Thanks!