Closed marty1885 closed 1 month ago
Could you check whether using the latest pip fixes this issue? I see you are downgrading it.
Collecting pip==20.1.1
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.2
Uninstalling pip-22.0.2:
Successfully uninstalled pip-22.0.2
As I see you are downgrading it.
@dmakoviichuk-tt The pip==20.1.1 pin is enforced by create_venv.sh: https://github.com/tenstorrent/tt-metal/blob/f188d4528457dc3e8d33f8867bd02880a0aea2fc/create_venv.sh#L28-L29
Disabling the old pip install and doing a clean install works! However, importing Torch now leads to an error about the NumPy version.
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print(numpy.__file__)
/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/numpy/__init__.py
>>> print(numpy.version.full_version)
2.0.1
>>> import torch
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/__init__.py", line 1477, in <module>
from .functional import * # noqa: F403
File "/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/functional.py", line 9, in <module>
import torch.nn.functional as F
File "/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
from .modules import * # noqa: F403
File "/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
from .transformer import TransformerEncoder, TransformerDecoder, \
File "/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
/home/marty/Documents/tt-metal/python_env/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
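The warning above says the installed torch build was compiled against NumPy 1.x while the environment has NumPy 2.0.1. A minimal sketch of a pre-flight check that reads the installed NumPy version without importing it follows. The helper names `installed_version` and `is_numpy2_or_newer` are hypothetical and not part of tt-metal, and the check only looks at the major version number.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_numpy2_or_newer(version: Optional[str]) -> bool:
    """True when the version string has a major component of 2 or higher."""
    if version is None:
        return False
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= 2


if __name__ == "__main__":
    found = installed_version("numpy")
    if is_numpy2_or_newer(found):
        # Assumption: the torch wheel in this environment was built against NumPy 1.x.
        print(f"numpy {found} found; a torch build compiled against NumPy 1.x may fail")
    else:
        print(f"numpy version: {found}")
```

If the check reports NumPy 2 or newer, the message in the log itself suggests downgrading to 'numpy<2' or upgrading the affected module.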
@dmakoviichuk-tt is there any information I can provide to help create a solution to this situation?
I don't know what changed but I am able to create a working env now... Closing
NVM, reopening. I replicated the issue by wiping the existing cache and env.
Hi, I figured out something fun. create_venv.sh forces an install of pip==20.1.1, but I have pip==24.2 locally. 20.1.1 fails to find and download PyTorch, while 24.2 does download and install it, but then the TTNN installation is broken. Importing ttnn does absolutely nothing and ttnn.__file__ is None.
To install TTNN I have to do the following crazy maneuver.
I've uploaded a video showing the bug. The file is too large to be uploaded as an attachment to GitHub. Feel free to contact me for the source footage. https://youtu.be/6Z0k0nHk5nE
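An import that succeeds while `__file__` is None is what Python shows for a namespace package, that is, a directory on the import path that has no `__init__.py`. Whether that is what happened to ttnn here is an assumption; the thread does not confirm it. A small sketch for classifying how a module name resolves (the function `describe_module` is a hypothetical helper):

```python
import importlib.util


def describe_module(name: str) -> str:
    """Classify a top-level module name as 'missing', 'namespace', or 'regular'."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # Namespace packages have no origin file but do have search locations.
    if spec.origin is None and spec.submodule_search_locations is not None:
        return "namespace"
    return "regular"


if __name__ == "__main__":
    for candidate in ("json", "ttnn"):
        print(candidate, "->", describe_module(candidate))
```

A result of "namespace" for ttnn would suggest the interpreter is picking up a bare directory rather than an installed package.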
Update: I can replicate the issue on Arch Linux (with much effort due to distro differences). It is not an Ubuntu-only issue. More than likely something changed in pip or torch.
It still fails to install PyTorch in a TT-cloud Ubuntu 22.04 virtual machine.
I'm having the same issue on my internal 22.04 Docker builds. As Marty said before, removing this line works:
pip install --force-reinstall pip==20.1.1
I think we have to bump the min pip version to around pip==22.0.2; that's what worked for me.
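A minimum-version check of the kind suggested here could be sketched as follows. The helpers `parse_version` and `meets_minimum` are hypothetical, and the parser only handles plain dotted integers such as the versions quoted in this thread; it is not a full PEP 440 implementation.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '22.0.2' into (22, 0, 2); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Versions taken from the thread; the 22.0.2 floor is one commenter's suggestion.
    for candidate in ("20.1.1", "22.0.2", "24.2"):
        print(candidate, meets_minimum(candidate, "22.0.2"))
```

A floor like this, rather than an exact pin, would let newer pip releases through while still rejecting versions known to fail.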
Sorry for the delay and the confusion. I'll put an explanation here on why we enforce this version.
tl;dr: Has to do with editable installs.
For reasons I don't fully understand yet, pip versions lower than 22.0 seem not to do editable installs the way we expect from previous pip versions, causing import errors in a development environment. What I think is specifically the problem is that the .egg-link file is not created in the virtual environment's packages for metal-libs for higher pip versions.
This is important in development because our developers depend on editable installs working. They want to be able to make changes in the Python code (irrelevant for C++) and see the results immediately. pip install -e . is how you do this, which is called editable mode.
I think this is the relevant PEP that will help here: https://peps.python.org/pep-0660/. I'll be investigating this more with the team.
We made this change as part of this PR (bumped up to 21 later after testing more pip versions): https://github.com/tenstorrent/tt-metal/pull/10751
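To check whether the .egg-link file mentioned above was actually created, one could scan the virtual environment's site-packages directory. The sketch below assumes the legacy layout in which a `<project>.egg-link` file holds the path to the source tree on its first line; `find_egg_links` is a hypothetical helper, and the site-packages path in the usage lines is only an example.

```python
from pathlib import Path
from typing import Dict


def find_egg_links(site_packages: str) -> Dict[str, str]:
    """Map each '<project>.egg-link' file to the source directory it points at."""
    links: Dict[str, str] = {}
    for link_file in sorted(Path(site_packages).glob("*.egg-link")):
        lines = link_file.read_text().splitlines()
        # The first line of an .egg-link file holds the path to the project source tree.
        links[link_file.stem] = lines[0].strip() if lines else ""
    return links


if __name__ == "__main__":
    # Example path only; substitute the site-packages directory of your own venv.
    print(find_egg_links("python_env/lib/python3.10/site-packages"))
```

An empty result after pip install -e . would be consistent with the newer editable-install mechanism being used instead of the legacy one.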
As an unblocker for now while we figure this out, I would recommend, when invoking create_venv.sh:
pip
build_metal.sh + invoke create_venv.sh
PYTHONPATH to <repo-dir>:<repo-dir>/ttnn
Then "editable" install should work. I will lower this to P2 for now.
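Reading the PYTHONPATH step above as "set PYTHONPATH to the repository root plus its ttnn subdirectory" (an interpretation, since the step's wording is incomplete in this thread), the value could be built like this. `build_pythonpath` is a hypothetical helper and the repository path in the usage lines is a placeholder.

```python
import os


def build_pythonpath(repo_dir: str) -> str:
    """Join the repo root and its ttnn subdirectory with the platform path separator."""
    entries = [repo_dir, os.path.join(repo_dir, "ttnn")]
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Placeholder path; replace with the location of your own tt-metal checkout.
    print(build_pythonpath("/path/to/tt-metal"))
```

The resulting string can be exported as PYTHONPATH in the shell before starting Python.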
Let us know if you have any further questions.
If you guys have any suggestions, please feel free to offer them. Even if these problems didn't happen for you guys, I don't like that we have to pin the pip version. I would like to solve this, as well.
One of our engineers may have found the commit that changed editable installs. We are going to be looking at this as part of our Python upgrade: https://github.com/tenstorrent/tt-metal/pull/10841#issuecomment-2345016782
Using pip==21.2.4 works on my 22.04 VM, and also in 20.04 CI!
(python_env) ubuntu@tt-metal-billteng-2204-n300:~/tt-metal$ python3
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import ttnn
2024-09-14 01:58:20.883 | DEBUG | ttnn:<module>:82 - Initial ttnn.CONFIG:
Config{cache_path=/home/ubuntu/.cache/ttnn,model_cache_path=/home/ubuntu/.cache/ttnn/models,tmp_dir=/tmp/ttnn,enable_model_cache=false,enable_fast_runtime_mode=true,throw_exception_on_fallback=false,enable_logging=false,enable_graph_report=false,enable_detailed_buffer_report=false,enable_detailed_tensor_report=false,enable_comparison_mode=false,comparison_mode_pcc=0.9999,root_report_path=generated/ttnn/reports,report_name=std::nullopt,std::nullopt}
2024-09-14 01:58:22.018 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.pearson_correlation_coefficient be migrated to C++?
2024-09-14 01:58:22.020 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.Conv1d be migrated to C++?
2024-09-14 01:58:22.028 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.conv2d be migrated to C++?
2024-09-14 01:58:22.032 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.reshape be migrated to C++?
2024-09-14 01:58:22.032 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.unsqueeze_to_4D be migrated to C++?
2024-09-14 01:58:22.032 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.squeeze be migrated to C++?
2024-09-14 01:58:22.032 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.from_torch be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.to_torch be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.to_device be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.from_device be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.allocate_tensor_on_device be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.copy_host_to_device_tensor be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.deallocate be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.reallocate be migrated to C++?
2024-09-14 01:58:22.033 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.load_tensor be migrated to C++?
2024-09-14 01:58:22.034 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.dump_tensor be migrated to C++?
2024-09-14 01:58:22.034 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.as_tensor be migrated to C++?
2024-09-14 01:58:22.036 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.avg_pool2d be migrated to C++?
2024-09-14 01:58:22.039 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.conv2d be migrated to C++?
2024-09-14 01:58:22.039 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.avg_pool2d be migrated to C++?
2024-09-14 01:58:22.040 | WARNING | ttnn.decorators:operation_decorator:768 - Should ttnn.Conv1d be migrated to C++?
>>> import torch
>>>
Describe the bug
Running ./create_venv.sh now fails to install PyTorch.
To Reproduce
Steps to reproduce the behavior:
Run create_venv.sh
Expected behavior
The environment is created successfully.