To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
Installed torch with pip install torch. Everything went fine:
(~) python
Python 3.12.6 (main, Sep 9 2024, 00:00:00) [GCC 14.2.1 20240801 (Red Hat 14.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
Then, while attempting to install minference with pip install minference, got the following error:
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pip-build-env-dultiu8y/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 332, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pip-build-env-dultiu8y/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 302, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-dultiu8y/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 503, in run_setup
    super().run_setup(setup_script=setup_script)
  File "/tmp/pip-build-env-dultiu8y/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 318, in run_setup
    exec(code, locals())
  File "<string>", line 11, in <module>
ModuleNotFoundError: No module named 'torch'
[end of output]
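The /tmp/pip-build-env-dultiu8y/overlay paths in the traceback suggest that the setup script runs inside pip's isolated build environment, which does not see the torch installed in the outer environment; that would explain why line 11 of the setup script (presumably an `import torch`) fails even though `import torch` works in the interactive session. The sketch below is only a simplified model of that situation: it assumes a bare `venv` created without system site-packages behaves like pip's build isolation for this purpose (pip really uses its own overlay prefix, not a venv), and the helpers `can_import` and `isolated_python` are hypothetical names for illustration. pip's `--no-build-isolation` option disables this isolation, so the build would then likely see the already-installed torch.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def can_import(python: str, module: str) -> bool:
    """Return True if the interpreter at `python` can locate top-level `module`."""
    check = (
        "import importlib.util, sys; "
        f"sys.exit(0 if importlib.util.find_spec({module!r}) else 1)"
    )
    result = subprocess.run([python, "-c", check], capture_output=True)
    return result.returncode == 0


def isolated_python(env_dir: Path) -> str:
    """Create a bare venv that ignores the outer site-packages; return its python.

    Assumption: this only approximates pip's build isolation, which uses its own
    /tmp/pip-build-env-*/overlay prefix rather than a venv.
    """
    builder = venv.EnvBuilder(
        system_site_packages=False,  # do not inherit packages such as torch
        with_pip=False,              # pip is not needed for an import check
        symlinks=(sys.platform != "win32"),
    )
    builder.create(env_dir)
    if sys.platform == "win32":
        return str(env_dir / "Scripts" / "python.exe")
    return str(env_dir / "bin" / "python")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_python = isolated_python(Path(tmp) / "build-env")
        # On a machine like the reporter's, the outer interpreter should see
        # torch while the isolated one should not (not verified here).
        print("outer interpreter sees torch:", can_import(sys.executable, "torch"))
        print("isolated env sees torch:", can_import(build_python, "torch"))
```

Comparing the two printed lines shows whether a package installed in the current environment is visible from a fresh, isolated one.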
Steps to reproduce
Fresh Fedora 40, fresh CUDA 12.6 Update 1 with cuda-gcc, fresh PyTorch 2.4.1.
Expected Behavior
pip install minference should complete successfully.