huggingface / optimum-nvidia (Apache License 2.0 · 844 stars · 83 forks)

Issues (sorted by newest)
#94 · Bump the dev version · mfuntowicz · closed 4 months ago · 0 comments
#93 · batched generation · Jack000 · closed 4 months ago · 1 comment
#92 · Make float8 quantization back in the game. · mfuntowicz · closed 4 months ago · 0 comments
#91 · Original model configuration (config.json) was not found error during running inference using "Llama-2-7b-chat-hf" · raorajendra · opened 4 months ago · 0 comments
#90 · Incorrect tensorrt_llm config class initialization · Wojx · opened 4 months ago · 0 comments
#89 · Attempt to make CI overall looking 🟢 · mfuntowicz · closed 4 months ago · 0 comments
#88 · update docker image · yongjer · closed 3 months ago · 1 comment
#87 · Ad ability to save local prebuilt engines · mfuntowicz · closed 4 months ago · 0 comments
#86 · fatal: Fetched in submodule path 'third-party/tensorrt-llm', but it did not contain 37aa4499520eea1d6c6dbc04a66d77bef00014f5. Direct fetching of that commit failed. · hellostronger · closed 3 months ago · 1 comment
#85 · Fix hardcoded embedding scale with value from config · mfuntowicz · closed 4 months ago · 0 comments
#84 · Qwen Support · Yuchen-Cao · opened 4 months ago · 0 comments
#83 · Make overall `optimum-nvidia` pip installable · mfuntowicz · closed 4 months ago · 0 comments
#82 · Bring back CI to a normal state · mfuntowicz · closed 4 months ago · 0 comments
#81 · Bring back tests to align with new workflow · mfuntowicz · closed 4 months ago · 0 comments
#80 · Fix repo code quality · mfuntowicz · closed 4 months ago · 0 comments
#79 · Make pipelines compatible with the new workflow · mfuntowicz · closed 4 months ago · 0 comments
#78 · Update license · mfuntowicz · closed 4 months ago · 0 comments
#77 · Fix gemma 7b · mfuntowicz · closed 4 months ago · 0 comments
#76 · How to install optimum-nvidia properly without building a docker image · Yuchen-Cao · closed 4 months ago · 0 comments
#75 · How to build this environment without docker? · lemon-little · opened 5 months ago · 1 comment
#74 · Refactoring of the overall structure to better align with the new TRTLLM workflow moving forward · mfuntowicz · closed 4 months ago · 0 comments
#73 · Build from source fails completely on WSL2 [Ubuntu-20.04] · OPPEYRADY · closed 5 months ago · 0 comments
#72 · How do you use the library in your scripts after pulling and running the Docker image? · jddunn · opened 5 months ago · 1 comment
#71 · llama.py with fp8 is broken (inference produces garbage results) · urimerhav · opened 5 months ago · 3 comments
#70 · Ability to build Whisper encoder/decoder TRT engine · fxmarty · closed 5 months ago · 1 comment
#69 · Triton Inference Server · TheCodeWrangler · opened 6 months ago · 2 comments
#68 · Error when Running LLAMA with tensor parallelism = 2 · TheCodeWrangler · opened 6 months ago · 1 comment
#67 · Mixtral support · nmiletic · opened 6 months ago · 4 comments
#66 · Fixed Repetition Penalty default value · leopra · closed 4 months ago · 3 comments
#65 · Bump TRTLLM to latest version #d879430 · mfuntowicz · closed 6 months ago · 0 comments
#64 · Build failed with cuda runtime error. · Anindyadeep · opened 6 months ago · 1 comment
#63 · Bug fixes in readme. · Anindyadeep · closed 6 months ago · 1 comment
#62 · Pip Installation · rmccorm4 · opened 6 months ago · 4 comments
#61 · Not able to run 'Generate' from QuickStart section · harikrishnaapc · opened 6 months ago · 0 comments
#60 · Feature request: streamer · RomanKoshkin · opened 6 months ago · 0 comments
#59 · Installation errors · RomanKoshkin · opened 6 months ago · 0 comments
#58 · RuntimeError: TRT Engine build failed... · yirunwang · opened 6 months ago · 1 comment
#57 · model.generate returns strange output shape · Quang-elec44 · closed 5 months ago · 2 comments
#56 · ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate` · taozhang9527 · opened 7 months ago · 0 comments
#55 · Disable RMSNorm plugin as deprecated for performance reasons · mfuntowicz · closed 7 months ago · 0 comments
#54 · Rename LLamaForCausalLM to LlamaForCausalLM to match transformers · mfuntowicz · closed 7 months ago · 0 comments
#53 · Bump version to 0.1.0b2 · mfuntowicz · closed 7 months ago · 0 comments
#52 · Add more unittest · mfuntowicz · closed 7 months ago · 0 comments
#51 · Enable HF Transfer in tests · mfuntowicz · closed 7 months ago · 0 comments
#50 · Use FP8 by default when on a supported device · laikhtewari · opened 7 months ago · 0 comments
#49 · Issue with llama.py Script in Docker - Process Stalling at Iteration 512 · H04K · closed 7 months ago · 1 comment
#48 · Let's make sure to use the repeated heads tensor when in a non-mha scenario · mfuntowicz · closed 7 months ago · 0 comments
#47 · FileNotFoundError: [Errno 2] No such file or directory: '/data/Dilip/models/llama-2-7b-chat-hf/build.json' · dilip467 · opened 7 months ago · 8 comments
#46 · Use the new runtime handled allocation · mfuntowicz · closed 7 months ago · 0 comments
#45 · Enable testing on GPUs · mfuntowicz · closed 7 months ago · 0 comments