harrisonvanderbyl / rwkvstic

Framework agnostic python runtime for RWKV models
https://hazzzardous-rwkv-instruct.hf.space
MIT License

No output from the Basic Usage example on pypi.org, and it triggers a UserWarning #19

Open: bernhardkaindl opened this issue 1 year ago

bernhardkaindl commented 1 year ago

On Google Colab with a T4 GPU, the Basic Usage example from pypi.org triggers a UserWarning:

```python
# Colab cell: install rwkvstic and its dependencies
!pip install rwkvstic torch inquirer transformers
from rwkvstic.load import RWKV
# Load the model (supports full path, relative path, and remote paths)
model = RWKV("https://huggingface.co/BlinkDL/rwkv-4-pile-3b/resolve/main/RWKV-4-Pile-3B-Instruct-test1-20230124.pth")
# Process the prompt into the model's state
model.loadContext(newctx="Q: who is Jim Butcher?\n\nA:")
# Generate up to 100 tokens and print the decoded text
output = model.forward(number=100)["output"]
print(output)
```
```
/usr/local/lib/python3.9/dist-packages/rwkvstic/rwkvMaster.py:31: UserWarning: operator() profile_node %327 : int = prim::profile_ivalue(%dtype.13)
 does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)
  logits, ostate = self.model.forward(
```

Jim Butcher is an American author of fantasy novels. He is the author of the Dresden Files, The Magician's Fist, The Thief of Always, and the Relic and Reckoning trilogies. His fifth novel in the series, The Dresden Files Reborn, was published in July 2010. He is also the creator of the Dark Fey, a supernatural race in his novels and series of short stories.

Q: What other

With an RTX 3050 Mobile (4 GB VRAM), I get the same UserWarning, but no output:

```
2.734192371368408 GB allocated
loading layers: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:32<00:00,  1.01s/it]
100.0 % remaining
$HOME/.local/lib/python3.11/site-packages/rwkvstic/rwkvMaster.py:31: UserWarning: operator() profile_node %327 : int = prim::profile_ivalue(%dtype.13)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  logits, ostate = self.model.forward(
<|endoftext|>
```

I can suppress the UserWarning with these lines, but of course this does not affect inference:

```python
import warnings
# Suppresses every warning, not only the nvfuser profiling one
warnings.filterwarnings("ignore")
```
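A more targeted alternative, sketched here with only the standard library: instead of silencing every warning, match just this profiling message by regex and scope the filter with `warnings.catch_warnings()`, so other warnings still surface. The helper name `run_quietly` is hypothetical, and the regex assumes the message always starts with `operator() profile_node`, as in the logs above.

```python
import warnings

# Regex matched case-insensitively against the START of the warning message.
# Assumption: the nvfuser message always begins with "operator() profile_node".
PROFILE_WARNING = r"operator\(\) profile_node"


def run_quietly(fn, *args, **kwargs):
    """Call fn while hiding only the profiling UserWarning (hypothetical helper).

    Any other warning still propagates, and the previous filter state is
    restored when the with-block exits.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=PROFILE_WARNING, category=UserWarning)
        return fn(*args, **kwargs)


# Hypothetical usage, with `model` loaded as in the Basic Usage example:
# output = run_quietly(model.forward, number=100)["output"]
```

Scoping the filter this way keeps unrelated warnings (deprecations, dtype issues) visible instead of hiding them for the whole session.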
harrisonvanderbyl commented 1 year ago

The CUDA warning doesn't really matter. The second example (no output) appears to just need end_adj set to -99 to prevent the end token from being generated. Also, FYI, the Colab should be able to handle the prequantized 14B model, as seen in the ./notebooks folder of the rwkvstic repo.
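To illustrate why a large negative end_adj helps, here is a minimal, library-agnostic sketch of adding a bias to the end-of-text logit before sampling. It is not rwkvstic's code: the function names are hypothetical, and the end-token id of 0 is an assumption (it is the id of `<|endoftext|>` in the GPT-NeoX tokenizer used by the Pile models). Adding -99 to that logit multiplies its softmax weight by roughly e^-99, so it is effectively never chosen.

```python
import math

END_TOKEN_ID = 0  # assumption: id of <|endoftext|> in the model's tokenizer


def apply_end_adj(logits, end_adj, end_token_id=END_TOKEN_ID):
    """Return a copy of logits with end_adj added to the end-of-text logit."""
    adjusted = list(logits)
    adjusted[end_token_id] += end_adj
    return adjusted


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-token vocabulary where the end token (id 0) dominates,
# mimicking a model that wants to stop immediately.
logits = [5.0, 1.0, 0.5, 0.2]
before = softmax(logits)
after = softmax(apply_end_adj(logits, end_adj=-99.0))

# Greedy pick after the adjustment: the most likely non-end token
next_token = max(range(len(after)), key=after.__getitem__)

print(f"P(end) before: {before[0]:.3f}")
print(f"P(end) after:  {after[0]:.3e}")
print(f"greedy next token: {next_token}")
```

The trade-off is that generation no longer ends on the model's own end-of-text signal, so it runs until the token budget (number=100 in the example above) or another stop condition is reached.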