Open bernhardkaindl opened 1 year ago
The CUDA warning doesn't really matter; the second example appears to be a matter of just passing an end_adj of -99 to prevent the end token. Also, FYI, the Colab should be able to handle 14B pre-quantized, as seen in the rwkvstic repo's ./notebooks folder.
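For readers unfamiliar with end_adj: a minimal sketch of what it presumably does, assuming (as the comment implies) that the value is added to the end-of-text token's logit before sampling, so a large negative value like -99 stops the model from emitting the end token early. The token id and function names here are illustrative, not rwkvstic's actual API:

```python
# Illustrative sketch: applying an end_adj penalty to the end-token logit.
# END_TOKEN and apply_end_adj are hypothetical names for illustration.
END_TOKEN = 0  # assumed id of the end-of-text token

def apply_end_adj(logits, end_adj):
    """Return a copy of logits with end_adj added to the end token's logit."""
    adjusted = list(logits)
    adjusted[END_TOKEN] += end_adj
    return adjusted

logits = [2.0, 1.5, 0.5]              # end token originally the argmax
adjusted = apply_end_adj(logits, -99)  # end token logit becomes -97.0
best = max(range(len(adjusted)), key=adjusted.__getitem__)
# After the adjustment, greedy decoding can no longer pick the end token.
```

With end_adj=-99 the end token effectively never wins sampling, which matches the suggestion above for keeping generation going.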
On Google Colab with a T4, the Basic Usage example from pypi.org triggers a UserWarning:
I get the same UserWarning with an RTX 3050 Mobile (4 GB VRAM), but no output:
I can filter the UserWarning with these lines, but of course this does not affect inference:
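The usual way to do this (a sketch of the standard-library approach, not necessarily the exact lines used above) is a `warnings.filterwarnings` call before initializing the model; a `message=` pattern could narrow it to the specific CUDA warning text:

```python
import warnings

# Suppress UserWarnings before loading the model. This is a broad filter;
# pass message=r"..." to target only the CUDA warning. As noted above,
# this only hides the warning and has no effect on inference itself.
warnings.filterwarnings("ignore", category=UserWarning)

# Quick check that the filter works: a UserWarning raised now is swallowed.
with warnings.catch_warnings(record=True) as caught:
    warnings.warn("CUDA initialization warning", UserWarning)
suppressed = len(caught) == 0
```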