kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
https://discord.gg/qUtxnK2NMf
MIT License
1.55k stars · 143 forks

[BUG] bitlinear fix #42

Closed jayUyang closed 4 months ago

jayUyang commented 6 months ago

Shouldn't the beta and gamma sizes be (1, weight.shape[0]), not (weight.shape[0], 1)?
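For context, a quick sketch of why the shape matters (NumPy here, whose broadcasting rules mirror PyTorch's; the concrete sizes are illustrative): a per-output-feature scale must be shaped (1, out_features), not (out_features, 1), to multiply a (batch, out_features) activation elementwise.

```python
import numpy as np

# Hypothetical shapes mirroring a BitLinear(2, 8) layer fed a 4-row batch:
# the layer output is (batch, out_features) = (4, 8).
batch, out_features = 4, 8
output = np.ones((batch, out_features))

# A scale shaped (out_features, 1) cannot broadcast against (4, 8):
# trailing dims align as 8 vs 1 (fine), but then 4 vs 8 mismatch.
bad_gamma = np.ones((out_features, 1))
try:
    _ = output * bad_gamma
except ValueError as e:
    print("broadcast fails:", e)

# Reshaped to (1, out_features), the same values broadcast row-wise:
good_gamma = bad_gamma.reshape(1, out_features)
scaled = output * good_gamma
print(scaled.shape)  # (4, 8)
```

With (out_features, 1), broadcasting tries to match the batch dimension against out_features and fails whenever the two differ, which is exactly the 4-vs-8 mismatch reported below.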

kyegomez commented 6 months ago

Can you elaborate please? Can you go deeper?

Vipiao commented 6 months ago

I encountered the same problem. When passing a tensor of shape (4, 2) to a BitLinear(2, 8), I get an error at the line

    return x * self.gamma * self.beta / self.Q_b

saying:

    Exception has occurred: RuntimeError
    The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
      File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 112, in dequantize_activations_groupwise
        return x * self.gamma * self.beta / self.Q_b
      File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 137, in forward
        output = self.dequantize_activations_groupwise(output)
      File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 20, in forward
        x = self.layer1(x)
      File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 39, in
        outputs = model(inputs)  # Forward pass
    RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0

I think the shapes of self.gamma and self.beta are wrong: gamma is initialized from the number of output neurons but is then assigned based on the batch size.

zouyingcao commented 6 months ago


I think so, but I am confused: since self.gamma relates to the activations while self.beta relates to the weights, should we explicitly broadcast these two matrices so that `x * self.gamma * self.beta` in the dequantization step can be a Hadamard product? (Should the activation quantization, `group_size = x.shape[0] // self.num_groups`, instead group along dim=1, i.e. x.shape[1], since dim 0 is the batch size?) If I am wrong, please point it out. Thanks.
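To sketch the grouping question above: here is a minimal NumPy illustration (hypothetical function names, not the repo's code) of group-wise absmax quantization done along dim=1 (features) with a kept dimension, so the resulting scale broadcasts over the batch regardless of batch size.

```python
import numpy as np

def quantize_groupwise(x, num_groups=2, q_b=127):
    """Hypothetical group-wise absmax quantization along dim=1 (features),
    so the scale broadcasts over the batch dimension (dim=0)."""
    batch, features = x.shape
    group_size = features // num_groups            # group along features, not batch
    xg = x.reshape(batch, num_groups, group_size)
    gamma = np.abs(xg).max(axis=2, keepdims=True)  # shape (batch, num_groups, 1)
    xq = np.round(xg / gamma * q_b)
    return xq, gamma

def dequantize_groupwise(xq, gamma, q_b=127):
    # gamma (batch, num_groups, 1) broadcasts against (batch, num_groups, group_size)
    return (xq * gamma / q_b).reshape(xq.shape[0], -1)

x = np.array([[1.0, -2.0, 0.5, 4.0],
              [0.1,  0.2, -0.3, 0.4]])
xq, gamma = quantize_groupwise(x, num_groups=2)
x_hat = dequantize_groupwise(xq, gamma)
print(np.max(np.abs(x - x_hat)))  # small round-off error
```

Because gamma is computed with `keepdims=True` along the grouped feature axis, the elementwise (Hadamard) multiply in dequantization works for any batch size without an explicit broadcast step.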

zouyingcao commented 6 months ago

Emmm, I see the owner has updated the code (the new version drops group quantization).
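For reference, dropping group quantization reduces the scales to per-tensor scalars, which broadcast trivially. A rough NumPy sketch following the BitNet paper's per-tensor scheme (the repo's updated code may differ in detail; the Q_b value, the sign binarization, and the clipping here are assumptions for illustration):

```python
import numpy as np

Q_b = 127  # 8-bit activation range, assumed for illustration

W = np.random.randn(8, 2)  # (out_features, in_features)
x = np.random.randn(4, 2)  # (batch, in_features)

beta = np.abs(W).mean()    # scalar: mean absolute weight (paper's beta)
gamma = np.abs(x).max()    # scalar: absmax over the whole input (paper's gamma)

W_bin = np.sign(W - W.mean())                     # 1-bit weights, mean-centered
x_q = np.clip(np.round(x / gamma * Q_b), -Q_b, Q_b)

# Scalar scales multiply the matmul result without any shape concerns:
y = x_q @ W_bin.T * (beta * gamma / Q_b)
print(y.shape)  # (4, 8)
```

With scalar beta and gamma there is no per-row or per-column scale tensor left to mis-shape, which sidesteps the broadcasting bug in this issue entirely.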

github-actions[bot] commented 4 months ago

Stale issue message