- `non_ssm_precision`
- `dot_general` to `QuantizationConfig`
- `masked_meanpool`
https://github.com/stevenabreu7/S5/blob/c4a22d830568ada26b30ff2be643d8b69ca04002/s5/seq_model.py#L70

Question is, would it be better to pass the `QuantizedOperations` object around for all of the other layers to extract the necessary `dot_general` operation, or would it be better to pass the config and then build the `dot_general` locally? I think the first one is maybe better, but I have code written to do the latter. Passing the config might be nice since it could reduce bloat if we need to create more case-specific quantized operations for things such as LayerNorm. Just pushed what I've implemented so far; haven't tested it yet.
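To make the two options concrete, here is a minimal sketch. The `QuantizedOperations` attribute layout and the `build_quant_dot_general` factory are placeholders for illustration, not the repo's actual API; passing a custom `dot_general` into `nn.Dense` is what the traceback below suggests is already being done.

```python
# Illustrative sketch only: two ways to give the non-SSM Dense layers a
# quantized dot_general. Names here are placeholders, not the repo's API.
from typing import Any
import jax
import flax.linen as nn


def build_quant_dot_general(config):
    # Stand-in factory: the real version would derive a quantized dot op
    # from the config; here it simply falls back to lax.dot_general.
    del config
    return jax.lax.dot_general


class GatedUnit(nn.Module):
    d_model: int
    q_ops: Any = None     # option 1: prebuilt QuantizedOperations passed in
    q_config: Any = None  # option 2: only the config is passed

    @nn.compact
    def __call__(self, x):
        if self.q_ops is not None:
            dg = self.q_ops.dot_general                   # option 1: just extract it
        else:
            dg = build_quant_dot_general(self.q_config)   # option 2: build locally
        out2 = nn.Dense(self.d_model, dot_general=dg)
        return x * jax.nn.sigmoid(out2(x))
```

Option 1 centralizes the construction but means the ops object travels through every module; option 2 keeps the constructor arguments small at the cost of every layer knowing how to build its own op.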
Trying to run, getting this error. Will investigate.

```
wandb: Tracking run with wandb version 0.16.6
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
[*] Setting Randomness...
[*] Generating MNIST Classification Dataset
[*] Starting S5 Training on `mnist-classification` =>> Initializing...
Lambda.shape=(128,)
V.shape=(256, 128)
Vinv.shape=(128, 256)
jax.errors.SimplifiedTraceback: For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/legion/Code/ngsm/S5/run_qtrain.py", line 223, in <module>
train(parser.parse_args())
File "/home/legion/Code/ngsm/S5/s5/qtrain.py", line 180, in train
state = create_train_state(
File "/home/legion/Code/ngsm/S5/s5/train_helpers.py", line 131, in create_train_state
variables = model.init({"params": init_rng,
File "/home/legion/Code/ngsm/S5/s5/qseq_model.py", line 166, in __call__
x = self.encoder(x, integration_timesteps)
File "/home/legion/Code/ngsm/S5/s5/qseq_model.py", line 71, in __call__
x = layer(x)
File "/home/legion/Code/ngsm/S5/s5/qlayers.py", line 79, in __call__
x = x * jax.nn.sigmoid(self.out2(x))
File "/home/legion/.local/lib/python3.10/site-packages/flax/linen/linear.py", line 274, in __call__
y = dot_general(
TypeError: quant_dot_for_dot.<locals>._dot() got an unexpected keyword argument 'precision'
```
I think I've figured out the issue: I was using `q_dot_maybe(non_ssm_precision)`, which returns the specialized JIT vector dot product operation, which is why the error about the unexpected keyword `precision` is thrown. I'll fix this by setting the `dot_general` directly for the MLPs and the extraneous dense layers, and set the default precision to fp32. Will make/test these adjustments and then push today.
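For reference, Flax's `Dense` calls the provided `dot_general` with a `precision` keyword (visible at `linear.py:274` in the traceback), so whatever we install there has to accept the same arguments as `jax.lax.dot_general`. A minimal sketch of a compatible wrapper, with a toy fake-quantizer standing in for the real quantized op:

```python
# Sketch only: a dot_general replacement whose signature matches what
# flax.linen.Dense passes through (positional args plus `precision`).
import jax.numpy as jnp
from jax import lax


def fake_quant(x, bits):
    # Toy symmetric fake-quantization, purely illustrative.
    max_q = 2.0 ** (bits - 1) - 1.0
    scale = jnp.max(jnp.abs(x)) / max_q + 1e-9
    return jnp.round(x / scale) * scale


def make_quant_dot_general(bits):
    def _dot_general(lhs, rhs, dimension_numbers,
                     precision=None, preferred_element_type=None):
        # Quantize both operands, then defer to lax.dot_general so the
        # `precision` kwarg Flax passes is forwarded rather than rejected.
        return lax.dot_general(
            fake_quant(lhs, bits), fake_quant(rhs, bits),
            dimension_numbers,
            precision=precision,
            preferred_element_type=preferred_element_type,
        )
    return _dot_general
```

With a signature like this, `nn.Dense(features, dot_general=make_quant_dot_general(bits))` should no longer trip over the unexpected `precision` keyword.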
Fixed the issues and will push them. Instead of passing the quantization configs, with their other unnecessary information, around purely to carry the `non_ssm_precision`, I changed it to communicate just the `non_ssm_precision` as an int. This change was due to issues with non-hashable arguments. We will still have to go through the architectures at some point to probe the quantization, but this gets us a lot closer.
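A tiny illustration of the hashability point (the config class here is a stand-in, not the repo's actual one): anything that has to act as a static argument, e.g. for `jax.jit` or `functools.partial`-style caching, must be hashable, and a plain int is, while a mutable config object generally is not.

```python
import dataclasses


@dataclasses.dataclass
class QuantizationConfig:  # stand-in for illustration only
    a_bits: int = 8
    w_bits: int = 8


cfg = QuantizationConfig()
try:
    hash(cfg)  # non-frozen dataclasses set __hash__ = None
except TypeError as err:
    print("config is not hashable:", err)

print("int is hashable:", hash(8))  # a bare precision int travels fine
```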
This commit addresses most of the remaining issues except for quantizing the norm operations: https://github.com/lindermanlab/S5/commit/31a099c2a8fe7b818e5aa324b8e50ad7f7bf8b3e
For LayerNorm, we should set `use_bias=False` to be cautious about injecting additional information.
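In Flax that is a one-line change (shown here just for clarity):

```python
import flax.linen as nn

# Scale-only LayerNorm: no learned additive bias to inject extra information.
norm = nn.LayerNorm(use_bias=False, use_scale=True)
```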
I don't think we should worry about `masked_meanpool`, since it lies at the very end of the model pipeline, so it wouldn't break quantization in the middle of the model.
Closing since all major wickets have been hit for the preprint.
In the S5 fork, we're currently only quantizing the SSM. However, there are many MLPs and batch/layer norms that should also be quantized. Perhaps there could be a general flag like `non_ssm_bits: int` that specifies the precision to use for these. If we want to claim efficient inference on hardware, we should have integer precision everywhere!
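A rough sketch of what such a flag could look like on the training script (the argument name and default are just the suggestion above, not something that exists yet):

```python
import argparse

parser = argparse.ArgumentParser()
# Proposed catch-all precision for MLPs, norms, encoder/decoder, etc.;
# None would keep all non-SSM ops in full precision.
parser.add_argument("--non_ssm_bits", type=int, default=None,
                    help="Bit width for all non-SSM operations.")
```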