Closed — akhilp24 closed this issue 3 months ago
Unfortunately, this would likely require a large change: the Remora models are intended to be transferrable to Dorado, and the model architecture (including the float precision) is fixed in the Dorado code. Thus I don't think we can robustly support this in the near term. We can look into using float32 in a future round of model training, but we would need to see little to no reduction in accuracy if we made this change.
You could train a model that would only be usable within Remora (this is also untested) by modifying the remora/models/ConvLSTM_w_ref.py file to specify the dtype as torch.float32 for each layer of the network. I hope this helps. I am going to close this as unplanned for the moment, but if you run into specific issues training or running the model with this workaround within Remora, please re-open this thread.
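For anyone attempting this, a minimal sketch of what "specify the dtype for each layer" means in PyTorch terms. The class and layer names below are illustrative only, not the actual Remora ConvLSTM_w_ref architecture:

```python
import torch
import torch.nn as nn


class ConvLSTMSketch(nn.Module):
    """Illustrative stand-in only -- NOT the real Remora ConvLSTM_w_ref network."""

    def __init__(self, size=64):
        super().__init__()
        # The workaround amounts to passing dtype=torch.float32 explicitly
        # when constructing each layer of the network:
        self.conv = nn.Conv1d(1, size, kernel_size=5, dtype=torch.float32)
        self.lstm = nn.LSTM(size, size, dtype=torch.float32)
        self.fc = nn.Linear(size, 2, dtype=torch.float32)


model = ConvLSTMSketch()
# Every parameter of every layer is now float32.
print(all(p.dtype == torch.float32 for p in model.parameters()))  # True
```

Note that this only covers layer parameters; any float64 buffers created elsewhere in the module would still need casting separately.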
I tried specifying the dtype as float32 for each layer of the network in that file and was met with this error:
[21:12:25.738] Gradients will be clipped (by value) at 0.00 MADs above the median of the last 1000 gradient maximums.
[21:12:26.305] Params (k) 134.08 | MACs (M) 7327.45
[21:12:26.305] Preparing training settings
Traceback (most recent call last):
  File "/Users/akhilpeddikuppa/miniconda3/bin/remora", line 8, in <module>
    sys.exit(run())
    ^^^^^
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/remora/main.py", line 71, in run
    cmd_func(args)
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/remora/parsers.py", line 1008, in run_model_train
    train_model(
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/remora/train_model.py", line 349, in train_model
    model = model.to(device)
            ^^^^^^^^^^^^^^^^
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 853, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/Users/akhilpeddikuppa/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
           ^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
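The traceback fails inside `self._buffers[key] = fn(buf)`, which suggests a float64 *buffer* (not a layer parameter) is being moved to MPS, so setting the dtype per layer would not reach it. A minimal sketch of that failure mode and a generic PyTorch-level fix, assuming a float64 buffer is the culprit (`Demo` and `scale` are hypothetical names, not Remora internals):

```python
import torch


class Demo(torch.nn.Module):
    """Hypothetical module with a float64 buffer, like the one the
    traceback shows failing inside Module._apply."""

    def __init__(self):
        super().__init__()
        self.register_buffer("scale", torch.tensor([1.0], dtype=torch.float64))


m = Demo()
# Casting the whole module converts float64 buffers as well as parameters,
# so a subsequent .to(device) has nothing left to down-convert.
m = m.to(torch.float32)
print(m.scale.dtype)  # torch.float32
# m = m.to("mps")  # would now succeed on Apple-silicon builds of PyTorch
```

Whether this cast can be applied before `model.to(device)` in Remora's train_model.py without breaking anything else is untested.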
Hello,
I would like to run the "remora model train" command on my M3 Pro MacBook Pro with 18 GB of RAM, using the device's GPU; however, I run into the error:
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
after running the command, in which I specify that I would like to use the GPU on my MacBook for the model's training.
This is the full output following the above command:
Running this same command but replacing "mps" with "cpu" works; however, it is very slow.
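For context, the generic PyTorch pattern for choosing between these two devices is to probe the MPS backend and fall back to CPU (this is standard PyTorch usage, not a Remora-specific option):

```python
import torch

# Prefer the Apple-silicon GPU via the MPS backend when it is available
# in this PyTorch build; otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device.type)  # "mps" on supported Macs, "cpu" elsewhere
```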
Before this, I received an error that the ConvLSTM_w_ref.py file could not be found in the remora library, which is why I downloaded the file from the remora GitHub repository and placed it in my main directory.
The command listed in the documentation on PyPI and GitHub did not work:
I am using Remora version 3.2.0.
Please advise. Thank you.