vocalpy / vak

A neural network framework for researchers studying acoustic communication
https://vak.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Out of memory error #741

Closed - athenasyarifa closed this issue 9 months ago

athenasyarifa commented 9 months ago

Hi @NickleDave and everyone, it's me again, sorry. I ran into a torch.cuda.OutOfMemoryError when running vak predict on my dataset. Please find the error message below:

2024-02-27 17:36:11,126 - vak.cli.predict - INFO - vak version: 1.0.0a3
2024-02-27 17:36:11,126 - vak.cli.predict - INFO - Logging results to /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict
2024-02-27 17:36:11,143 - vak.predict.frame_classification - INFO - loading SpectScaler from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/StandardizeSpect
2024-02-27 17:36:11,146 - vak.predict.frame_classification - INFO - loading labelmap from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/labelmap.json
2024-02-27 17:36:11,150 - vak.predict.frame_classification - INFO - loading dataset to predict from csv path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict/predict-vak-frame-classification-dataset-generated-240227_171934/predict_prep_240227_171934.csv
2024-02-27 17:36:11,172 - vak.predict.frame_classification - INFO - will save annotations in .csv file: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict/willowtit_predict.annot.csv
2024-02-27 17:36:11,177 - vak.predict.frame_classification - INFO - Duration of a frame in dataset, in seconds: 0.00145
2024-02-27 17:36:11,253 - vak.predict.frame_classification - INFO - Shape of input to networks used for predictions: torch.Size([1, 257, 176])
2024-02-27 17:36:11,254 - vak.predict.frame_classification - INFO - instantiating model from config:/nTweetyNet
2024-02-27 17:36:11,275 - vak.predict.frame_classification - INFO - loading checkpoint for TweetyNet from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/TweetyNet/checkpoints/max-val-acc-checkpoint.pt
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
2024-02-27 17:36:12,253 - vak.predict.frame_classification - INFO - running predict method of TweetyNet
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Predicting DataLoader 0:  19%|██████████████████████████▊                                                                                                                      | 5/27 [00:00<00:03,  6.71it/s]Traceback (most recent call last):
  File "/home/rifsyy/anaconda3/envs/vak_env/bin/vak", line 8, in <module>
    sys.exit(main())
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/cli.py", line 22, in predict
    predict(toml_path=toml_path)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/predict.py", line 48, in predict
    predict_module.predict(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/predict/predict_.py", line 141, in predict
    predict_with_frame_classification_model(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/predict/frame_classification.py", line 239, in predict_with_frame_classification_model
    results = trainer.predict(model, pred_loader)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in predict
    return call._call_and_handle_interrupt(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 903, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1030, in _run_stage
    return self.predict_loop.run()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/prediction_loop.py", line 122, in run
    self._predict_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/prediction_loop.py", line 250, in _predict_step
    predictions = call._call_strategy_hook(trainer, "predict_step", *step_args)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 429, in predict_step
    return self.lightning_module.predict_step(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/models/frame_classification_model.py", line 344, in predict_step
    y_pred = self.network(x)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/nets/tweetynet.py", line 175, in forward
    features = self.cnn(x)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/modules/pooling.py", line 164, in forward
    return F.max_pool2d(input, self.kernel_size, self.stride,
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/_jit_internal.py", line 499, in fn
    return if_false(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/nn/functional.py", line 796, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 286.00 MiB. GPU 0 has a total capacity of 2.00 GiB of which 82.37 MiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 1.30 GiB is allocated by PyTorch, and 142.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Predicting DataLoader 0:  19%|█▊        | 5/27 [00:01<00:07,  2.86it/s]      

and my predict.toml file looks like this:

[PREP]
dataset_type = "frame classification"
input_type = "spect"
data_dir = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict"
output_dir = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict"
audio_format = "wav"

[SPECT_PARAMS]
fft_size = 512
step_size = 64

[PREDICT]
model = "TweetyNet"
checkpoint_path = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/labelmap.json"
spect_scaler_path = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/StandardizeSpect"
batch_size = 1
num_workers = 2
device = "cpu"
output_dir = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict"
annot_csv_filename = "willowtit_predict.annot.csv"
majority_vote = true
min_segment_dur = 0.01
dataset_path = "/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict/predict-vak-frame-classification-dataset-generated-240227_171934"

[TweetyNet]

[PREDICT.transform_params]
window_size = 176

I saw the similar issue #301 -- or is mine a different case? I tried removing longer files from my predict dataset, but it still did not work. My predict dataset is 27 wav files, from 7 to 52 seconds long. I can't help thinking this is probably a beginner issue! How can I fix this? Thanks in advance for your help!

Best, Rifa

NickleDave commented 9 months ago

Hi @athenasyarifa, I'm sorry that vak is crashing here.

Thank you for the detailed error report. Based on what I see in the console output and the config file you provided, I don't see a reason to think this is a beginner issue.

My guess is that one of the spectrograms simply ends up being too big to fit on the GPU, and that causes the OOM error.
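
For a rough sense of scale -- a back-of-envelope estimate, assuming predict pushes all the windows from one file through the network at once, which is what the traceback suggests: at the 0.00145 s frame duration in your log, a 52-second file is ~36,000 frames, or about 204 windows of width 176. The input tensor alone is then roughly 204 x 257 x 176 x 4 bytes ≈ 37 MB, and each convolutional layer's output multiplies the spatial size by its number of channels, so intermediate activations can easily reach the GiB range on a 2 GiB card.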

The quickest way to rule this out might be to make the files smaller and see if we still get the crash. I know you said you tried that already. Just to follow up:

Did you happen to work through this tutorial and if so were you able to run predict on all the files in that dataset? That at least tells us you can predict on some files, and also gives us a lower bound on the largest size file you can predict on.

In the console output you provided, it looks like the crash happened on file 5. What we want to figure out is if we can run predict if you remove that file and all files larger than it. Here is how you might do that:

# imports needed to run this snippet
import pathlib

import pandas as pd

import vak

DATASET_DIR = pathlib.Path(
    'tests/data_for_tests/generated/prep/predict/audio_cbin_annot_notmat/TweetyNet/032412-vak-frame-classification-dataset-generated-231010_165729/'
)
FILE_NUMBER_THAT_CAUSES_CRASH = 5

metadata = vak.datasets.frame_classification.Metadata.from_dataset_path(DATASET_DIR)
dataset_csv_path = DATASET_DIR / metadata.dataset_csv_filename
df = pd.read_csv(dataset_csv_path)
max_dur = df.loc[FILE_NUMBER_THAT_CAUSES_CRASH, 'duration']
new_df = df[df.duration < max_dur]
assert len(new_df) < len(df)  # make sure that worked
assert new_df['duration'].max() < max_dur  # make extra extra sure, because we are scientists

If you don't get the OOM error then, we can confirm that the issue is file size.

Assuming that is the problem, there are a couple of things we could try:

1. use different spectrogram parameters to make the spectrogram smaller, e.g. by setting limits on the frequencies using the freq_cutoffs option: https://vak.readthedocs.io/en/latest/reference/config.html#vak.config.spect_params.SpectParamsConfig.freq_cutoffs

so if you knew that your sound of interest is always between 500-12000 Hz you could do

[SPECT_PARAMS]
fft_size = 512
step_size = 64
freq_cutoffs = [500, 12000]
2. make clips of the audio, e.g. in Raven or with a Python script -- I can help with that if we need to; see the sketch just after this list for one way to do it in Python
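
For example, here is a minimal sketch of clipping with Python. Note the assumptions: it uses the soundfile library, a placeholder directory, and an arbitrary 10-second clip length, so adjust all of those for your data:

# Split every wav file in a directory into fixed-length clips.
# Assumes the `soundfile` library; the path and the 10 s clip length
# below are placeholders, not values that vak requires.
import pathlib

import soundfile as sf

DATA_DIR = pathlib.Path("path/to/your/wavs")  # placeholder directory
CLIP_DUR_S = 10.0  # arbitrary example clip length, in seconds

for wav_path in sorted(DATA_DIR.glob("*.wav")):
    audio, samplerate = sf.read(wav_path)
    samples_per_clip = int(CLIP_DUR_S * samplerate)
    for clip_num, start in enumerate(range(0, len(audio), samples_per_clip)):
        clip = audio[start:start + samples_per_clip]
        clip_path = wav_path.with_name(f"{wav_path.stem}-clip{clip_num}.wav")
        sf.write(clip_path, clip, samplerate)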

I wish we already had better numbers on durations / memory use to give you -- I did some quick tests to get a ballpark.

Looks like your GPU has 2 GB according to the error from pytorch? Do you see the same thing if you run nvidia-smi in the terminal?
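
You can also double-check from Python with torch -- just a quick sketch that queries the driver, nothing vak-specific:

import torch

# Print the name and total memory of the first CUDA device, if any.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.2f} GiB total")
else:
    print("no CUDA device visible to torch")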

Again, really sorry we don't have a more general solution ready; we know this is an issue -- see for example #514. Ideally we'd estimate, based on a user's hardware, what we can fit on the GPU, and then work with that.

In the meantime I'm happy to work with you to find a workaround.
Please let me know what you figure out about the file size; I can answer more questions too if needed.

athenasyarifa commented 9 months ago

Hi @NickleDave Thank you so much for your prompt response!

Did you happen to work through this tutorial and if so were you able to run predict on all the files in that dataset? That at least tells us you can predict on some files, and also gives us a lower bound on the largest size file you can predict on.

Yes, I tried working through the tutorial today and was able to quickly run every step successfully.

In the console output you provided, it looks like the crash happened on file 5. What we want to figure out is if we can run predict if you remove that file and all files larger than it. Here is how you might do that.

I tried what you suggested here, and indeed I think the problem is the file size. What I did was copy my predict dataset, along with the metadata.json and the predict_prep_240227_171934.csv, into a new troubleshooting folder. Then I ran the script you gave me (with a line added at the end to save the result, for anyone else having the same problem):

import pathlib
import vak
import pandas as pd

DATASET_DIR = pathlib.Path(
    '/mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/vak_troubleshoot/data'
)
FILE_NUMBER_THAT_CAUSES_CRASH = 5

metadata = vak.datasets.frame_classification.Metadata.from_dataset_path(DATASET_DIR)
dataset_csv_path = DATASET_DIR / metadata.dataset_csv_filename
df = pd.read_csv(dataset_csv_path)
max_dur = df.loc[FILE_NUMBER_THAT_CAUSES_CRASH, 'duration']
new_df = df[df.duration < max_dur]
assert len(new_df) < len(df)  # make sure that worked
assert new_df['duration'].max() < max_dur  # make extra extra sure, because we are scientists
new_df.to_csv(dataset_csv_path, index=False)  # overwrite the prep csv; index=False avoids writing a spurious index column

Then I copied the rewritten csv file into the original predict dataset and reran vak predict. I ran into another problem, which looks like the following:

2024-02-28 11:53:45,427 - vak.cli.predict - INFO - vak version: 1.0.0a3
2024-02-28 11:53:45,427 - vak.cli.predict - INFO - Logging results to /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict
2024-02-28 11:53:45,444 - vak.predict.frame_classification - INFO - loading SpectScaler from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/StandardizeSpect
2024-02-28 11:53:45,448 - vak.predict.frame_classification - INFO - loading labelmap from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/labelmap.json
2024-02-28 11:53:45,466 - vak.predict.frame_classification - INFO - loading dataset to predict from csv path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict/predict-vak-frame-classification-dataset-generated-240227_171934/predict_prep_240227_171934.csv
2024-02-28 11:53:45,494 - vak.predict.frame_classification - INFO - will save annotations in .csv file: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/predict/willowtit_predict.annot.csv
2024-02-28 11:53:45,499 - vak.predict.frame_classification - INFO - Duration of a frame in dataset, in seconds: 0.00145
2024-02-28 11:53:45,575 - vak.predict.frame_classification - INFO - Shape of input to networks used for predictions: torch.Size([1, 257, 176])
2024-02-28 11:53:45,576 - vak.predict.frame_classification - INFO - instantiating model from config:/nTweetyNet
2024-02-28 11:53:45,597 - vak.predict.frame_classification - INFO - loading checkpoint for TweetyNet from path: /mnt/c/Users/Lenovo/Documents/GitHub/willowtit-project/bioacoustic/vak_train/results_240227_155358/TweetyNet/checkpoints/max-val-acc-checkpoint.pt
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
2024-02-28 11:53:46,587 - vak.predict.frame_classification - INFO - running predict method of TweetyNet
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0:  89%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                      | 24/27 [00:05<00:00,  4.46it/s]Traceback (most recent call last):
  File "/home/rifsyy/anaconda3/envs/vak_env/bin/vak", line 8, in <module>
    sys.exit(main())
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/cli.py", line 22, in predict
    predict(toml_path=toml_path)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/cli/predict.py", line 48, in predict
    predict_module.predict(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/predict/predict_.py", line 141, in predict
    predict_with_frame_classification_model(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/predict/frame_classification.py", line 239, in predict_with_frame_classification_model
    results = trainer.predict(model, pred_loader)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in predict
    return call._call_and_handle_interrupt(
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 903, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1030, in _run_stage
    return self.predict_loop.run()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/prediction_loop.py", line 119, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 127, in __next__
    batch = super().__next__()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 56, in __next__
    batch = next(self.iterator)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 326, in __next__
    out = next(self._iterator)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 132, in __next__
    out = next(self.iterators[0])
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/rifsyy/anaconda3/envs/vak_env/lib/python3.10/site-packages/vak/datasets/frame_classification/frames_dataset.py", line 82, in __getitem__
    source_path = self.source_paths[idx]
IndexError: index 24 is out of bounds for axis 0 with size 24

Predicting DataLoader 0:  89%|████████▉ | 24/27 [00:05<00:00,  4.05it/s] 

I am not sure where the error came from, but I was able to work around it by deleting the longest wav file from the predict dataset, rerunning vak prep, and then rerunning vak predict. But then I still ran into another OOM error, and had to go through the troubleshooting steps you suggested once more. In the end, I successfully ran vak predict with 19 wav files in my predict dataset, ranging from 7 to 36 seconds.

use different spectrogram parameters to make the spectrogram smaller, e.g. by setting limits on the frequencies using the freq_cutoffs option:

I am checking the annotation output from vak predict as I write this, so I have not yet tried the frequency-cutoffs solution you suggested here.

make clips of the audio, e.g. in Raven or with a Python script -- I can help with that if we need to

Nor have I tried this solution yet; it will be my next step after I check how the TweetyNet prediction results for my dataset look. I happen to have a segmented version of my full dataset (previously segmented with warbleR), which I can try as another predict dataset for vak.

Looks like your GPU has 2 GB according to the error from pytorch? Do you see the same thing if you run nvidia-smi in the terminal?

Yes, I saw the same thing when I ran nvidia-smi; I'm ashamed to say I have a low-spec personal laptop.

Again, many thanks for the help!

Best, Rifa