worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License
212 stars, 24 forks

Running Realtab on Macs #43

Closed vinay-k12 closed 1 year ago

vinay-k12 commented 1 year ago

Has anyone tried running this on a Mac with an M2 Ultra or M2 Max?

I tried it on an M1 Pro, and training exhausts the available memory. How can I check whether the training is using the GPU cores?

This article describes running PyTorch on M1: https://wandb.ai/capecape/pytorch-M1Pro/reports/PyTorch-Runs-On-the-GPU-of-Apple-M1-Macs-Now-Announcement-With-Code-Samples---VmlldzoyMDMyNzMz?galleryTag=ml-news

I used the same settings on the M1 Pro, but memory was almost completely used up, and I could not tell whether the training was running on the GPU.
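One way to answer the "is it using the GPU?" question is to ask PyTorch which backend it can see. The sketch below is not part of REaLTabFormer; `describe_torch_backend` is a hypothetical helper name, and it only reports what PyTorch itself detects (via `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()`), not which device a given training run actually selected.

```python
import importlib.util


def describe_torch_backend():
    """Return a dict describing which accelerator PyTorch can see.

    Hypothetical helper for illustration; not a REaLTabFormer API.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "device": "cpu"}

    import torch

    # torch.backends.mps is missing on PyTorch builds older than 1.12.
    mps = getattr(torch.backends, "mps", None)
    mps_built = bool(mps and mps.is_built())
    mps_available = bool(mps and mps.is_available())

    if mps_available:
        device = "mps"  # Apple Silicon GPU via Metal
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "mps_built": mps_built,
        "mps_available": mps_available,
        "device": device,
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

If `device` comes back as `mps`, the Apple GPU is usable from PyTorch. To confirm that a specific model is really on it, inspecting `next(model.parameters()).device` on the loaded model is a common check. GPU activity can also be watched in Activity Monitor (Window > GPU History) while training runs.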

vinay-k12 commented 1 year ago

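I was able to run training after installing PyTorch with:
conda install pytorch::pytorch torchvision torchaudio -c pytorch

However, I am now running into the issue shown in these screenshots:

[two screenshots of the error; images not preserved]
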
avsolatorio commented 1 year ago

@vinay-k12 I have recently encountered this (in a separate project) and found this issue on PyTorch: https://github.com/pytorch/pytorch/issues/96610

A workaround is to install the nightly build, which includes the fix:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Unfortunately, I see no performance improvement, but let me know if it works for you!
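After installing a nightly build, it can help to confirm that the environment actually picked it up rather than an older stable wheel. This is a minimal sketch, assuming nightly version strings carry a `.devYYYYMMDD` segment (for example `2.1.0.dev20230801`); `is_nightly_build` is a hypothetical helper name, not a PyTorch function.

```python
import re


def is_nightly_build(version: str) -> bool:
    """Return True if a version string looks like a nightly build.

    Assumes nightlies contain a ".dev" segment followed by an
    8-digit date, e.g. "2.1.0.dev20230801".
    """
    return re.search(r"\.dev\d{8}", version) is not None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "nightly:", is_nightly_build(torch.__version__))
```

If this reports a stable version, the active environment is probably not the one the pip command installed into.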

vinay-k12 commented 1 year ago

I had to update macOS to 13.5 and install the nightly version, and then it worked. My machine is an M1 Pro with 16 GB of RAM. Training is slower than on a T4. There is a memory advantage, but it is very slow.

sfc-gh-mholboke commented 1 year ago

I've been trying to run this on an M2 Mac with macOS 13.5 and the nightly PyTorch build, but I get the following error:

. . .
File ~/anaconda3/envs/example/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py:346, in (.0)
    343 def _non_default_cpu_generators():
    344     return {
    345         o for o in gc.get_objects()
--> 346         if isinstance(o, torch.Generator) and
    347         o is not torch.random.default_generator and
    348         # We can't handle CUDA generators as the CUDA context may not be initialized.
    349         o.device.type == "cpu"
    350     }

ReferenceError: weakly-referenced object no longer exists

Any ideas? Actually, I think this is related to https://github.com/pytorch/pytorch/issues/107531