karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

getting error on M1 #118

Closed · vaclavt closed this issue 1 year ago

vaclavt commented 1 year ago

NotImplementedError: Could not run 'aten::_pin_memory' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_pin_memory' is only available for these backends: [MPS, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

It seems to me that this error was introduced in the commit "who needs a dataloader? overlap the prefetching of the next batch with GPU compute, ehiding the data loading latency entirely. this saves about 1ms lol", and that the affected line in train.py (line 114) should branch on device_type. Greetings from the Czech Republic, my hero Mr. Karpathy.
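For reference, the prefetch line in question in get_batch() looks roughly like this (paraphrased from train.py around that commit; the exact line number may have shifted):

```python
# train.py, get_batch() prefetch (approximate): pins host memory so the copy
# to the GPU can run asynchronously. Pinning page-locked memory is a
# CUDA-oriented optimization, so calling it unconditionally fails on an M1
# machine, which is why this line should be guarded by device_type.
x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
```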

pavlokochkin commented 1 year ago

Fixed by removing pin_memory() in train.py: `x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)`
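A minimal, self-contained sketch of what this workaround does (the vocab size and batch shape below are placeholder values, not the repo config):

```python
import torch

# Pick a device the way a portable script might; on an M1 this resolves to 'mps' or 'cpu'.
device = 'cuda' if torch.cuda.is_available() else ('mps' if torch.backends.mps.is_available() else 'cpu')

# Fake token batch of shape (batch_size, block_size); values are arbitrary.
x = torch.randint(0, 50304, (12, 1024))
y = torch.randint(0, 50304, (12, 1024))

# Workaround: no pin_memory(), so the same code runs without CUDA.
# Note that non_blocking=True only overlaps the copy when the source tensor
# is pinned, so this trades the ~1ms prefetch overlap for portability.
x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
print(x.device, y.device)
```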

karpathy commented 1 year ago

Yep, I broke it last night. Fixed with the most recent commit to master; closing.
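The shape of the fix is to branch on device_type in get_batch(), roughly like this (a sketch, not necessarily the exact lines on master):

```python
# Only pin host memory when actually copying to a CUDA device. This keeps the
# async prefetch optimization on GPUs while avoiding the aten::_pin_memory
# error on M1 / CPU machines.
if device_type == 'cuda':
    # pinned (page-locked) memory lets the host-to-device copy overlap with
    # compute via non_blocking=True
    x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
else:
    x, y = x.to(device), y.to(device)
```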