karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
34.57k stars 5.32k forks

"RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU #525

Open shenbb opened 2 weeks ago

shenbb commented 2 weeks ago

python3 train.py config/train_shakespeare_char.py

Overriding config with config/train_shakespeare_char.py:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
Traceback (most recent call last):
  File "train.py", line 264, in <module>
    losses = estimate_loss()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "train.py", line 224, in estimate_loss
    logits, loss = model(X, Y)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 786, in _convert_frame
    result = inner_convert(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
    return _compile(
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
    transformations(instructions, code_options)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 500, in transform
    tracer.run()
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
    super().run()
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
    and self.step()
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
    getattr(self, inst.opname)(inst)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1001, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1178, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1251, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 1232, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 1731, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
    return aot_autograd(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
  File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
  File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
    return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
  File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
    return inner_compile(
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/debug.py", line 304, in inner
    return fn(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
    compiled_graph = fx_codegen_and_compile(
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
    return self.compile_to_module().call
  File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
    r = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 1254, in compile_to_module
    mod = PyCodeCache.load_by_key_path(
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2160, in load_by_key_path
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_libra/6z/c6zptqfvl4uwgoca6tk4qimwczeni4sq2plv5hxtx7vncbopqccc.py", line 1162, in <module>
    async_compile.wait(globals())
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2715, in wait
    scope[key] = result.result()
  File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 2522, in result
    self.future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 636; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 638; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 640; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-863569, line 642; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
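If you want to keep torch.compile enabled and merely fall back to eager execution when inductor fails, the two lines suggested by the error message can be added near the top of train.py. This is only a sketch of that fallback (not an official nanoGPT option), and it hides the compile failure rather than fixing it:

# optional fallback: let TorchDynamo swallow backend compile errors and run eagerly
import torch._dynamo
torch._dynamo.config.suppress_errors = True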

mishra011 commented 2 weeks ago

Facing the same issue when running python train.py config/train_shakespeare_char.py on a Colab GPU.

Elrashid commented 2 weeks ago

@mishra011 @shenbb

I tried running the following code:

!pip install tiktoken
!git clone https://github.com/karpathy/nanoGPT.git
%cd nanoGPT
!python train.py config/train_shakespeare_char.py
!python sample.py --out_dir=out-shakespeare-char

Found a GPU Compatibility Issue:

If you're using the standard V100-SXM2-16GB GPU, you can hit this compatibility issue: the ptxas errors above say the bf16 instructions in the generated kernels require compute capability sm_80 or higher, and the V100 is only sm_70.

Recommendation (this worked for me, didn't have time to dig deeper):

To avoid this error, upgrade to Colab Pro and select the A100-SXM4-40GB GPU (sm_80) in your runtime settings. This should resolve the issue and allow the model to train successfully.
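To confirm what your Colab runtime actually gives you, a small standalone check (not part of nanoGPT) prints the compute capability ptxas cares about; the bf16 features in the error need sm_80 or newer (e.g. A100), while the V100 is sm_70:

import torch

# Compute capability determines which PTX features ptxas will accept.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: sm_{major}{minor}")  # e.g. sm_70 on V100, sm_80 on A100
print("bf16 reported as supported:", torch.cuda.is_bf16_supported())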

lichengshen commented 1 week ago

The issue comes from torch.compile(), so you can pass --compile=False as a workaround.
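For the run in this issue, that would be, for example:

python train.py config/train_shakespeare_char.py --compile=False

(The same flag can be appended to the Colab command above.)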

xinge449 commented 1 week ago

The issue comes from torch.compile() so you can pass --compile=False as a workaround.

It worked on my end, thank you very much. Thanks a lot.