romainloiseau / LearnableEarthParser

Official Pytorch implementation of the "Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans" paper
https://imagine.enpc.fr/~loiseaur/learnable-earth-parser
MIT License
56 stars 4 forks

bug when training #2

Open yqf2000119 opened 4 months ago

yqf2000119 commented 4 months ago

Hi! I ran into an error when starting training. The full message is:

[WARNING]your gpu arch (8, 9) isn't compiled in prebuilt, may cause invalid device function. available: {(6, 1), (7, 0), (8, 0), (8, 6), (6, 0), (7, 5), (5, 2)}

Error executing job with overrides: ['+experiment=urban']
Traceback (most recent call last):
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
    self.fit_loop.run()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1646, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/optim/adamw.py", line 100, in step
    loss = closure()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
    closure_result = closure()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 333, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 371, in training_step
    return self.do_step(batch, batch_idx, 'train')
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 153, in do_step
    out = self.forward(batch, tag, batch_size=batch_size, batch_idx=batch_idx)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 199, in forward
    self.compute_reconstruction_loss(tag, batch, batch_size, out, protos, proto_slab, None, batch_idx=batch_idx)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/ours.py", line 132, in compute_reconstruction_loss
    out["l_XP"] = compute_l_XP(out["kappa_presoftmax"], out["choice_L"], cham_x, x_lengths_LK, self.hparams.S, self.hparams.K)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_subexp(float* tv_, float* tv__, float* aten_exp) {
{
  float v = __ldg(tv_ + (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 6ll + 448ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 2304ll)) + 7ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 36ll) % 64ll));
  float v_1 = __ldg(tv__ + ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 6ll) % 6ll + 448ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 2304ll)) + 7ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 36ll) % 64ll));
  aten_exp[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = expf(v - v_1);
}
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/glose/LearnableEarthParser-main/main.py", line 42, in main
    getattr(trainer, cfg.mode)(model, datamodule=datamodule)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _call_and_handle_interrupt
    self._call_callback_hooks("on_exception", exception)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 730, in on_exception
    self.do_out_html(trainer, pl_module, "on_exception", "on_exception")
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 717, in do_out_html
    html += "\n" + self.get_title(trainer, pl_module) + self.get_body(trainer, pl_module, title) + "\n"
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 696, in get_body
    body = "


" + self.add_text("h2", title) + self.get_metrics(trainer, pl_module) + self.get_inferences(trainer, pl_module)
  File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 300, in get_inferences
    assert out_recs.shape[0] == color.shape[0]
AssertionError

My GPU is an RTX 4060 Ti with 16 GB of VRAM, and my system is Ubuntu 22.04. How can I solve this problem?

romainloiseau commented 4 months ago

I have never seen this error before. Can you check that CUDA and cuDNN are correctly installed and configured on your system? CUDA 11.8 or newer should be appropriate.

yqf2000119 commented 4 months ago

Hi, thanks for your reply. I am sure CUDA and cuDNN are installed, and they work fine with other projects, which makes this even more confusing. My CUDA version is 11.8.
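
One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: the system-wide CUDA toolkit and the CUDA version the installed PyTorch build was compiled against can differ. The `nvrtc` error in the traceback would be consistent with a PyTorch build whose CUDA version predates compute capability 8.9. The snippet below compares the GPU's compute capability with the architectures the PyTorch build reports. `parse_arch`, `is_arch_compiled` and `report` are hypothetical helper names introduced here; `torch.cuda.get_device_capability`, `torch.cuda.get_arch_list` and `torch.version.cuda` are existing PyTorch calls.

```python
def parse_arch(tag):
    """Convert a tag such as 'sm_86' or 'compute_86' into (8, 6)."""
    # Keep digits only so suffixed tags like 'sm_90a' still parse.
    digits = "".join(ch for ch in tag.split("_")[-1] if ch.isdigit())
    return int(digits[:-1]), int(digits[-1])


def is_arch_compiled(capability, arch_tags):
    """Return True if the (major, minor) capability appears among the tags."""
    return tuple(capability) in {parse_arch(tag) for tag in arch_tags}


def report():
    """Print what the installed PyTorch build says about the first GPU."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("PyTorch cannot see a CUDA device")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_tags = torch.cuda.get_arch_list()
    print("CUDA version PyTorch was built against:", torch.version.cuda)
    print("GPU compute capability:", capability)
    print("Architectures in this build:", arch_tags)
    print("Exact match:", is_arch_compiled(capability, arch_tags))


if __name__ == "__main__":
    report()
```

A missing exact match is only a hint. Binaries built for 8.6 can generally run on an 8.9 device, but runtime compilation through `nvrtc` needs a CUDA version that knows the target architecture, so the reported build version is the more telling value here.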