Open OswaldoBornemann opened 1 year ago
Hi @PPGGG, can you provide more of the log, for example the lines above emd = torch.cat(emd_lst)? Also, what command are you running?
@ZENGXH I just ran bash ./script/train_prior.sh 1.
It seems that emd_batch is an int instead of a tensor. This is weird, and the value is also much larger than I expected. Could you add
print(f'emd_batch: {emd_batch}; sample_batch: {sample_batch.shape} {ref_batch.shape}; batch_size={batch_size}; value range: {sample_batch.min()} {sample_batch.max()}; {ref_batch.min()} {ref_batch.max()}; ')
before line 213 of utils/evaluation_metrics_fast.py (link) and paste the output here?
For example, in my case I get the following output:
2023-04-01 16:10:31.379 | INFO | trainers.base_trainer:set_writer:57 -
----------
[url]: none
../exp/0401/car/6e880fh_train_lion_B10
----------
2023-04-01 16:10:31.384 | INFO | __main__:main:68 - not find any checkpoint: ../exp/0401/car/6e880fh_train_lion_B10/checkpoints, (exist=False), or snapshot ../exp/0401/car/6e880fh_train_lion_B10/checkpoints/snapshot, (exist=False)
2023-04-01 16:10:31.384 | INFO | trainers.base_trainer:train_epochs:173 - [rank=0] Start epoch: 0 End epoch: 18000, batch-size=10 | Niter/epo=245 | log freq=245, viz freq 49000, val freq -10000
2023-04-01 16:10:35.574 | INFO | utils.exp_helper:get_evalname:94 - git hash: 0ee19
2023-04-01 16:10:35.876 | INFO | trainers.base_trainer:eval_nll:744 - eval: 1/36
2023-04-01 16:10:40.085 | INFO | trainers.base_trainer:eval_nll:744 - eval: 31/36
emd_batch: tensor([7.3357e-05], device='cuda:0'); sample_batch: torch.Size([1, 2048, 3]) torch.Size([1, 2048, 3]); batch_size=1; value range: -0.4135565459728241 0.46055838465690613; -0.4139295518398285 0.46082451939582825;
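If it helps narrow things down, here is a minimal sketch (not code from the repo) of a check that lists which entries of a list like emd_lst are not tensors before it reaches torch.cat. The helper name find_non_tensor_entries and the FakeTensor stand-in are made up for illustration. The check is duck-typed on a .shape attribute so it runs without PyTorch installed; with real tensors, isinstance(v, torch.Tensor) would be the stricter test.

```python
def find_non_tensor_entries(values):
    """Return (index, type name, value) for every entry lacking a .shape attribute.

    Duck-typed on purpose: tensors and arrays expose .shape,
    plain Python ints and floats do not.
    """
    bad = []
    for i, v in enumerate(values):
        if not hasattr(v, "shape"):
            bad.append((i, type(v).__name__, v))
    return bad


class FakeTensor:
    """Hypothetical stand-in for a tensor so the sketch runs without PyTorch."""
    shape = (1,)


# Hypothetical list shaped like emd_lst: two tensor-like entries and one stray int.
emd_lst_example = [FakeTensor(), 12345, FakeTensor()]
print(find_non_tensor_entries(emd_lst_example))  # [(1, 'int', 12345)]
```

Calling this on the real list right before the concatenation would show the index and value of any stray int, which should point to the batch that produced it.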