microsoft / MeshGraphormer

Research code of ICCV 2021 paper "Mesh Graphormer"
https://arxiv.org/abs/2104.00272

How to reproduce the model from the paper #18

Open jhkim0759 opened 2 years ago

jhkim0759 commented 2 years ago

How many epochs do I have to train to reproduce the model from the paper? I think I have to use the pretrained model that was trained on the H36M dataset, right? Then how many epochs do we need for fine-tuning on 3DPW?

kevinlin311tw commented 2 years ago

In our experiments, we train our model on multiple datasets for 200 epochs, then test on the H36M dataset. For 3DPW fine-tuning, we typically fine-tune for 5 epochs.

jhkim0759 commented 2 years ago

Did you fine-tune from the same model that you provided on GitHub? I ask because we did not get the same result when we experimented with the model you provided.

Thank you

kevinlin311tw commented 2 years ago

Thanks for the question.

The H36M model we provided is the best-performing checkpoint from the entire training process (we train for 200 epochs and pick the best one for H36M). For 3DPW fine-tuning, we fine-tune from the final checkpoint (after 200 epochs of training).
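
To make the distinction between the two checkpoints concrete, here is a minimal Python sketch of the selection rule described above. It is not code from this repository: `EvalRecord`, `pick_checkpoints`, the paths, and the metric values are all hypothetical placeholders, and using PA-MPJPE as the selection metric is an assumption.

```python
from typing import List, NamedTuple, Tuple


class EvalRecord(NamedTuple):
    """One evaluation result for a saved checkpoint (hypothetical structure)."""
    epoch: int
    pa_mpjpe: float  # assumed selection metric; lower is better
    path: str


def pick_checkpoints(records: List[EvalRecord]) -> Tuple[EvalRecord, EvalRecord]:
    """Return (best, final): best has the lowest error, final the highest epoch."""
    if not records:
        raise ValueError("no evaluation records given")
    best = min(records, key=lambda r: r.pa_mpjpe)
    final = max(records, key=lambda r: r.epoch)
    return best, final


# Placeholder numbers for illustration only; these are NOT results from the paper.
records = [
    EvalRecord(epoch=198, pa_mpjpe=3.0, path="checkpoint-198/model.bin"),
    EvalRecord(epoch=199, pa_mpjpe=1.0, path="checkpoint-199/model.bin"),
    EvalRecord(epoch=200, pa_mpjpe=2.0, path="checkpoint-200/model.bin"),
]

best, final = pick_checkpoints(records)
print("release for H36M evaluation:", best.path)
print("starting point for 3DPW fine-tuning:", final.path)
```

The point of the sketch is that the released H36M checkpoint and the starting point for 3DPW fine-tuning need not be the same file, which may explain part of a gap when fine-tuning from the released model.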

I couldn't retrieve the checkpoint file, but I found the relevant log file. The log is noisy because we didn't clean up the codebase during paper submission. In the log below, the metrics to look for are mPVE_smp, mPJPE_smpl, and PAmPJPE_smpl.
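
Because most of the log consists of azureml and NCCL start-up messages, a small filter can help locate the metric lines. The Python sketch below assumes only the line prefix visible in the log (`timestamp [rank,rank]<stream>:message`). The layout of the metric lines themselves is an assumption, and the last sample line is a hypothetical stand-in, not a line from the real log.

```python
import re
from typing import Iterable, Iterator, Tuple

# Prefix format seen in the log: "2021-03-05 01:07:05 [1,0]<stdout>:message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[\d+,\d+\]<(stdout|stderr)>:(.*)$"
)
# Substrings that mark start-up noise in this log.
NOISE_MARKERS = ("azureml_run_type_providers", "NCCL INFO", "NCCL WARN")
# Metric name stems; "mPVE" also matches "mPVE_smp" and "mPVE_smpl".
METRIC_NAMES = ("mPVE", "mPJPE", "PAmPJPE")


def metric_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, message) for lines that mention a metric name."""
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if match is None:
            continue  # not a prefixed log line
        timestamp, _stream, message = match.groups()
        if any(marker in message for marker in NOISE_MARKERS):
            continue  # skip azureml / NCCL start-up noise
        if any(name in message for name in METRIC_NAMES):
            yield timestamp, message


sample = [
    "2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers.",
    "2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 0",
    "2021-03-05 01:07:11 [1,0]<stdout>:NCCL version 2.4.8+cuda10.1",
    # Hypothetical metric line; the real layout may differ.
    "2021-03-05 01:07:12 [1,0]<stdout>:mPJPE_smpl: <value>",
]

found = list(metric_lines(sample))
for timestamp, message in found:
    print(timestamp, message)
```

To run it on a saved copy of the log, pass an open file object to `metric_lines` instead of the sample list.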

2021-03-05 01:07:05 [1,0]<stderr>:python -m torch.distributed.launch --nproc_per_node=8 tools/human_mesh/run_train_simplifiedmesh_2d3d_mvm_gridfeat.py --train_yaml 3dpw_backup/train.yaml --val_yaml 3dpw_backup/test_has_gender.yaml --arch hrnet-w64 --model_name_or_path models/captioning/bert-base-uncased/ --num_workers 2 --logging_steps 20 --resume_checkpoint _output/20210225_METRO_grid6_Tax-H36m-coco40k-Muco-UP-Mpii_arch.hrnet-w64.bert-L6_bs.25_hidl.4_head.4_lr.1e-4_ep.200_vloss.100.0_jloss.1000.0_isz.2051,512,128_hsz.1024,256,64_jregloss_full.0.33_sub.0.33_sub2.0.33_fc-up_2Djloss100_graph_intrsz-2x/checkpoint-200-476000/model.bin --per_gpu_train_batch_size 20 --per_gpu_eval_batch_size 20 --num_hidden_layers 4 --num_attention_heads 4 --lr 1e-4 --backbone_pretrained --fix_backbone 0 --object_query 1 --masking_inputs 0 --img_scale_factor 1 --scheduler iter_step --num_train_epochs 2 --input_feat_dim 2051,512,128 --hidden_feat_dim 1024,256,64 --vertices_loss_weight 100.0 --joints_loss_weight 1000.0 --vloss_w_full 0.33 --vloss_w_sub 0.33 --vloss_w_sub2 0.33 --graph_conv --which_gcn 0,0,1 --gresidual --GCNMHA 0 --intrsz 2
2021-03-05 01:07:05 [1,0]<stderr>:
2021-03-05 01:07:05 [1,0]<stdout>:*****************************************
2021-03-05 01:07:05 [1,0]<stdout>:Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
2021-03-05 01:07:05 [1,0]<stdout>:*****************************************
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl', 'PyOpenSSL'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.PipelineRun = azureml.pipeline.core.run:PipelineRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.ReusedStepRun = azureml.pipeline.core.run:StepRun._from_reused_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.StepRun = azureml.pipeline.core.run:StepRun._from_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stderr>:Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (cryptography 3.1 (/miniconda/envs/py37/lib/python3.7/site-packages), Requirement.parse('cryptography>=3.2'), {'PyOpenSSL', 'pyopenssl'}).
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 4
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 0
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 6
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 5
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 7
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 3
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 1
2021-03-05 01:07:08 [1,0]<stdout>:Init distributed training on local rank 2
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1006 [0] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1006 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1006 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1006 [0] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:NCCL version 2.4.8+cuda10.1
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1013 [7] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1007 [1] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1013 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1007 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1013 [7] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1007 [1] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1012 [6] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1008 [2] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1012 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1013 [7] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1007 [1] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1008 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1009 [3] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1011 [5] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1012 [6] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1009 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1008 [2] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1011 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1012 [6] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1009 [3] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1011 [5] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1008 [2] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1009 [3] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1011 [5] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1010 [4] NCCL INFO Bootstrap : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1010 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-03-05 01:07:11 [1,0]<stdout>:
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1010 [4] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1010 [4] NCCL INFO NET/Socket : Using [0]eth0:192.168.0.121<0>
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Setting affinity for GPU 3 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Setting affinity for GPU 2 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Setting affinity for GPU 1 to 0fffff
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO NCCL_TREE_THRESHOLD set by environment to 0.
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 00 :    0   1   2   3   7   4   6   5
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 01 :    0   1   2   6   4   5   7   3
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 02 :    0   2   6   7   5   4   1   3
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 03 :    0   3   1   4   5   7   6   2
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 04 :    0   3   7   5   4   6   2   1
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 05 :    0   5   6   4   7   3   2   1
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 06 :    0   1   2   3   7   4   6   5
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 07 :    0   1   2   6   4   5   7   3
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 08 :    0   2   6   7   5   4   1   3
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 09 :    0   3   1   4   5   7   6   2
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 10 :    0   3   7   5   4   6   2   1
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Channel 11 :    0   5   6   4   7   3   2   1
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 00 : 4[1] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 00 : 3[4] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 00 : 5[7] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 00 : 6[2] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 00 : 2[5] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 00 : 1[0] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 00 : 7[3] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 00 : 0[6] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 01 : 4[1] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 01 : 5[7] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 01 : 3[4] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 01 : 6[2] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 01 : 1[0] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 01 : 7[3] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 01 : 0[6] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 01 : 2[5] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 02 : 1[0] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 02 : 4[1] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 02 : 3[4] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 02 : 2[5] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 02 : 6[2] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 02 : 5[7] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 02 : 7[3] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 02 : 0[6] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 03 : 1[0] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 03 : 6[2] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 03 : 4[1] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 03 : 0[6] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 03 : 5[7] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 03 : 3[4] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 03 : 2[5] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 03 : 7[3] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 04 : 5[7] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 04 : 1[0] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 04 : 4[1] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 04 : 0[6] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 04 : 6[2] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 04 : 7[3] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 04 : 3[4] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 04 : 2[5] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 05 : 5[7] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 05 : 4[1] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 05 : 6[2] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 05 : 7[3] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 05 : 3[4] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 05 : 2[5] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 05 : 1[0] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 05 : 0[6] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 06 : 4[1] -> 6[2] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 06 : 3[4] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 06 : 5[7] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 06 : 1[0] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 06 : 0[6] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 06 : 6[2] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 06 : 7[3] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 06 : 2[5] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 07 : 4[1] -> 5[7] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 07 : 3[4] -> 0[6] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 07 : 5[7] -> 7[3] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 07 : 1[0] -> 2[5] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 07 : 0[6] -> 1[0] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 07 : 6[2] -> 4[1] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 07 : 7[3] -> 3[4] via P2P/IPC
2021-03-05 01:07:11 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 07 : 2[5] -> 6[2] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 08 : 4[1] -> 1[0] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 08 : 3[4] -> 0[6] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 08 : 5[7] -> 4[1] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 08 : 1[0] -> 3[4] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 08 : 0[6] -> 2[5] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 08 : 6[2] -> 7[3] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 08 : 7[3] -> 5[7] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 08 : 2[5] -> 6[2] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 09 : 4[1] -> 5[7] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 09 : 3[4] -> 1[0] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 09 : 5[7] -> 7[3] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 09 : 1[0] -> 4[1] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 09 : 0[6] -> 3[4] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 09 : 6[2] -> 2[5] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 09 : 7[3] -> 6[2] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 09 : 2[5] -> 0[6] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 10 : 4[1] -> 6[2] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 10 : 3[4] -> 7[3] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 10 : 5[7] -> 4[1] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 10 : 1[0] -> 0[6] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 10 : 0[6] -> 3[4] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 10 : 6[2] -> 2[5] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 10 : 7[3] -> 5[7] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 10 : 2[5] -> 1[0] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO Ring 11 : 4[1] -> 7[3] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO Ring 11 : 3[4] -> 2[5] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO Ring 11 : 5[7] -> 6[2] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO Ring 11 : 1[0] -> 0[6] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Ring 11 : 0[6] -> 5[7] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO Ring 11 : 6[2] -> 4[1] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO Ring 11 : 7[3] -> 3[4] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO Ring 11 : 2[5] -> 1[0] via P2P/IPC
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees disabled
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1010:1100 [4] NCCL INFO comm 0x7f2524002000 rank 4 nranks 8 cudaDev 4 nvmlDev 1 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1009:1095 [3] NCCL INFO comm 0x7f8754002000 rank 3 nranks 8 cudaDev 3 nvmlDev 4 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1007:1098 [1] NCCL INFO comm 0x7fe3a4002000 rank 1 nranks 8 cudaDev 1 nvmlDev 0 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1011:1097 [5] NCCL INFO comm 0x7f3bd4002000 rank 5 nranks 8 cudaDev 5 nvmlDev 7 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1093 [0] NCCL INFO comm 0x7f6d50002000 rank 0 nranks 8 cudaDev 0 nvmlDev 6 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1012:1096 [6] NCCL INFO comm 0x7f35c0002000 rank 6 nranks 8 cudaDev 6 nvmlDev 2 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1006:1006 [0] NCCL INFO Launch mode Parallel
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1008:1094 [2] NCCL INFO comm 0x7f9d30002000 rank 2 nranks 8 cudaDev 2 nvmlDev 5 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stdout>:az-wus2-v100-32gb-worker-wjvjpt:1013:1099 [7] NCCL INFO comm 0x7fec18002000 rank 7 nranks 8 cudaDev 7 nvmlDev 3 - Init COMPLETE
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:14 [1,0]<stdout>:2021-03-05 01:07:11,994 Mesh regression INFO: Using 8 GPUs
2021-03-05 01:07:14 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:14 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-03-05 01:07:17 [1,0]<stderr>:  self._set_intXint(row, col, x.flat[0])
2021-03-05 01:07:17 [1,0]<stdout>:2021-03-05 01:07:17,043 Mesh regression INFO: Update config parameter num_hidden_layers: 12 -> 4
2021-03-05 01:07:17 [1,0]<stdout>:2021-03-05 01:07:17,120 Mesh regression INFO: Update config parameter num_attention_heads: 12 -> 4
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,450 Mesh regression INFO: Init model from scratch.
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,456 Mesh regression INFO: Update config parameter num_hidden_layers: 12 -> 4
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,461 Mesh regression INFO: Update config parameter num_attention_heads: 12 -> 4
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,666 Mesh regression INFO: Init model from scratch.
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,672 Mesh regression INFO: Update config parameter num_hidden_layers: 12 -> 4
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,680 Mesh regression INFO: Update config parameter num_attention_heads: 12 -> 4
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:Use BertLayerNorm in GraphResBlock
2021-03-05 01:07:20 [1,0]<stdout>:2021-03-05 01:07:18,736 Mesh regression INFO: Init model from scratch.
2021-03-05 01:07:23 [1,0]<stdout>:=> loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth=> loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth=> loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth
2021-03-05 01:07:23 [1,0]<stdout>:2021-03-05 01:07:22,681 Mesh regression INFO: => loading hrnet-v2-w64 model
2021-03-05 01:07:23 [1,0]<stdout>:2021-03-05 01:07:22,688 Mesh regression INFO: Transformer Encoder 2 total parameters: 83318598
2021-03-05 01:07:23 [1,0]<stdout>:2021-03-05 01:07:22,697 Mesh regression INFO: Backbone model total parameters: 128059944
2021-03-05 01:07:23 [1,0]<stdout>:2021-03-05 01:07:22,702 Mesh regression INFO: Resume training with GCN modules: Loading from checkpoint _output/20210225_METRO_grid6_Tax-H36m-coco40k-Muco-UP-Mpii_arch.hrnet-w64.bert-L6_bs.25_hidl.4_head.4_lr.1e-4_ep.200_vloss.100.0_jloss.1000.0_isz.2051,512,128_hsz.1024,256,64_jregloss_full.0.33_sub.0.33_sub2.0.33_fc-up_2Djloss100_graph_intrsz-2x/checkpoint-200-476000/model.bin
2021-03-05 01:07:26 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:26 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stdout>:2021-03-05 01:07:27,518 Mesh regression INFO: Training parameters Namespace(GCNMHA=0, adam_epsilon=1e-08, arch='hrnet-w64', backbone_pretrained=True, config_name='', data_dir='datasets', device=device(type='cuda'), distributed=True, do_lower_case=False, drop_out=0.1, effective_batch_size=-1, fix_backbone=0, graph_conv=True, gresidual=True, hidden_feat_dim='1024,256,64', hidden_size=-1, img_feature_dim=2051, img_scale_factor=1, input_feat_dim='2051,512,128', intermediate_size=-1, intrsz=2, joints_loss_weight=1000.0, learn_position_encoding=False, load_partial_weights=False, local_rank=0, logging_steps=20, lr=0.0001, mask_prob=0.15, mask_type='bidirectional', masking_inputs=0, max_masked_tokens=3, model_name_or_path='models/captioning/bert-base-uncased/', momentum=0.9, no_sort_by_conf=False, num_attention_heads=4, num_gpus=8, num_hidden_layers=4, num_train_epochs=2, num_workers=2, object_query=1, on_memory=False, output_dir='_keli/output/', per_gpu_eval_batch_size=20, per_gpu_train_batch_size=20, person_crop=False, resume_checkpoint='_output/20210225_METRO_grid6_Tax-H36m-coco40k-Muco-UP-Mpii_arch.hrnet-w64.bert-L6_bs.25_hidl.4_head.4_lr.1e-4_ep.200_vloss.100.0_jloss.1000.0_isz.2051,512,128_hsz.1024,256,64_jregloss_full.0.33_sub.0.33_sub2.0.33_fc-up_2Djloss100_graph_intrsz-2x/checkpoint-200-476000/model.bin', run_eval_only=False, save_steps=50000, scheduler='iter_step', seed=88, tokenizer_name='', train_yaml='3dpw_backup/train.yaml', val_yaml='3dpw_backup/test_has_gender.yaml', val_yaml2='imagenet2012/test.yaml', val_yaml3='imagenet2012/test.yaml', vertices_loss_weight=100.0, vloss_w_full=0.33, vloss_w_sub=0.33, vloss_w_sub2=0.33, warmup_steps=0, weight_decay=0.05, which_gcn='0,0,1')
2021-03-05 01:07:29 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:29 [1,0]<stderr>:/miniconda/envs/py37/lib/python3.7/site-packages/torch/serialization.py:593: SourceChangeWarning: source code of class 'maskrcnn_benchmark.layers.bert.modeling_bert.BertLayer' has changed. Saved a reverse patch to BertLayer.patch. Run `patch -p0 < BertLayer.patch` to revert your changes.
2021-03-05 01:07:29 [1,0]<stderr>:  warnings.warn(msg, SourceChangeWarning)
2021-03-05 01:07:32 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:32 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:32 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:32 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/test_has_gender.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/train.yaml
2021-03-05 01:07:35 [1,0]<stdout>:3dpw_backup/test_has_gender.
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 10.4721, 2d joint loss: 0.0050, 3d joint loss: 0.0062, vertex loss: 0.0372, compute: 1.8739, data: 0.7603, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 9.9172, 2d joint loss: 0.0051, 3d joint loss: 0.0059, vertex loss: 0.0354, compute: 1.8743, data: 0.7617, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 10.9969, 2d joint loss: 0.0060, 3d joint loss: 0.0067, vertex loss: 0.0369, compute: 1.8742, data: 0.7538, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 10.6613, 2d joint loss: 0.0053, 3d joint loss: 0.0063, vertex loss: 0.0378, compute: 1.8742, data: 0.7604, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 11.1073, 2d joint loss: 0.0057, 3d joint loss: 0.0067, vertex loss: 0.0380, compute: 1.8747, data: 0.7550, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 9.9614, 2d joint loss: 0.0050, 3d joint loss: 0.0059, vertex loss: 0.0359, compute: 1.8746, data: 0.7600, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stdout>:2021-03-05 01:08:12,443 Mesh regression INFO: eta: 0:08:14 epoch: 0 iter: 20 max mem : 16754  loss: 10.8502, 2d joint loss: 0.0061, 3d joint loss: 0.0065, vertex loss: 0.0372, compute: 1.8746, data: 0.7497, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16761  loss: 10.5579, 2d joint loss: 0.0053, 3d joint loss: 0.0063, vertex loss: 0.0373, compute: 1.8748, data: 0.7616, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:INFO:Mesh regression:eta: 0:08:14 epoch: 0 iter: 20 max mem : 16754  loss: 10.8502, 2d joint loss: 0.0061, 3d joint loss: 0.0065, vertex loss: 0.0372, compute: 1.8746, data: 0.7497, lr: 0.000100
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:240: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:72: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:240: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:72: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:240: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:14 [1,0]<stderr>:/tmp/code/maskrcnn_benchmark/modeling/human_mesh/smpl_model/renderer.py:72: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
2021-03-05 01:08:14 [1,0]<stderr>:  if np.issubdtype(image.dtype, np.float):
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.3247, 2d joint loss: 0.0041, 3d joint loss: 0.0046, vertex loss: 0.0327, compute: 1.4744, data: 0.4387, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.6119, 2d joint loss: 0.0046, 3d joint loss: 0.0049, vertex loss: 0.0328, compute: 1.4746, data: 0.4395, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.4455, 2d joint loss: 0.0042, 3d joint loss: 0.0048, vertex loss: 0.0326, compute: 1.4748, data: 0.4369, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.7200, 2d joint loss: 0.0045, 3d joint loss: 0.0050, vertex loss: 0.0327, compute: 1.4747, data: 0.4347, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.5211, 2d joint loss: 0.0042, 3d joint loss: 0.0048, vertex loss: 0.0328, compute: 1.4748, data: 0.4394, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.3127, 2d joint loss: 0.0043, 3d joint loss: 0.0047, vertex loss: 0.0318, compute: 1.4751, data: 0.4405, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16761  loss: 8.1728, 2d joint loss: 0.0041, 3d joint loss: 0.0045, vertex loss: 0.0323, compute: 1.4751, data: 0.4381, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stdout>:2021-03-05 01:08:33,953 Mesh regression INFO: eta: 0:05:59 epoch: 0 iter: 40 max mem : 16754  loss: 8.4237, 2d joint loss: 0.0044, 3d joint loss: 0.0048, vertex loss: 0.0323, compute: 1.4751, data: 0.4478, lr: 0.000100
2021-03-05 01:08:35 [1,0]<stderr>:INFO:Mesh regression:eta: 0:05:59 epoch: 0 iter: 40 max mem : 16754  loss: 8.4237, 2d joint loss: 0.0044, 3d joint loss: 0.0048, vertex loss: 0.0323, compute: 1.4751, data: 0.4478, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.4146, 2d joint loss: 0.0038, 3d joint loss: 0.0040, vertex loss: 0.0300, compute: 1.3349, data: 0.3300, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.7740, 2d joint loss: 0.0039, 3d joint loss: 0.0043, vertex loss: 0.0305, compute: 1.3349, data: 0.3267, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.7493, 2d joint loss: 0.0040, 3d joint loss: 0.0043, vertex loss: 0.0304, compute: 1.3351, data: 0.3281, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.5404, 2d joint loss: 0.0039, 3d joint loss: 0.0041, vertex loss: 0.0302, compute: 1.3351, data: 0.3261, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.1539, 2d joint loss: 0.0034, 3d joint loss: 0.0038, vertex loss: 0.0297, compute: 1.3350, data: 0.3294, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.0795, 2d joint loss: 0.0034, 3d joint loss: 0.0038, vertex loss: 0.0296, compute: 1.3351, data: 0.3286, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16761  loss: 7.4704, 2d joint loss: 0.0039, 3d joint loss: 0.0041, vertex loss: 0.0296, compute: 1.3352, data: 0.3312, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stdout>:2021-03-05 01:08:55,057 Mesh regression INFO: eta: 0:04:59 epoch: 0 iter: 60 max mem : 16754  loss: 7.2094, 2d joint loss: 0.0036, 3d joint loss: 0.0039, vertex loss: 0.0295, compute: 1.3351, data: 0.3414, lr: 0.000100
2021-03-05 01:08:56 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:59 epoch: 0 iter: 60 max mem : 16754  loss: 7.2094, 2d joint loss: 0.0036, 3d joint loss: 0.0039, vertex loss: 0.0295, compute: 1.3351, data: 0.3414, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:17 epoch: 0 iter: 80 max mem : 16761  loss: 6.7187, 2d joint loss: 0.0034, 3d joint loss: 0.0035, vertex loss: 0.0283, compute: 1.2646, data: 0.2747, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:17 epoch: 0 iter: 80 max mem : 16761  loss: 6.3411, 2d joint loss: 0.0030, 3d joint loss: 0.0033, vertex loss: 0.0277, compute: 1.2646, data: 0.2733, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:17 epoch: 0 iter: 80 max mem : 16761  loss: 7.0145, 2d joint loss: 0.0036, 3d joint loss: 0.0038, vertex loss: 0.0286, compute: 1.2646, data: 0.2723, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:18 epoch: 0 iter: 80 max mem : 16761  loss: 6.8720, 2d joint loss: 0.0035, 3d joint loss: 0.0037, vertex loss: 0.0283, compute: 1.2647, data: 0.2732, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:18 epoch: 0 iter: 80 max mem : 16761  loss: 6.7484, 2d joint loss: 0.0035, 3d joint loss: 0.0036, vertex loss: 0.0279, compute: 1.2648, data: 0.2760, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:18 epoch: 0 iter: 80 max mem : 16761  loss: 6.8731, 2d joint loss: 0.0034, 3d joint loss: 0.0037, vertex loss: 0.0283, compute: 1.2648, data: 0.2717, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stdout>:2021-03-05 01:09:16,129 Mesh regression INFO: eta: 0:04:18 epoch: 0 iter: 80 max mem : 16754  loss: 6.8020, 2d joint loss: 0.0034, 3d joint loss: 0.0036, vertex loss: 0.0281, compute: 1.2647, data: 0.2878, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:18 epoch: 0 iter: 80 max mem : 16761  loss: 6.6905, 2d joint loss: 0.0032, 3d joint loss: 0.0036, vertex loss: 0.0281, compute: 1.2648, data: 0.2742, lr: 0.000100
2021-03-05 01:09:17 [1,0]<stderr>:INFO:Mesh regression:eta: 0:04:18 epoch: 0 iter: 80 max mem : 16754  loss: 6.8020, 2d joint loss: 0.0034, 3d joint loss: 0.0036, vertex loss: 0.0281, compute: 1.2647, data: 0.2878, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:45 epoch: 0 iter: 100 max mem : 16761  loss: 5.9506, 2d joint loss: 0.0029, 3d joint loss: 0.0030, vertex loss: 0.0266, compute: 1.2282, data: 0.2390, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.4295, 2d joint loss: 0.0033, 3d joint loss: 0.0034, vertex loss: 0.0271, compute: 1.2283, data: 0.2403, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.5698, 2d joint loss: 0.0034, 3d joint loss: 0.0035, vertex loss: 0.0274, compute: 1.2283, data: 0.2384, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.1796, 2d joint loss: 0.0030, 3d joint loss: 0.0032, vertex loss: 0.0268, compute: 1.2283, data: 0.2409, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.3648, 2d joint loss: 0.0032, 3d joint loss: 0.0033, vertex loss: 0.0270, compute: 1.2284, data: 0.2389, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.2414, 2d joint loss: 0.0032, 3d joint loss: 0.0033, vertex loss: 0.0267, compute: 1.2284, data: 0.2428, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stdout>:2021-03-05 01:09:37,786 Mesh regression INFO: eta: 0:03:46 epoch: 0 iter: 100 max mem : 16754  loss: 6.2403, 2d joint loss: 0.0031, 3d joint loss: 0.0033, vertex loss: 0.0267, compute: 1.2284, data: 0.2607, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16761  loss: 6.2097, 2d joint loss: 0.0031, 3d joint loss: 0.0032, vertex loss: 0.0270, compute: 1.2284, data: 0.2405, lr: 0.000100
2021-03-05 01:09:38 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:46 epoch: 0 iter: 100 max mem : 16754  loss: 6.2403, 2d joint loss: 0.0031, 3d joint loss: 0.0033, vertex loss: 0.0267, compute: 1.2284, data: 0.2607, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 6.3077, 2d joint loss: 0.0035, 3d joint loss: 0.0033, vertex loss: 0.0264, compute: 1.1996, data: 0.2168, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 5.8445, 2d joint loss: 0.0029, 3d joint loss: 0.0030, vertex loss: 0.0260, compute: 1.1996, data: 0.2185, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 6.0391, 2d joint loss: 0.0031, 3d joint loss: 0.0031, vertex loss: 0.0261, compute: 1.1996, data: 0.2185, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 5.5929, 2d joint loss: 0.0027, 3d joint loss: 0.0028, vertex loss: 0.0256, compute: 1.1996, data: 0.2170, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 5.8950, 2d joint loss: 0.0030, 3d joint loss: 0.0030, vertex loss: 0.0257, compute: 1.1997, data: 0.2196, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 5.7552, 2d joint loss: 0.0028, 3d joint loss: 0.0029, vertex loss: 0.0257, compute: 1.1996, data: 0.2178, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stdout>:2021-03-05 01:09:58,914 Mesh regression INFO: eta: 0:03:16 epoch: 0 iter: 120 max mem : 16754  loss: 5.9295, 2d joint loss: 0.0029, 3d joint loss: 0.0031, vertex loss: 0.0259, compute: 1.1997, data: 0.2379, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16761  loss: 5.9314, 2d joint loss: 0.0029, 3d joint loss: 0.0030, vertex loss: 0.0259, compute: 1.1998, data: 0.2171, lr: 0.000100
2021-03-05 01:09:59 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:16 epoch: 0 iter: 120 max mem : 16754  loss: 5.9295, 2d joint loss: 0.0029, 3d joint loss: 0.0031, vertex loss: 0.0259, compute: 1.1997, data: 0.2379, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.7167, 2d joint loss: 0.0029, 3d joint loss: 0.0029, vertex loss: 0.0252, compute: 1.1804, data: 0.2028, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.7122, 2d joint loss: 0.0028, 3d joint loss: 0.0029, vertex loss: 0.0253, compute: 1.1804, data: 0.2016, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.9553, 2d joint loss: 0.0034, 3d joint loss: 0.0031, vertex loss: 0.0255, compute: 1.1804, data: 0.2014, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.5961, 2d joint loss: 0.0028, 3d joint loss: 0.0028, vertex loss: 0.0250, compute: 1.1805, data: 0.2041, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.5369, 2d joint loss: 0.0027, 3d joint loss: 0.0028, vertex loss: 0.0250, compute: 1.1804, data: 0.2023, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stdout>:2021-03-05 01:10:20,209 Mesh regression INFO: eta: 0:02:49 epoch: 0 iter: 140 max mem : 16754  loss: 5.6506, 2d joint loss: 0.0028, 3d joint loss: 0.0029, vertex loss: 0.0251, compute: 1.1804, data: 0.2229, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.3573, 2d joint loss: 0.0026, 3d joint loss: 0.0026, vertex loss: 0.0249, compute: 1.1805, data: 0.2013, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16761  loss: 5.5179, 2d joint loss: 0.0027, 3d joint loss: 0.0027, vertex loss: 0.0250, compute: 1.1805, data: 0.2029, lr: 0.000100
2021-03-05 01:10:20 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:49 epoch: 0 iter: 140 max mem : 16754  loss: 5.6506, 2d joint loss: 0.0028, 3d joint loss: 0.0029, vertex loss: 0.0251, compute: 1.1804, data: 0.2229, lr: 0.000100
2021-03-05 01:12:05 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 1  mPVE:  79.10, mPJPE:  74.53, mPVE_smpl:  87.78, mPJPE_smpl:  74.74, PAmPJPE_smpl:  45.61, Data Count: 35520.00
2021-03-05 01:12:05 [1,0]<stdout>:2021-03-05 01:12:03,930 Mesh regression INFO: Validation epoch: 1  mPVE:  79.10, mPJPE:  74.53, mPVE_smpl:  87.78, mPJPE_smpl:  74.74, PAmPJPE_smpl:  45.61, Data Count: 35520.00
2021-03-05 01:12:08 [1,0]<stdout>:2021-03-05 01:12:07,219 Mesh regression INFO: Save checkpoint to _keli/output/checkpoint-1-142
2021-03-05 01:12:08 [1,0]<stderr>:INFO:Mesh regression:Save checkpoint to _keli/output/checkpoint-1-142
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.4640, 2d joint loss: 0.0028, 3d joint loss: 0.0027, vertex loss: 0.0245, compute: 1.8047, data: 0.8129, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.7321, 2d joint loss: 0.0033, 3d joint loss: 0.0029, vertex loss: 0.0248, compute: 1.8047, data: 0.8117, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.1118, 2d joint loss: 0.0025, 3d joint loss: 0.0024, vertex loss: 0.0242, compute: 1.8047, data: 0.8115, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.2468, 2d joint loss: 0.0026, 3d joint loss: 0.0026, vertex loss: 0.0243, compute: 1.8047, data: 0.8130, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stdout>:2021-03-05 01:12:23,703 Mesh regression INFO: eta: 0:03:43 epoch: 1 iter: 160 max mem : 16754  loss: 5.3593, 2d joint loss: 0.0026, 3d joint loss: 0.0027, vertex loss: 0.0243, compute: 1.8047, data: 0.8530, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.5124, 2d joint loss: 0.0028, 3d joint loss: 0.0028, vertex loss: 0.0246, compute: 1.8048, data: 0.8118, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.3646, 2d joint loss: 0.0027, 3d joint loss: 0.0027, vertex loss: 0.0244, compute: 1.8047, data: 0.8126, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16761  loss: 5.3213, 2d joint loss: 0.0027, 3d joint loss: 0.0026, vertex loss: 0.0242, compute: 1.8048, data: 0.8141, lr: 0.000010
2021-03-05 01:12:23 [1,0]<stderr>:INFO:Mesh regression:eta: 0:03:43 epoch: 1 iter: 160 max mem : 16754  loss: 5.3593, 2d joint loss: 0.0026, 3d joint loss: 0.0027, vertex loss: 0.0243, compute: 1.8047, data: 0.8530, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.2339, 2d joint loss: 0.0027, 3d joint loss: 0.0026, vertex loss: 0.0239, compute: 1.7220, data: 0.7347, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.4472, 2d joint loss: 0.0031, 3d joint loss: 0.0027, vertex loss: 0.0241, compute: 1.7219, data: 0.7338, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.0447, 2d joint loss: 0.0025, 3d joint loss: 0.0024, vertex loss: 0.0237, compute: 1.7219, data: 0.7349, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.3026, 2d joint loss: 0.0027, 3d joint loss: 0.0026, vertex loss: 0.0240, compute: 1.7220, data: 0.7337, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 4.9333, 2d joint loss: 0.0024, 3d joint loss: 0.0023, vertex loss: 0.0236, compute: 1.7220, data: 0.7332, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.1090, 2d joint loss: 0.0026, 3d joint loss: 0.0025, vertex loss: 0.0236, compute: 1.7220, data: 0.7360, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16761  loss: 5.1131, 2d joint loss: 0.0025, 3d joint loss: 0.0025, vertex loss: 0.0237, compute: 1.7220, data: 0.7345, lr: 0.000010
2021-03-05 01:12:44 [1,0]<stdout>:2021-03-05 01:12:44,910 Mesh regression INFO: eta: 0:02:59 epoch: 1 iter: 180 max mem : 16754  loss: 5.1689, 2d joint loss: 0.0025, 3d joint loss: 0.0025, vertex loss: 0.0237, compute: 1.7220, data: 0.7724, lr: 0.000010
2021-03-05 01:12:47 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:59 epoch: 1 iter: 180 max mem : 16754  loss: 5.1689, 2d joint loss: 0.0025, 3d joint loss: 0.0025, vertex loss: 0.0237, compute: 1.7220, data: 0.7724, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 5.2699, 2d joint loss: 0.0029, 3d joint loss: 0.0026, vertex loss: 0.0236, compute: 1.6557, data: 0.6712, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 5.1135, 2d joint loss: 0.0026, 3d joint loss: 0.0025, vertex loss: 0.0234, compute: 1.6558, data: 0.6712, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 4.8783, 2d joint loss: 0.0024, 3d joint loss: 0.0023, vertex loss: 0.0232, compute: 1.6557, data: 0.6724, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 4.9501, 2d joint loss: 0.0025, 3d joint loss: 0.0024, vertex loss: 0.0231, compute: 1.6558, data: 0.6735, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 5.0624, 2d joint loss: 0.0026, 3d joint loss: 0.0025, vertex loss: 0.0234, compute: 1.6558, data: 0.6722, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 4.7559, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0230, compute: 1.6558, data: 0.6706, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16761  loss: 4.9048, 2d joint loss: 0.0024, 3d joint loss: 0.0024, vertex loss: 0.0231, compute: 1.6558, data: 0.6719, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stdout>:2021-03-05 01:13:06,105 Mesh regression INFO: eta: 0:02:19 epoch: 1 iter: 200 max mem : 16754  loss: 4.9665, 2d joint loss: 0.0025, 3d joint loss: 0.0024, vertex loss: 0.0231, compute: 1.6558, data: 0.7077, lr: 0.000010
2021-03-05 01:13:08 [1,0]<stderr>:INFO:Mesh regression:eta: 0:02:19 epoch: 1 iter: 200 max mem : 16754  loss: 4.9665, 2d joint loss: 0.0025, 3d joint loss: 0.0024, vertex loss: 0.0231, compute: 1.6558, data: 0.7077, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 5.0858, 2d joint loss: 0.0028, 3d joint loss: 0.0025, vertex loss: 0.0231, compute: 1.6010, data: 0.6201, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.8876, 2d joint loss: 0.0025, 3d joint loss: 0.0023, vertex loss: 0.0229, compute: 1.6011, data: 0.6211, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.6442, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0226, compute: 1.6011, data: 0.6194, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.7437, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0226, compute: 1.6011, data: 0.6206, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.9589, 2d joint loss: 0.0026, 3d joint loss: 0.0024, vertex loss: 0.0230, compute: 1.6011, data: 0.6200, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.7224, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0227, compute: 1.6011, data: 0.6212, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stdout>:2021-03-05 01:13:27,190 Mesh regression INFO: eta: 0:01:42 epoch: 1 iter: 220 max mem : 16754  loss: 4.8100, 2d joint loss: 0.0024, 3d joint loss: 0.0023, vertex loss: 0.0227, compute: 1.6011, data: 0.6547, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16761  loss: 4.8105, 2d joint loss: 0.0024, 3d joint loss: 0.0023, vertex loss: 0.0227, compute: 1.6012, data: 0.6222, lr: 0.000010
2021-03-05 01:13:29 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:42 epoch: 1 iter: 220 max mem : 16754  loss: 4.8100, 2d joint loss: 0.0024, 3d joint loss: 0.0023, vertex loss: 0.0227, compute: 1.6011, data: 0.6547, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.7199, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0224, compute: 1.5562, data: 0.5780, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.9379, 2d joint loss: 0.0027, 3d joint loss: 0.0024, vertex loss: 0.0227, compute: 1.5562, data: 0.5774, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.5938, 2d joint loss: 0.0023, 3d joint loss: 0.0021, vertex loss: 0.0223, compute: 1.5562, data: 0.5786, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.5208, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0223, compute: 1.5562, data: 0.5768, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.8245, 2d joint loss: 0.0025, 3d joint loss: 0.0023, vertex loss: 0.0225, compute: 1.5562, data: 0.5775, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.6397, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0223, compute: 1.5562, data: 0.5775, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16761  loss: 4.7096, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0224, compute: 1.5562, data: 0.5795, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stdout>:2021-03-05 01:13:48,441 Mesh regression INFO: eta: 0:01:08 epoch: 1 iter: 240 max mem : 16754  loss: 4.6961, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0223, compute: 1.5562, data: 0.6105, lr: 0.000010
2021-03-05 01:13:50 [1,0]<stderr>:INFO:Mesh regression:eta: 0:01:08 epoch: 1 iter: 240 max mem : 16754  loss: 4.6961, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0223, compute: 1.5562, data: 0.6105, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.5931, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0221, compute: 1.5189, data: 0.5418, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.8726, 2d joint loss: 0.0028, 3d joint loss: 0.0024, vertex loss: 0.0224, compute: 1.5189, data: 0.5409, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.5898, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0220, compute: 1.5190, data: 0.5434, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.4805, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0220, compute: 1.5189, data: 0.5425, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.6808, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0222, compute: 1.5190, data: 0.5410, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.4236, 2d joint loss: 0.0022, 3d joint loss: 0.0020, vertex loss: 0.0219, compute: 1.5190, data: 0.5403, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16761  loss: 4.5598, 2d joint loss: 0.0023, 3d joint loss: 0.0021, vertex loss: 0.0220, compute: 1.5189, data: 0.5413, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stdout>:2021-03-05 01:14:09,888 Mesh regression INFO: eta: 0:00:36 epoch: 1 iter: 260 max mem : 16754  loss: 4.5797, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0220, compute: 1.5190, data: 0.5740, lr: 0.000010
2021-03-05 01:14:11 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:36 epoch: 1 iter: 260 max mem : 16754  loss: 4.5797, 2d joint loss: 0.0023, 3d joint loss: 0.0022, vertex loss: 0.0220, compute: 1.5190, data: 0.5740, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.5162, 2d joint loss: 0.0024, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4862, data: 0.5104, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.7799, 2d joint loss: 0.0027, 3d joint loss: 0.0023, vertex loss: 0.0221, compute: 1.4862, data: 0.5099, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.4251, 2d joint loss: 0.0022, 3d joint loss: 0.0020, vertex loss: 0.0217, compute: 1.4862, data: 0.5111, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.5110, 2d joint loss: 0.0023, 3d joint loss: 0.0021, vertex loss: 0.0218, compute: 1.4862, data: 0.5109, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.4880, 2d joint loss: 0.0023, 3d joint loss: 0.0021, vertex loss: 0.0218, compute: 1.4863, data: 0.5119, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.5848, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0219, compute: 1.4863, data: 0.5102, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16761  loss: 4.3515, 2d joint loss: 0.0022, 3d joint loss: 0.0020, vertex loss: 0.0217, compute: 1.4862, data: 0.5094, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stdout>:2021-03-05 01:14:31,100 Mesh regression INFO: eta: 0:00:05 epoch: 1 iter: 280 max mem : 16754  loss: 4.4844, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4862, data: 0.5420, lr: 0.000010
2021-03-05 01:14:33 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:05 epoch: 1 iter: 280 max mem : 16754  loss: 4.4844, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4862, data: 0.5420, lr: 0.000010
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.4893, 2d joint loss: 0.0023, 3d joint loss: 0.0021, vertex loss: 0.0218, compute: 1.4864, data: 0.5111, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.7788, 2d joint loss: 0.0027, 3d joint loss: 0.0023, vertex loss: 0.0221, compute: 1.4864, data: 0.5100, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.4088, 2d joint loss: 0.0022, 3d joint loss: 0.0020, vertex loss: 0.0217, compute: 1.4864, data: 0.5113, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.5896, 2d joint loss: 0.0024, 3d joint loss: 0.0022, vertex loss: 0.0219, compute: 1.4864, data: 0.5104, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.5097, 2d joint loss: 0.0024, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4864, data: 0.5106, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.3320, 2d joint loss: 0.0021, 3d joint loss: 0.0020, vertex loss: 0.0216, compute: 1.4864, data: 0.5096, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16761  loss: 4.4717, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4865, data: 0.5121, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stdout>:2021-03-05 01:14:37,102 Mesh regression INFO: eta: 0:00:00 epoch: 2 iter: 284 max mem : 16754  loss: 4.4999, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4864, data: 0.5427, lr: 0.000001
2021-03-05 01:14:39 [1,0]<stderr>:INFO:Mesh regression:eta: 0:00:00 epoch: 2 iter: 284 max mem : 16754  loss: 4.4999, 2d joint loss: 0.0022, 3d joint loss: 0.0021, vertex loss: 0.0217, compute: 1.4864, data: 0.5427, lr: 0.000001
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stdout>:2021-03-05 01:15:59,925 Mesh regression INFO: Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.049139 (1.7783 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.058300 (1.7784 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.063693 (1.7784 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.060225 (1.7784 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, PAmPJPE_smpl:  45.35, Data Count: 35520.00
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.068628 (1.7784 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.093398 (1.7785 s / iter)
2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:25.105339 (1.7785 s / iter)
2021-03-05 01:16:06 [1,0]<stdout>:2021-03-05 01:16:04,012 Mesh regression INFO: Save checkpoint to _keli/output/checkpoint-2-284
2021-03-05 01:16:06 [1,0]<stderr>:INFO:Mesh regression:Save checkpoint to _keli/output/checkpoint-2-284
2021-03-05 01:16:06 [1,0]<stdout>:2021-03-05 01:16:04,081 Mesh regression INFO: Total training time: 0:08:29.132428 (1.7927 s / iter)
2021-03-05 01:16:06 [1,0]<stderr>:INFO:Mesh regression:Total training time: 0:08:29.132428 (1.7927 s / iter)
2021-03-05 01:16:09 [1,0]<stdout>:2021-03-05 01:16:07,723 Mesh regression INFO: Save checkpoint to _keli/output/checkpoint-2-284
2021-03-05 01:16:09 [1,0]<stderr>:INFO:Mesh regression:Save checkpoint to _keli/output/checkpoint-2-284
2021-03-05 01:16:09 [1,0]<stdout>:Starting the daemon thread to refresh tokens in background for process with pid = 576
2021-03-05 01:16:12 [1,0]<stdout>:
2021-03-05 01:16:12 [1,0]<stdout>:
2021-03-05 01:16:12 [1,0]<stdout>:The experiment completed successfully. Finalizing run...
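
Since the log is noisy, a small script can help pull out the three metrics mentioned above (mPVE_smpl, mPJPE_smpl, PAmPJPE_smpl). The following is only a minimal sketch, assuming the validation lines keep the `name: value` layout shown in this log; the function name `parse_validation_line` is hypothetical and not part of the MeshGraphormer codebase.

```python
import re

# Matches "name: value" pairs such as "mPVE_smpl:  87.57".
METRIC_RE = re.compile(r"(\w+):\s*([0-9]+(?:\.[0-9]+)?)")

# The three metrics named in the comment above.
WANTED = ("mPVE_smpl", "mPJPE_smpl", "PAmPJPE_smpl")


def parse_validation_line(line):
    """Return {metric: float} for a 'Validation epoch' line, else None.

    Hypothetical helper: it assumes the log layout shown in this thread.
    """
    marker = "Validation epoch:"
    if marker not in line:
        return None
    # Only look at the text after the marker, so timestamps are ignored.
    tail = line.split(marker, 1)[1]
    found = dict(METRIC_RE.findall(tail))
    return {name: float(found[name]) for name in WANTED if name in found}


sample = (
    "2021-03-05 01:16:00 [1,0]<stderr>:INFO:Mesh regression:Validation epoch: 2  "
    "mPVE:  78.44, mPJPE:  74.00, mPVE_smpl:  87.57, mPJPE_smpl:  73.98, "
    "PAmPJPE_smpl:  45.35, Data Count: 35520.00"
)
print(parse_validation_line(sample))
```

Each process writes the same validation line, so parsing any one of the repeated lines should be enough.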

jhkim0759 commented 2 years ago

Now it's clear.

Thank you for your answer.

I have one more question: do you think batch size affects model performance?

I ask because I saw a big difference in performance between a batch size of 25 on a 24 GB GPU and a batch size of 10 on a 12 GB GPU.

I couldn't increase the batch size because of limitations in my environment, but I wonder whether I would get better performance if it were increased to more than 30.

kevinlin311tw commented 2 years ago

Yes. In my experience, increasing the batch size to 32 can bring a small improvement.

I also tried even larger sizes (such as 40 and 48), but observed no significant improvement.
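
When comparing batch sizes across machines, it may help to separate the per-GPU batch size from the number of samples used per step. The sketch below uses the values from the command in the log above (`--nproc_per_node=8`, `--per_gpu_train_batch_size 20`, 284 iterations over 2 epochs). It assumes one data-parallel process per GPU, with each process consuming its own per-GPU batch every iteration; the function name `global_batch_size` is hypothetical. Whether the batch sizes discussed in this thread (10, 25, 32, 40, 48) are per-GPU or global values is not stated here.

```python
def global_batch_size(per_gpu_batch_size, num_gpus):
    """Samples consumed per step, assuming one process per GPU (assumption)."""
    return per_gpu_batch_size * num_gpus


# Values taken from the fine-tuning command and log above.
per_step = global_batch_size(per_gpu_batch_size=20, num_gpus=8)

# The log ends at "epoch: 2 iter: 284", so each epoch has 284 / 2 iterations.
iters_per_epoch = 284 // 2

# Rough estimate only: assumes every iteration uses a full global batch.
approx_samples_per_epoch = per_step * iters_per_epoch

print(per_step, iters_per_epoch, approx_samples_per_epoch)
```

Under these assumptions, a machine with fewer or smaller GPUs runs more, smaller steps per epoch, which is one possible reason results differ between setups.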

jhkim0759 commented 2 years ago

Thank you for your great help.