turning on flip - Githubissues

mateuszwyszynski commented 6 months ago

Training the model using configs shared with the pretrained models:

data:
  amass_dir: ./amass_samples/
  data_dir: ./training_data/
  flip: true
  num_pts: 10000
  single: false
experiment:
  bodymodel: smpl
  data_name: PoseData
  exp_name: small
  inp_name: single
  num_part: 21
  root_dir: ./posendf/replicate-version2/
  test: false
  type: BaseTrainer
  val: false
model:
  DFNet:
    act: softplus
    beta: 100
    dims: 256, 512, 1024, 512, 256, 64
    ff_enc: false
    in_dim: 126
    name: DFNet
    num_layers: 5
    num_parts: 21
    total_dim: 960
  StrEnc:
    act: softplus
    beta: 100
    ff_enc: false
    in_dim: 84
    name: StructureEncoder
    num_layers: 2
    num_part: 21
    out_dim: 6
    pose_enc: false
    use: true
train:
  abs: true
  batch_size: 4
  body_enc: true
  clamp_dist: 0.0
  continue_train: true
  device: cuda
  disp_reg: true
  dist: 0.5
  eikonal: 0.1
  eval: false
  grad: false
  loss_type: l1
  man_loss: 0.1
  max_epoch: 200000
  num_worker: 4
  optimizer: Adam
  optimizer_param: 1.0e-05
  pde: false
  square: false
  train_stage_1: 100000
  train_stage_2: 100000

results in the following error:

Traceback (most recent call last):
  File "/home/mateusz.wyszynski/Code/PoseNDF/trainer.py", line 37, in <module>
    train(opt, args.config, args.test)
  File "/home/mateusz.wyszynski/Code/PoseNDF/trainer.py", line 21, in train
    loss,epoch_loss = trainer.train_model(i)
  File "/home/mateusz.wyszynski/Code/PoseNDF/model/train_posendf.py", line 90, in train_model
    for i, inputs in enumerate(self.train_dataset):
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/envs/posendf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mateusz.wyszynski/Code/PoseNDF/model/load_data.py", line 71, in __getitem__
    amass_poses, _  = quat_flip(amass_poses)
  File "/home/mateusz.wyszynski/Code/PoseNDF/model/load_data.py", line 15, in quat_flip
    is_neg = pose_in[:,:,0] <0
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed

Setting flip to false lets the training start, so the error is most likely caused by this functionality. Based on the paper, flip gives the network the additional information that there are always two quaternions representing the same rotation (i.e. q and -q).
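For reference, a minimal NumPy sketch of the two facts above: q and -q map to the same rotation matrix, and indexing a 2-dimensional array with `[:, :, 0]` raises the IndexError from the traceback. The helper `quat_to_rotmat` and the (w, x, y, z) ordering are assumptions made for illustration, not code from the repo.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion in (w, x, y, z) order to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# Every entry of the matrix is quadratic in q, so negating q changes nothing.
q = np.array([0.5, 0.5, 0.5, 0.5])  # an arbitrary unit quaternion
same_rotation = np.allclose(quat_to_rotmat(q), quat_to_rotmat(-q))

# A batch of flat poses (4 poses, 63 values each) is 2-dimensional,
# so the three-index lookup used in quat_flip cannot work on it.
flat_poses = np.zeros((4, 63))
try:
    flat_poses[:, :, 0] < 0
    raised = False
except IndexError:
    raised = True

print(same_rotation, raised)
```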

mateuszwyszynski commented 6 months ago

Ok, so the error is caused by lines 70-71 in load_data.py.

I believe the poses sampled from the AMASS dataset are not represented as quaternions (one can see that the pose has shape 63, not 21 x 4). I've checked the original repo, and the code in the original implementation is the same. I think we have to fix it on our own.
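One possible fix, sketched under assumptions: if the 63 values are 21 per-joint axis-angle vectors (an assumption, though it matches the shape), they can be converted to a (N, 21, 4) quaternion array before flipping. Both function names below are hypothetical, and `flip_to_positive_w` is only one reading of what quat_flip is meant to do, based on the single line visible in the traceback.

```python
import numpy as np

def axis_angle_to_quat(pose_aa, num_joints=21):
    """Convert (N, num_joints * 3) axis-angle poses to (N, num_joints, 4)
    unit quaternions in (w, x, y, z) order."""
    aa = np.asarray(pose_aa, dtype=np.float64).reshape(-1, num_joints, 3)
    angle = np.linalg.norm(aa, axis=-1, keepdims=True)  # (N, J, 1)
    half = 0.5 * angle
    # sin(angle / 2) / angle, using the limit 0.5 for near-zero rotations.
    small = angle < 1e-8
    safe_angle = np.where(small, 1.0, angle)
    scale = np.where(small, 0.5, np.sin(half) / safe_angle)
    return np.concatenate([np.cos(half), aa * scale], axis=-1)

def flip_to_positive_w(quats):
    """Return a copy in which every quaternion has w >= 0,
    together with the boolean mask of entries that were negated."""
    is_neg = quats[:, :, 0] < 0
    out = quats.copy()
    out[is_neg] = -out[is_neg]
    return out, is_neg

poses = np.zeros((2, 63))
poses[0, 0:3] = [1.5 * np.pi, 0.0, 0.0]  # angle > pi gives a negative w
quats = axis_angle_to_quat(poses)
flipped, mask = flip_to_positive_w(quats)
print(quats.shape, int(mask.sum()))
```

With the array reshaped to three dimensions, the `[:, :, 0]` lookup from the traceback becomes valid.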

No idea how they obtained the original results, though. A good question is how long they trained the network. Maybe I should continue training without the flip and wait until I get similar results.

mateuszwyszynski commented 6 months ago

Another thing is that I should probably use indices 3:66 to generate the pose. The position of the root is of no interest to us, but the orientation impacts the pose. This is how the problem is handled in vposer (line 102).

This should be revisited with viser after #8 is closed.

mateuszwyszynski commented 6 months ago

Ok, so based on my preliminary observations using viser, we should actually use indices 0:63 when dealing with samples generated from AMASS using data/sample_poses.py, and indices 3:66 when dealing with the original raw AMASS data. I still have to better understand the role of the remaining parameters, but I wanted to save a note for future reference.
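As a note-to-self sketch of that indexing rule: the helper name `body_pose` and the source labels "sampled" and "raw" are hypothetical, and the offsets are just the preliminary ones observed above, not verified against the AMASS format.

```python
import numpy as np

# Start offset of the 63 body-pose values for each data source
# (preliminary, taken from the observations in this thread).
POSE_START = {
    "sampled": 0,  # output of data/sample_poses.py -> indices 0:63
    "raw": 3,      # original raw AMASS data        -> indices 3:66
}

def body_pose(params, source):
    """Slice the 63 body-pose values out of the last axis of `params`."""
    start = POSE_START[source]
    return np.asarray(params)[..., start:start + 63]

raw_params = np.arange(156)  # stand-in for one raw parameter vector
raw_pose = body_pose(raw_params, "raw")
sampled_pose = body_pose(np.arange(63), "sampled")
print(raw_pose.shape, sampled_pose.shape)
```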

mateuszwyszynski / PoseNDF

turning on flip #11