syguan96 / DynaBOA

[T-PAMI 2022] Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation

Do I need h36m data to run inference on internet data? #5

Closed: ChristianIngwersen closed this issue 3 years ago

ChristianIngwersen commented 3 years ago

Hi,

Thanks for your great work! Do I really need to download the entire h36m dataset in order to run the demo on internet data?

Following your guide, I run into issues in the lower-level adaptation step, which takes an h36m batch. Is this intentional, or should it be changed?

Here: lower_level_loss, _ = self.lower_level_adaptation(image, gt_keypoints_2d, h36m_batch, learner)

syguan96 commented 3 years ago

Hi @ChristianIngwersen, thanks for your interest! You can set lower_level_mixtrain=0 and upper_level_mixtrain=0 to disable co-training with Human3.6M. But to achieve the best performance, I suggest you download the whole Human 3.6M training set.
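
For example (a minimal form showing just these flags; the full command appears later in this thread):

    python dynaboa_internet.py --expdir exps --expname internet --dataset internet \
                               --lower_level_mixtrain 0 --upper_level_mixtrain 0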

ChristianIngwersen commented 3 years ago

Thanks for the quick reply @syguan96! After doing so, and also adding --retrieval 0 to keep it from loading h36m, I get the following error (screenshot attached). Is it something you've seen before?

syguan96 commented 3 years ago

I tried using this command, and it works well.

CUDA_VISIBLE_DEVICES=1 python dynaboa_internet.py --expdir exps --expname internet --dataset internet \
                                            --motionloss_weight 0.8 \
                                            --retrieval 0 \
                                            --dynamic_boa 1 \
                                            --optim_steps 7 \
                                            --cos_sim_threshold 3.1e-4 \
                                            --shape_prior_weight 2e-4 \
                                            --pose_prior_weight 1e-4 \
                                            --save_res 1 \
                                            --lower_level_mixtrain 0 \
                                            --upper_level_mixtrain 0

Have you changed the code? Based on the error, check this line: self.model = l2l.algorithms.MAML(model, lr=self.options.fastlr, first_order=True).to(self.device), and see whether you set requires_grad = False for some variables in model.
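
For example, a quick check (just a sketch; model here stands for whatever network you pass to MAML) to list any frozen parameters:

    # Sketch: list parameters that will not receive gradients.
    frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
    print('frozen parameters:', frozen if frozen else 'none')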

ChristianIngwersen commented 3 years ago

Haven't changed anything. And you're sure you are on v0.1.5 of learn2learn?

ChristianIngwersen commented 3 years ago

Modified the adaptation step to check for gradients; it passes the assertions but crashes with the same error when calling learner.adapt(lower_level_loss).

Modifications:

  # step 1, clone model
  for param in self.model.parameters():
      assert param.requires_grad
  learner = self.model.clone()
  for param in learner.parameters():
      assert param.requires_grad
  # step 2, lower probe
  for i in range(self.options.inner_step):
      lower_level_loss, _ = self.lower_level_adaptation(image, gt_keypoints_2d, h36m_batch, learner)
      learner.adapt(lower_level_loss)
syguan96 commented 3 years ago

I just noticed that the problem might be caused by the installed PyTorch. Have you checked what's causing the CUDA error?
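
One way to tell whether the environment or the repo code is at fault is a standalone learn2learn check (a sketch, not part of DynaBOA). If this minimal clone/adapt example also fails, the installed PyTorch/learn2learn build is the likely culprit:

    import torch
    import torch.nn as nn
    import learn2learn as l2l

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = nn.Linear(10, 2).to(device)
    maml = l2l.algorithms.MAML(model, lr=1e-3, first_order=True).to(device)

    learner = maml.clone()                 # same clone/adapt pattern as DynaBOA
    x = torch.randn(4, 10, device=device)
    loss = learner(x).pow(2).mean()
    learner.adapt(loss)                    # crashes here if the install is broken
    print('MAML clone/adapt OK')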

syguan96 commented 3 years ago

My environment is PyTorch 1.8.1 with CUDA 11.1+, tested on a 3080. The version of learn2learn is also 0.1.5.

ChristianIngwersen commented 3 years ago

I'm on PyTorch 1.8.2 with CUDA 11.1+, testing on a 2080.

I completely followed the guide in the README to set up a new env with 1.8.2. I can try to downgrade and see if that fixes it :)
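
For reference, a quick way to confirm what the env actually has installed (this assumes learn2learn exposes __version__):

    python -c "import torch, learn2learn; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), learn2learn.__version__)"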

syguan96 commented 3 years ago

Sorry, I didn't check this detail carefully.

ChristianIngwersen commented 3 years ago

No worries! :) While I'm downgrading, a quick question:

In the AlphaPose step you suggest using: python scripts/demo_inference.py --indir $IMAGES_DIR --outdir $RES_DIR --cfg configs/coco/resnet/256x192_res152_lr1e-3_1x-duc.yaml --checkpoint pretrained_models/fast_421_res152_256x192.pth --save_video --save_img --flip --min_box_area 300

This will run on one video at a time, right? I ran it on a single video and then changed Internet_dataset(Dataset) to just look for npz files, since it otherwise wouldn't work with the output. Is this the correct way to demo on a single video?
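
A minimal sketch of that kind of change (the npz key names such as 'imgname' and 'joints2d' are placeholders, not necessarily the fields DynaBOA's preprocessing writes):

    import glob
    import os

    import numpy as np
    from torch.utils.data import Dataset

    class InternetNPZDataset(Dataset):
        # Collect samples from every .npz file found under the data root.
        def __init__(self, root):
            self.samples = []
            for npz_path in sorted(glob.glob(os.path.join(root, '*.npz'))):
                data = np.load(npz_path)
                for i in range(len(data['imgname'])):       # placeholder key
                    self.samples.append({
                        'imgname': data['imgname'][i],
                        'joints2d': data['joints2d'][i],     # placeholder key
                    })

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            return self.samples[idx]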

ChristianIngwersen commented 3 years ago

Still the same issue with PyTorch 1.8.1 (error screenshot attached).

syguan96 commented 3 years ago

It's really a strange problem. Could you please install PyTorch with conda again?

In my environment, the code runs fine (screenshot attached).

ChristianIngwersen commented 3 years ago

What's the structure of your InternetData_ROOT?

And yes, I will create a new env in a moment.

syguan96 commented 3 years ago

The structure is:

|----`InternetData_ROOT`
|    |----seq01.json
|    |----seq01.npz
|    |----images
|    |    |----seq01
|    |    |    |----000001.png
|    |    |    |----000002.png
|    |    |    |----...
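
For a single video, one way (a sketch) to produce that layout; the ffmpeg call and the input video name are illustrative, and seq01.json / seq01.npz come from the AlphaPose and preprocessing steps:

    # Create the image folder for one sequence and dump frames into it.
    mkdir -p InternetData_ROOT/images/seq01
    ffmpeg -i my_video.mp4 InternetData_ROOT/images/seq01/%06d.png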
ChristianIngwersen commented 3 years ago

Solved after reinstalling with conda instead of pip. I've updated the README.md with the solution and created a PR.
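
For anyone hitting the same thing: the conda command PyTorch documents for 1.8.1 + CUDA 11.1 looks like the one below (please double-check it against the versions pinned in the README):

    conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge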

syguan96 commented 3 years ago

Glad to hear it. Thanks for your contribution to improving the quality of this repo!