roykapon / MAS

The official implementation of the paper "MAS: Multiview Ancestral Sampling for 3D Motion Generation Using 2D Diffusion"
MIT License

how to get my own dataset #3

Closed by GithubAccountTo 4 months ago

GithubAccountTo commented 6 months ago

After I get 2D human poses with AlphaPose, what do I do to build my own dataset for training the 2D diffusion model?

roykapon commented 6 months ago

Hi @GithubAccountTo, thank you for checking out our code! We invested time and effort in making our code as modular as possible, but adding a new dataset still requires some work :) This is the general workflow:

  1. Convert your poses into numpy objects (using json.load and then np.save). For most use cases we advise centralizing the character (for instance by subtracting the pelvis position, or the average of all landmarks, in each frame). We also applied some smoothing, scaling and filtering heuristics, which we highly recommend (see the preprocessing sketch after this list).
  2. Check out #2 for generating the Mean.npy and Std.npy files. _Note: in the following steps we advise following the existing dataset implementation in data_loaders/nba/._
  3. Create your own directory, data_loaders/{your_dataset_name} (or simply copy data_loaders/nba).
  4. Create a data_loaders/{your_dataset_name}/skeleton.py file and define your skeleton landmarks (a list of strings) and bone chains (a list of lists of strings). _See data_loaders/nba/skeleton.py for an example._
  5. Create a data_loaders/{your_dataset_name}/dataset.py file with a dataset class that implements __len__(self) (returns the size of the dataset) and __getitem__(self, index) (returns the motion at the specified index). You can inherit from data_loaders/base_dataset.py if it fits your needs; a minimal sketch follows below. _See data_loaders/nba/dataset.py for an example._
  6. Create a config.py file and configure your dataset; this is where you import the skeleton and dataset files. _See data_loaders/base_config.py for the general details, or take data_loaders/nba/config.py as an example._
  7. Add your config to the CONFIGS dictionary in data_loaders/dataset_utils.py.
  8. Add your dataset name to the choices of the --dataset argument in utils/parser_utils.py (lines 36 and 84).
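
To make steps 1 and 2 concrete, here is a minimal preprocessing sketch. It assumes AlphaPose-style JSON with one detected person per frame and a flattened `keypoints` list of (x, y, confidence) triplets; the moving-average smoothing, the pelvis index, the clip file names and the reduction axes for Mean.npy / Std.npy are illustrative assumptions, so check #2 and the nba loader for our actual conventions:

```python
import json
import numpy as np

def load_poses(json_path, num_joints=17):
    """Load AlphaPose-style JSON (one detected person per frame) into a
    (frames, joints, 3) array of (x, y, confidence)."""
    with open(json_path) as f:
        frames = json.load(f)
    return np.stack([np.asarray(frame["keypoints"], dtype=np.float32).reshape(num_joints, 3)
                     for frame in frames])

def preprocess(poses, pelvis_index=0, window=5):
    """Centralize the character and apply a simple moving-average smoothing
    (a stand-in for the heuristics we actually used)."""
    xy, conf = poses[..., :2], poses[..., 2:]
    # Centralize: subtract the pelvis position in each frame
    # (or use xy.mean(axis=1, keepdims=True) to subtract the landmark average).
    xy = xy - xy[:, pelvis_index:pelvis_index + 1, :]
    # Smooth each coordinate track over time with a moving average.
    kernel = np.ones(window) / window
    xy = np.apply_along_axis(lambda track: np.convolve(track, kernel, mode="same"), 0, xy)
    return np.concatenate([xy, conf], axis=-1)

# Dataset-wide statistics (the file names follow the repo; the reduction axes
# here are an assumption -- see #2 for the actual convention).
motions = [preprocess(load_poses(path)) for path in ["clip_000.json"]]  # your clip list
stacked = np.concatenate(motions, axis=0)
np.save("Mean.npy", stacked.mean(axis=0))
np.save("Std.npy", stacked.std(axis=0))
```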

Now you should finally be able to use your dataset via the --dataset argument!
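
As a starting point for steps 4 and 5, here is a minimal sketch of the two files. The landmark set, the class name and subclassing torch's Dataset directly are illustrative assumptions, so mirror data_loaders/nba/ for the real structure:

```python
# data_loaders/my_dataset/skeleton.py -- landmark names (a list of strings) and
# bone chains (a list of lists of strings); this landmark set is illustrative.
LANDMARKS = ["pelvis", "spine", "head",
             "left_hip", "left_knee", "left_ankle",
             "right_hip", "right_knee", "right_ankle"]
CHAINS = [["pelvis", "spine", "head"],
          ["pelvis", "left_hip", "left_knee", "left_ankle"],
          ["pelvis", "right_hip", "right_knee", "right_ankle"]]
```

```python
# data_loaders/my_dataset/dataset.py -- the two methods the training loop needs.
# Subclassing torch's Dataset directly here; inherit from base_dataset.py instead
# if it fits your needs.
import numpy as np
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, motion_paths):
        # One preprocessed (frames, joints, 3) array per clip (saved in step 1).
        self.motions = [np.load(path) for path in motion_paths]

    def __len__(self):
        return len(self.motions)

    def __getitem__(self, index):
        return self.motions[index]
```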

For your specific use case, it appears that you can simply copy the nba dataset implementation (instead of performing stages 3, 4, 5 and 6), with minor changes (adjust the skeleton, or adjust your data). Please note that the skeleton AlphaPose outputs differs from the one we use (we removed the landmark at index 7); see the snippet below.
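
For that skeleton adjustment, assuming your converted pose array is shaped (frames, joints, 3) and the file name is hypothetical, dropping a landmark is a one-liner, e.g.:

```python
import numpy as np

poses = np.load("my_clip.npy")       # (frames, joints, 3), converted in step 1
poses = np.delete(poses, 7, axis=1)  # drop the landmark at index 7 to match our skeleton
```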

Some general notes:

Wishing you the best of luck!!

GithubAccountTo commented 6 months ago

@roykapon thank you for your reply. Q1: In stage 1, the pelvis position is subtracted from all joints to centralize the character. I want to know the details of the next step: the smoothing, scaling and filtering heuristics. I want to get data like nba, as shown in the figure below.

[image: example of the target nba-style data]

Q2: Due to occlusion, some intermediate frames fail to track the human 2D pose. Can such data still be used for training?

AndreyKrotkikh commented 4 months ago

@roykapon hi, I read the paper and checked the repo, but I still have a couple of questions:

  * Is the 3rd coordinate of each landmark the confidence value, in the range of 0 to 1?
  * Do the training videos need to record the same motion simultaneously from multiple angles?

roykapon commented 4 months ago

Hello @GithubAccountTo :) Forgive me for the late response.

As for Q1: I am attaching the code we used for the processing heuristics, along with a helper file containing the skeleton information. You can adjust it to your own needs. process_motions.zip

As for Q2: Absolutely! I would advise overriding the handle_mask function in your dataset implementation (in data_loaders/<your_dataset>/dataset.py) with a threshold that fits your case:

[image: screenshot of the handle_mask implementation]

It masks the loss of the occluded features during training, so the model is only trained on the valid features.
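
In case it helps, here is a sketch of such an override, assuming handle_mask receives the (x, y, confidence) motion and returns a per-feature boolean mask; the base class name and threshold value here are placeholders, so check data_loaders/base_dataset.py for the exact signature:

```python
class MyDataset(BaseDataset):  # hypothetical base class name from base_dataset.py
    CONFIDENCE_THRESHOLD = 0.3  # tune this for your tracker's confidence scale

    def handle_mask(self, motion):
        # A landmark is considered valid only where the tracker's confidence
        # (the 3rd coordinate) exceeds the threshold; the training loss is then
        # masked so the model never trains on the occluded features.
        confidence = motion[..., 2]
        return confidence > self.CONFIDENCE_THRESHOLD
```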

roykapon commented 4 months ago

Hi @AndreyKrotkikh, we are delighted to see others build on our project! The 3rd coordinate is indeed the confidence of each feature, but we applied some heuristics to normalize it, so it is no longer in the range of 0 to 1. In your implementation you can use the original confidence values, but adjust your masking threshold accordingly.

As for simultaneous videos, I am not sure I understand correctly. If you are asking whether the model needs to be trained on videos that record the same motion instance from multiple angles, the answer is no: there is no need for multiview data, just ordinary videos of different motions. It is recommended, however, to have a large diversity of view angles in the videos so the model can learn motions from all angles.

Hope we managed to help :) If this did not answer your questions, feel free to clarify your needs.

AndreyKrotkikh commented 4 months ago

Thank you very much! Regarding simultaneous videos, your answer clarified all my concerns!