open-mmlab / mmskeleton

An OpenMMLab toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Apache License 2.0

How to load a custom skeleton-based dataset, then train & test it #316

Open japfeifer opened 4 years ago

japfeifer commented 4 years ago

I'm attempting to load in a custom skeleton-based dataset. The custom dataset just has x,y,z coordinate data (there is no RGB video data so no video processing is required with the openpose software). I have been trying to follow the instructions here, but am running into some issues.

I have several questions related to the load, and then a few about training the model:

Question 1. I am re-formatting the custom dataset to the *.json file format. Since my data has no associated video, what do I set the "video_name": and "resolution": values to? Do I set them to null, or remove them completely?

Question 2. In the *.json file format example, there is an entry for "version": "1.0". Do I just put the same in my custom dataset *.json files? What is this version number for?

Question 3. I'm assuming that I do not need to create a yaml file similar to configs/utils/build_dataset_example.yaml , since it simply creates the *.json files (which I will do separately with my custom dataset). Is this correct?

Question 4. I noticed files (train_data.npy val_data.npy train_label.pkl val_label.pkl) in, for example, the data/Kinetics/kinetics-skeleton directory. Do these files get generated automatically when I train the model train.yaml ? Or, do I have to do some other step when loading in the custom dataset?

Question 5. What is the best sample train.yaml and test.yaml file to use as a base for creating the test/train yaml files for my custom dataset? Are the ones in mmskeleton/configs/recognition/st_gcn/dataset_example the best to start with?

Question 6. Assuming I use the train.yaml from Question 5, I have some questions about the various configurations. I'll limit my question to the dataset_cfg: section since that is where I assume that I have to do some customization. There are two - type: "datasets.DataPipeline" subsections. Is just one - type section enough? Also, I'm unsure what the num_track: , num_keypoints: and repeat: are for? and, for the pipeline: section, are those all related to RGB video data and hence I can omit them?

Thank you in advance for reviewing and answering!
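Regarding Questions 1 and 2, here is a minimal sketch of what such a *.json file might look like. The field names other than "video_name", "resolution", and "version" (num_frame, keypoint_channels, annotations, category_id, keypoints) are assumptions based on the example files and should be checked against the repo:

```python
import json

# Sketch of a single skeleton annotation file in mmskeleton's JSON style.
# Only "video_name", "resolution", and "version" are confirmed by this thread;
# the remaining fields are guesses to be verified against the repo's examples.
sample = {
    "info": {
        "video_name": "custom_clip_0001",  # any unique id works when there is no video
        "resolution": None,                # null when there is no source video
        "num_frame": 2,
        "num_keypoints": 10,
        "keypoint_channels": ["x", "y", "z"],
        "version": "1.0"                   # format version expected by the loader
    },
    "annotations": [
        {
            "frame_index": 0,
            "id": 0,                       # person/track id within the frame
            "keypoints": [[0.1, 0.2, 0.3]] * 10
        },
        {
            "frame_index": 1,
            "id": 0,
            "keypoints": [[0.11, 0.21, 0.31]] * 10
        }
    ],
    "category_id": 0                       # integer class label of this sequence
}

with open("custom_clip_0001.json", "w") as f:
    json.dump(sample, f, indent=2)
```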

japfeifer commented 4 years ago

Further to my questions above, I am able to create an experiment where I load in a custom skeleton dataset and run the recognition training, but neither the loss value nor the validation accuracy improves from one epoch to the next (we only get around 2% accuracy, which is the same as randomly choosing a label for each test sequence).

The custom skeleton dataset has:

The sample train.yaml file is here: train.txt

In the graph_cfg: section of the train.yaml file it appears as though we have to select a layout: . We have tried values 'ntu-rgb+d' as well as 'openpose', but those seem to assume that the skeleton has 25 or 18 joints, respectively. So we actually mapped our 10-joint custom skeleton to 25 joints, or 18 joints, so that we could use the 'ntu-rgb+d' or 'openpose' options and see what would happen.
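A zero-padding mapping like the one described can be sketched as follows. JOINT_MAP here is a made-up example of assigning 10 joints to NTU indices, not the mapping actually used above:

```python
import numpy as np

# Hypothetical sketch: place a 10-joint skeleton into the 25-joint
# 'ntu-rgb+d' layout by zero-padding the joints we do not have.
# JOINT_MAP (our joint index -> NTU joint index) is an illustrative
# assumption; it should be chosen to respect the NTU bone hierarchy.
JOINT_MAP = {0: 0, 1: 1, 2: 20, 3: 2, 4: 3, 5: 4, 6: 8, 7: 12, 8: 16, 9: 5}

def map_to_ntu(frames_10):
    """frames_10: array of shape (T, 10, 3) -> (T, 25, 3); unmapped joints stay 0."""
    frames_10 = np.asarray(frames_10, dtype=float)
    out = np.zeros((frames_10.shape[0], 25, 3))
    for src, dst in JOINT_MAP.items():
        out[:, dst, :] = frames_10[:, src, :]
    return out

frames = np.random.rand(4, 10, 3)
mapped = map_to_ntu(frames)
print(mapped.shape)  # (4, 25, 3)
```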

Here is an example *.json file where we mapped our custom 10-joint skeleton to the 25-joint 'ntu-rgb+d' skeleton layout: train10.txt

And here is an example *.json file where we mapped our custom 10-joint skeleton to the 18-joint 'openpose' skeleton layout (we just extracted the x,y coordinate values and not the z coordinate values from our custom dataset since the 'openpose' skeleton layout is just 2-d): train24.txt

However, as mentioned above, the neural network training does not improve the validation accuracy as the epochs go by. Do we need to specify layout:, and if so, does it have to be one of the pre-defined layout types? We were assuming that loading in a custom dataset meant that we could have different numbers/types of joints. Is this true and we are just using the mmskeleton software incorrectly?

Also, one other question: we had to comment out the following step in the train and test pipeline: - {type: "datasets.skeleton.normalize_by_resolution"} . This is because, as you can see in the sample *.json files, we do not specify the resolution (our dataset is just x/y/z skeletal coordinate data, not RGB videos). Is this OK to do? Does it have unintended consequences?
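If normalize_by_resolution is skipped, one option is to normalize the raw coordinates yourself before writing the *.json files. This is a hypothetical stand-in, not an mmskeleton function:

```python
import numpy as np

def normalize_skeleton(coords):
    """Center-and-scale normalization for raw x/y/z joint coordinates.

    coords: array of shape (T, V, C). A sketch of a replacement for the
    skipped datasets.skeleton.normalize_by_resolution step (which needs a
    video resolution): it centers the sequence and scales it into
    [-0.5, 0.5], so the network sees inputs on a similar scale either way.
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.reshape(-1, coords.shape[-1]).mean(axis=0)
    coords = coords - center                 # zero-mean per channel
    scale = np.abs(coords).max()
    if scale > 0:
        coords = coords / (2.0 * scale)      # max abs value becomes 0.5
    return coords

seq = np.random.rand(50, 10, 3) * 100.0      # raw coordinates in arbitrary units
norm = normalize_skeleton(seq)
print(norm.min() >= -0.5 and norm.max() <= 0.5)  # True
```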

Thanks!

liqier commented 4 years ago

Why are the coordinates of the joint points less than 1, and what should be done with the coordinates?

liqier commented 4 years ago

Can you tell me what "id" is?

jgoldm commented 4 years ago

Why are the coordinates of the joint points less than 1, and what should be done with the coordinates?

Their coordinates are normalized (divided by the resolution), and their resolution was set to null. I don't know if this is right; maybe you are supposed to specify a resolution, like [224, 224], and then use pixel coordinates like [100, 150]. Does anyone know the correct way to build and train a custom skeleton-based dataset? The instructions were unclear to me.

jgoldm commented 4 years ago

@japfeifer Were you able to find out the answers to your layout and normalization questions? I have the same questions.

japfeifer commented 3 years ago

We were finally able to get things to run in the (older) st-gcn environment (but not the newer mmskeleton environment). I've attached two MATLAB files that show how we mapped our custom skeleton to the NTU RGB+D format, as well as some files that st-gcn requires to run the data loading and training. Hope this helps a bit. sample_code.zip
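For anyone reading along without MATLAB, the files st-gcn's feeder loads can also be produced in Python. The (N, C, T, V, M) data shape and the (sample_names, labels) pickle layout follow st-gcn's NTU feeder as I understand it; verify against the feeder configured in your yaml:

```python
import pickle
import numpy as np

# Sketch of writing the files st-gcn's feeder expects:
#   train_data.npy  -- float array of shape (N, C, T, V, M)
#   train_label.pkl -- pickled tuple (sample_names, labels)
# Shapes and field order are assumptions based on st-gcn's NTU feeder.
N, C, T, V, M = 100, 3, 300, 25, 1   # samples, channels, frames, joints, bodies
data = np.zeros((N, C, T, V, M), dtype=np.float32)  # fill with your mapped skeletons
sample_names = [f"custom_{i:05d}" for i in range(N)]
labels = [i % 5 for i in range(N)]   # example class ids

np.save("train_data.npy", data)
with open("train_label.pkl", "wb") as f:
    pickle.dump((sample_names, labels), f)
```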

YoHo-O commented 3 years ago

Question 6:

  1. The two datasets.DataPipeline subsections are related to workflow: [['train', 5], ['val', 1]] , which means validation is performed after every five training epochs. The first pipeline generates training samples and the second generates validation samples, so you need both pipelines (or you can change the workflow config).
  2. num_track is the number of people (skeletons) you expect in each data frame; num_keypoints is the number of keypoints of the skeleton model you chose; repeat is used to augment your dataset: the final training-set length is len(your_training_samples) * repeat, i.e. every skeleton file is loaded repeat times. Note the simulate_camera_moving config in the first pipeline: it applies a random change to every data sample on each load, so in the end you get len(your_training_samples) * repeat different training samples.

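Putting those points together, a dataset_cfg might look roughly like this. The values are placeholders, and the exact option names should be checked against the dataset_example configs:

```yaml
# Illustrative fragment only; verify option names against
# configs/recognition/st_gcn/dataset_example.
dataset_cfg:
  # first pipeline: training samples
  - type: "datasets.DataPipeline"
    num_track: 1          # max people (skeletons) kept per frame
    num_keypoints: 10     # joints per skeleton
    repeat: 10            # each file is loaded 10x per epoch (augmentation)
    pipeline:
      - {type: "datasets.skeleton.simulate_camera_moving"}  # random perturbation per load
  # second pipeline: validation samples (required by the workflow below)
  - type: "datasets.DataPipeline"
    num_track: 1
    num_keypoints: 10
workflow: [['train', 5], ['val', 1]]   # 5 training epochs, then 1 validation epoch
```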
ChalsonLee commented 3 years ago

Why are the coordinates of the joint points less than 1, and what should be done with the coordinates?

I have the same problem. Have you solved it?