JamilHanouneh closed this issue 5 months ago
Hi @JamilHanouneh, can you send some info for me?
Also, can you clarify what you mean by "lightning utilities 0.10.1"? What package does this reference? I know of lightning and lightning-pose, but I'm not sure what the "utilities" references.
There are 13735 labeled frames, and the config is:
data:
  image_orig_dims:
    height: 540
    width: 720
  image_resize_dims:
    height: 384
    width: 512
  data_dir: /home/nsquared6/Desktop/users/Jamil/Done
  video_dir: /home/nsquared6/Desktop/users/Jamil/Done/videos
  csv_file: CollectedData.csv
  downsample_factor: 2
  num_keypoints: 21
  keypoint_names:
  columns_for_singleview_pca: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

training:
  imgaug: dlc
  train_batch_size: 16  # 16
  val_batch_size: 32  # 32
  test_batch_size: 32  # 32
  train_prob: 0.95
  val_prob: 0.05
  # fraction (<=1) or number (>1) of total train frames used for training
  train_frames: 1
  num_gpus: 1
  num_workers: 4
  early_stop_patience: 3
  unfreezing_epoch: 20
  min_epochs: 300
  max_epochs: 300
  log_every_n_steps: 10
  check_val_every_n_epoch: 5
  gpu_id: 0
  rng_seed_data_pt: 0
  rng_seed_model_pt: 0
  lr_scheduler: multisteplr
  lr_scheduler_params:
    multisteplr:
      milestones: [150, 200, 250]
      gamma: 0.5

model:
  losses_to_use: [pca_singleview, temporal]
  backbone: resnet50_human_top_res
  model_type: heatmap
  heatmap_loss_type: mse
  model_name: test  # human_pose_experiment_1

dali:
  general:
    seed: 123456  # keep the same for reproducibility, or change if you like
  base:
    train:
      sequence_length: 32  # good starting point, tweak later if needed
    predict:
      sequence_length: 96  # adapt based on how you'll use the model for predictions
  context:
    train:
      batch_size: 8  # start smaller, adjust based on your GPU memory
    predict:
      sequence_length: 96  # adapt based on how you'll use the model for predictions

losses:
  pca_multiview:
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by these components
    components_to_keep: 3
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  pca_singleview:
    log_weight: 5.0
    # predictions should lie within the low-d subspace spanned by components that describe this fraction of variance
    components_to_keep: 0.99
    # absolute error (in pixels) below which pca loss is zeroed out; if null, an empirical
    # epsilon is computed using the labeled data
    epsilon: null
  temporal:
    log_weight: 5.0
    # for epsilon insensitive rectification
    # (in pixels; diffs below this are not penalized)
    epsilon: 20.0
    # nan removal value
    # (in prob; heatmaps with max prob values below this are removed)
    prob_threshold: 0.05

eval:
  hydra_paths: [" "]
  predict_vids_after_training: true
  save_vids_after_training: true
  fiftyone:
    dataset_name: test
    # if you want to manually provide a different model name to be displayed in FiftyOne
    model_display_names: ["test_model"]
    # whether to launch the app from the script (True), or from ipython (and have finer control over the outputs)
    launch_app_from_script: false
    remote: true  # for LAI, must be False
    address: 127.0.0.1  # ip to launch the app on
    port: 5151  # port to launch the app on
  test_videos_directory: /home/nsquared6/Desktop/users/Jamil/Done/videos
  saved_vid_preds_dir: /home/nsquared6/Desktop/users/Jamil/Done/saved_vid_preds_dir
  confidence_thresh_for_vid: 0.90
  video_file_to_plot: null
  pred_csv_files_to_plot: [" "]

callbacks:
  anneal_weight:
    attr_name: total_unsupervised_importance  # the attribute that the callback modifies
    init_val: 0.0  # initial value of the attribute
    increase_factor: 0.01  # factor by which the attribute's value is increased at each step
    final_val: 1.0  # final value the attribute should reach
    freeze_until_epoch: 0  # number of epochs the attribute stays at its initial value; 0 means it starts increasing from the very first epoch

hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
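For context on the anneal_weight block above: it describes a callback that ramps the model attribute total_unsupervised_importance from init_val to final_val over the course of training. A minimal sketch of that idea as a generic PyTorch Lightning callback (an illustration, not necessarily the exact lightning-pose implementation):

```python
import pytorch_lightning as pl


class AnnealWeightSketch(pl.Callback):
    """Sketch: linearly ramp an attribute on the LightningModule each epoch."""

    def __init__(self, attr_name, init_val=0.0, increase_factor=0.01,
                 final_val=1.0, freeze_until_epoch=0):
        super().__init__()
        self.attr_name = attr_name
        self.init_val = init_val
        self.increase_factor = increase_factor
        self.final_val = final_val
        self.freeze_until_epoch = freeze_until_epoch

    def on_train_epoch_start(self, trainer, pl_module):
        epoch = trainer.current_epoch
        if epoch <= self.freeze_until_epoch:
            # hold at the initial value during the freeze period
            val = self.init_val
        else:
            # increase by increase_factor per epoch afterwards, capped at final_val
            val = min(self.final_val,
                      self.init_val + self.increase_factor * (epoch - self.freeze_until_epoch))
        setattr(pl_module, self.attr_name, val)
```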
Oh wow, you have a lot of labeled frames! Yes, it's very surprising you're not seeing good results. Is it possible to share one or more screenshots of your images? I'm curious how much variability you have across the frames.
A couple of suggestions:
- resnet50_human_top_res is for full human pose estimation; I'm guessing it doesn't perform so well on close-ups of hands. You might just try resnet50, which is pretrained on ImageNet.
- Are you using TensorBoard to monitor training? It would be useful to see the loss curves for your previous models as well as after making these changes; that's a good way to see if there is an obvious lack of learning.
Another comment on training: we typically train with <=1k frames, so the epoch numbers are tuned to that scale a bit. Since you have so many frames, one epoch is a lot of data (see the rough arithmetic after the config below). So you could also try changing the following (this is just based on intuition):
training:
  unfreezing_epoch: 1
  min_epochs: 100
  max_epochs: 100
  check_val_every_n_epoch: 1
  lr_scheduler_params:
    multisteplr:
      milestones: [40, 60, 80]
      gamma: 0.5
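To make "one epoch is a lot of data" concrete, here is a rough back-of-the-envelope using the numbers from the config above (13735 labeled frames, train_prob 0.95, train_batch_size 16); the exact split lightning-pose makes may differ slightly:

```python
# rough back-of-the-envelope, using numbers from the config above
labeled_frames = 13735
train_frames = int(labeled_frames * 0.95)   # ~13048 frames in the train split
steps_per_epoch = train_frames // 16        # ~815 gradient steps per epoch
print(train_frames, steps_per_epoch)        # 13048 815
```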
I think it's best to get something workable with a fully-supervised model first, then we can think about the unsupervised losses.
Thanks, it works and I was able to get better results, but I have two questions:
1. Can you explain why you changed these parameters (milestones, check_val_every_n_epoch, unfreezing_epoch)?
2. I want to get even better results. Do you have any further suggestions?
Glad to hear! Did you end up changing the backbone and the resizing dims? I would be surprised if the training params alone led to much better results.
unfreezing_epoch: the network has two components - the backbone (usually a resnet-50) and the head (which maps the backbone features to actual per-keypoint heatmaps). The head is always randomly initialized. The backbone is usually initialized with pretrained weights. When training we freeze the backbone weights for some number of epochs, to let the weights of the head learn something meaningful first. Then we unfreeze the backbone and let all the weights of the model train. It's an open question what the right epoch to unfreeze the backbone is, but in your case you have so many labeled images that by the time you get through one epoch the head weights are probably already in a good enough place to unfreeze the backbone.
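In plain PyTorch terms, the freeze/unfreeze step looks roughly like this (a sketch of the general idea, not lightning-pose's exact code; the torchvision ResNet-50 just stands in for the backbone):

```python
import torchvision.models as models

# sketch: freeze the pretrained backbone so only the randomly initialized head trains
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False  # frozen: no gradients flow into the backbone

# ... train for `unfreezing_epoch` epochs with only the head updating ...

# then unfreeze so the whole network trains end to end
for p in backbone.parameters():
    p.requires_grad = True
```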
check_val_every_n_epoch: this won't affect model training, this is just how often the validation data is run through the model for logging. Since I suggested you decrease the overall number of epochs (due to the large size of the dataset) I figured you might as well log the validation loss every epoch instead of every 5.
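In PyTorch Lightning this just corresponds to the Trainer argument of the same name, e.g.:

```python
import pytorch_lightning as pl

# run (and log) validation every epoch instead of every 5
trainer = pl.Trainer(max_epochs=100, check_val_every_n_epoch=1)
```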
milestones: we use the Adam optimizer for learning, which dynamically updates the step size for each weight during training, but there is still an overall learning rate that needs to be set. There are many schemes for setting the learning rate, or changing it over time, and the one that we use is just to periodically halve the learning rate. So with milestones=[40, 60, 80] you are halving the overall learning rate at each of those milestones, which allows the network to settle into local minima more easily. If the learning rate is too big you keep jumping over local minima; if it's too small you might get stuck in a bad local minimum and never escape.
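For illustration, the same scheme in raw PyTorch (just the scheduler the config sets up, not lightning-pose's training loop):

```python
import torch

# toy optimizer/scheduler showing the schedule described above:
# the base learning rate is halved at epochs 40, 60, and 80
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 60, 80], gamma=0.5
)

for epoch in range(100):
    # ... one epoch of training ...
    scheduler.step()  # lr: 1e-3 -> 5e-4 -> 2.5e-4 -> 1.25e-4 at the milestones
```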
As far as further suggestions, I'll wait until I hear back about backbone and resizing dims before making any other suggestions. BTW how long did it take to train? And what type of GPU are you using?
Hi @themattinthehatt, this issue is interesting. Can you give more details about the backbones in LP? For example, what kinds of backbones can we incorporate into an LP model?
@Wulin-Tan we offer a decent number of backbone options, though we've only thoroughly explored resnet-50s. I updated the documentation so you can see more of the available options.
@JamilHanouneh in looking up the refs for some of the other backbones I stumbled across this page showing a decent number of backbones that have been pretrained on hand data; it would be quite easy for you to update the backbone code in lightning pose in order to use one of these. You can see how we incorporate other backbones from this same set of MMPose models here.
Are you by any chance using one of these publicly available datasets?
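If you want to experiment with one of those MMPose checkpoints before an official option lands, the general pattern for reusing a pretrained ResNet-50 checkpoint as a backbone looks roughly like the sketch below. This is not the exact lightning-pose code: the checkpoint URL is a placeholder, and the "backbone." key prefix is typical of MMPose checkpoints but worth verifying for the one you pick.

```python
import torch
import torchvision.models as models

# placeholder URL for one of the hand-pretrained MMPose checkpoints linked above
CKPT_URL = "<mmpose-checkpoint-url>"

ckpt = torch.hub.load_state_dict_from_url(CKPT_URL, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# MMPose checkpoints typically prefix backbone weights with "backbone.";
# strip that prefix and drop the head so the keys match torchvision's ResNet-50
backbone_sd = {
    k.replace("backbone.", "", 1): v
    for k, v in state_dict.items()
    if k.startswith("backbone.")
}

backbone = models.resnet50()
missing, unexpected = backbone.load_state_dict(backbone_sd, strict=False)
print("missing keys:", missing)        # e.g. the fc layer, which is not used as a backbone
print("unexpected keys:", unexpected)
```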
@YitingChang @JamilHanouneh since you both are interested in tracking hands I figured I'd take some time next week to add one of the pretrained hand backbones into the repo; will update you here when that's done
@YitingChang @JamilHanouneh I just added a backbone pretrained on the OneHand10k dataset from MMPose.
To use this backbone, first run git pull from inside your lightning-pose repo to get the code updates. Then, in your config file, set the backbone to

model:
  backbone: resnet50_human_hand
The first time you do this you'll see the weights being downloaded from MMPose, and then the model will train like normal. Please let me know how this works for you!
@themattinthehatt Thanks for adding the new backbone! I'll give it a try and update you on the results.
By the way, regarding your previous questions:
- Did you end up changing the backbone and the resizing dimensions? Yes, I did. I found that adjusting the resizing dimensions had the more significant impact.
- How long did it take to train? It took about 19 hours.
- What type of GPU are you using? I'm using an RTX 4090 paired with a Ryzen 9 processor.

I'll keep you posted on how the new backbone performs!
I attempted to train a model on a hand-tracking dataset. I experimented with all available backbones, loss functions, and other hyperparameters, but the results were consistently poor. The output videos showed no detectable hand movements, even though my dataset is intact and free from issues. The model failed to produce meaningful results. Could you identify potential issues or suggest improvements to achieve better performance? As a side note, I had better results when I used lightning utilities version 0.10.1.