Closed: liveseongho closed this issue 3 years ago.
@liveseongho Thanks for pointing out the issue and sorry about the inconvenience.
I will push a fix soon.
Let me know if you run into other issues. We have not had a chance to test the full pre-training since the code refactor.
@linjieli222
I got error messages as follows:
[1,0]<stderr>: File "pretrain.py", line 618, in <module>
[1,0]<stderr>: main(args)
[1,0]<stderr>: File "pretrain.py", line 243, in main
[1,0]<stderr>: save_training_meta(opts)
[1,0]<stderr>: File "/src/utils/save.py", line 23, in save_training_meta
[1,0]<stderr>: if args.rank > 0:
[1,0]<stderr>:AttributeError: 'Namespace' object has no attribute 'rank'
I printed args:
Namespace(betas=[0.9, 0.98], checkpoint='/pretrain/pretrain-tv-init.bin', compressed_db=False, drop_svmr_prob=0.8, dropout=0.1, fp16=True, grad_norm=1.0, gradient_accumulation_steps=2, hard_neg_weights=[10], hard_negtiave_start_step=[20000], hard_pool_size=[20], img_db='/video', learning_rate=3e-05, load_partial_pretrained=True, lr_mul=1.0, lw_neg_ctx=8.0, lw_neg_q=8.0, lw_st_ed=0.01, margin=0.1, mask_prob=0.15, max_clip_len=100, max_txt_len=60, model_config='config/hero_pretrain.json', n_gpu=8, n_workers=1, num_train_steps=1650000, optim='adamw', output_dir='pt-temp', pin_mem=True, ranking_loss_type='hinge', save_steps=500, seed=77, skip_layer_loading=True, sub_ctx_len=0, targets=[{'name': 'tv', 'sub_txt_db': 'tv_subtitles.db', 'vfeat_db': 'tv', 'vfeat_interval': 1.5, 'splits': [{'name': 'all', 'tasks': ['mlm', 'mfm-nce', 'fom', 'vsm'], 'train_idx': 'pretrain_splits/tv_train.json', 'val_idx': 'pretrain_splits/tv_val.json', 'ratio': [2, 2, 1, 2]}]}, {'name': 'ht100_full_filtered', 'sub_txt_db': 'howto100m_pretrain_all_60s_clip_sub.db', 'vfeat_db': 'howto100m_pretrain_all_60s_clips', 'vfeat_shards': ['howto100m_pretrain_all_clips_8', 'howto100m_pretrain_all_clips_0', 'howto100m_pretrain_all_clips_1', 'howto100m_pretrain_all_clips_2', 'howto100m_pretrain_all_clips_3', 'howto100m_pretrain_all_clips_4', 'howto100m_pretrain_all_clips_5', 'howto100m_pretrain_all_clips_6', 'howto100m_pretrain_all_clips_7', 'howto100m_pretrain_all_clips_9'], 'vfeat_interval': 2.0, 'splits': [{'name': 'all', 'tasks': ['mfm-nce', 'fom'], 'train_idx': ['howto100_full_pretrain_split/ht100_full_filtered_train_8.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_0.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_1.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_2.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_3.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_4.json', 
'howto100_full_pretrain_split/ht100_full_filtered_train_5.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_6.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_7.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_9.json'], 'val_idx': 'howto100_full_pretrain_split/ht100_full_filtered_val.json', 'ratio': [2, 1]}, {'name': 'has-sub', 'tasks': ['mlm', 'vsm'], 'train_idx': ['howto100_full_pretrain_split/ht100_full_filtered_train_8.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_0.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_1.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_2.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_3.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_4.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_5.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_6.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_7.json', 'howto100_full_pretrain_split/ht100_full_filtered_train_9.json'], 'val_idx': 'howto100_full_pretrain_split/ht100_full_filtered_val.json', 'ratio': [2, 2]}]}], targets_ratio=[1, 9], train_batch_size=32, train_span_start_step=0, txt_db='/txt', use_all_neg=True, val_batch_size=32, valid_steps=5000, vfeat_interval=1.5, vfeat_version='resnet_slowfast', warmup_steps=10000, weight_decay=0.01)
How should I fix this issue?
@liveseongho
I have updated utils/save.py to comment out L23-24.
Please check if it works now.
It works.
Thanks!
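For readers hitting the same AttributeError: the traceback shows save_training_meta reading args.rank, while the printed Namespace has no rank field. The fix applied in this thread was to comment out the check. The sketch below shows a different, hypothetical workaround that keeps the check and falls back to a default when the attribute is absent. The helper names get_rank and should_skip_saving are made up for illustration, and treating a missing rank as 0 is an assumption, not the repository's behavior.

```python
from argparse import Namespace


def get_rank(args: Namespace) -> int:
    """Return args.rank, or 0 when the attribute is missing.

    Defaulting to 0 is an assumption made for this sketch.
    """
    return getattr(args, "rank", 0)


def should_skip_saving(args: Namespace) -> bool:
    """Mirror the `if args.rank > 0` check without raising AttributeError."""
    return get_rank(args) > 0


# A Namespace without 'rank', like the one printed in this thread.
opts = Namespace(output_dir="pt-temp", n_gpu=8)
print(should_skip_saving(opts))

# A Namespace that does carry a non-zero rank.
worker_opts = Namespace(output_dir="pt-temp", rank=3)
print(should_skip_saving(worker_opts))
```

In a Horovod-style launch, the rank would more likely be set on the options object before this function is called; the fallback only avoids the crash when it is not.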
Hi,
I'm trying to reproduce pretraining with the config pretrain-tv-ht-16gpu.json.
I got error messages as follows:
So I printed opts. I think sub_txt_db should be a SubTokLmdb, but it's not..? I'm not sure. How should I fix this issue?
https://github.com/linjieli222/HERO/blob/00d8fbfada5f81062c43f0adfd0a570bf5814524/load_data.py#L36-L40
I can bypass this error message when I ignore L37-L39 and run L40.
Here is another issue:
https://github.com/linjieli222/HERO/blob/00d8fbfada5f81062c43f0adfd0a570bf5814524/pretrain.py#L50
Should it be modified to the following?
f"{opts.img_db}/{target['vfeat_db']}/{shard}", sub_txt_db,
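To make the proposed path expression concrete, here is a minimal sketch of how it would expand for each shard, using values taken from the printed options (img_db='/video' and the ht100_full_filtered target, trimmed to two shards). The helper shard_paths is hypothetical; how pretrain.py actually iterates over shards is not shown in this thread.

```python
from argparse import Namespace


def shard_paths(opts: Namespace, target: dict) -> list:
    """Build one video-feature path per shard using the proposed f-string."""
    return [
        f"{opts.img_db}/{target['vfeat_db']}/{shard}"
        for shard in target["vfeat_shards"]
    ]


# Values copied from the printed options; only two shards kept for brevity.
opts = Namespace(img_db="/video")
target = {
    "vfeat_db": "howto100m_pretrain_all_60s_clips",
    "vfeat_shards": [
        "howto100m_pretrain_all_clips_8",
        "howto100m_pretrain_all_clips_0",
    ],
}

paths = shard_paths(opts, target)
for path in paths:
    print(path)
```

Each resulting path places the shard directory under the target's vfeat_db directory, which is what the suggested change expresses.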