BrunoSader opened 2 years ago
Simply apply softmax to generate probabilities, then threshold the value along the second dimension at 0.5 to obtain the binary prediction result.
For example,
import torch.nn.functional as F

logits = model(input)
probs = F.softmax(logits, dim=-1)
preds = probs[:, 1] > 0.5
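The thresholding above can be checked end-to-end with concrete numbers. This is a minimal self-contained sketch, assuming the model's head emits two logits per shot (non-boundary, boundary); the logit values here are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 4 shots and 2 classes: (non-boundary, boundary).
logits = torch.tensor([[2.0, -1.0],
                       [0.1, 0.3],
                       [-0.5, 1.5],
                       [1.0, 1.0]])

# Softmax over the class dimension turns logits into probabilities.
probs = F.softmax(logits, dim=-1)

# Thresholding the boundary probability at 0.5 gives binary predictions.
preds = probs[:, 1] > 0.5

print(preds.tolist())  # → [False, True, True, False]
```

Note that a shot with equal logits gets probability exactly 0.5 and falls on the non-boundary side of the strict `>` comparison.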
Thank you. I have some more questions if you don't mind. The model I am using is BaSSL trained for 40 epochs; here is how I load it.
cfg = init_hydra_config(mode="extract_shot")
apply_random_seed(cfg)
cfg = load_pretrained_config(cfg)
# init model
cfg, model = init_model(cfg)
# init trainer
cfg, trainer = init_trainer(cfg)
Is this right? And I don't understand what I am supposed to give it as an input. Do I just create a dataloader of tensors for each image in my movie? Thank you very much for your help 😄
For loading the BaSSL 40-epoch scene segmentation model for inference, you need to replace the load_pretrained_config function with a load_finetuned_config function, for example:
import json
import os

import easydict

def load_finetuned_config(cfg):
    ckpt_root = cfg.CKPT_PATH
    load_from = cfg.LOAD_FROM
    with open(os.path.join(ckpt_root, load_from, "config.json"), "r") as fopen:
        finetuned_cfg = json.load(fopen)
    finetuned_cfg = easydict.EasyDict(finetuned_cfg)
    # override configuration of pre-trained model
    cfg.MODEL = finetuned_cfg.MODEL
    cfg.PRETRAINED_LOAD_FROM = finetuned_cfg.PRETRAINED_LOAD_FROM
    cfg.TRAIN.USE_SINGLE_KEYFRAME = False
    cfg.MODEL.contextual_relation_network.params.trn.pooling_method = "center"
    # override neighbor size of an input sequence of shots
    sampling = finetuned_cfg.LOSS.sampling_method.name
    nsize = finetuned_cfg.LOSS.sampling_method.params[sampling]["neighbor_size"]
    cfg.LOSS.sampling_method.params["sbd"]["neighbor_size"] = nsize
    return cfg
Then, you also need to set the LOAD_FROM option to the path of the finetuned model. It is typically the same as the EXPR_NAME used during the finetuning stage.
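The load-and-override pattern in load_finetuned_config can be illustrated with a stdlib-only sketch. The directory layout mirrors CKPT_PATH/LOAD_FROM/config.json from the function above, but the config keys here are illustrative, not BaSSL's exact schema:

```python
import json
import os
import tempfile

# Write a mock config.json, as the finetuning stage would have saved it
# (keys are illustrative placeholders, not BaSSL's real configuration).
ckpt_root = tempfile.mkdtemp()
load_from = "bassl_sbd_expr"  # would match EXPR_NAME from finetuning
os.makedirs(os.path.join(ckpt_root, load_from))
with open(os.path.join(ckpt_root, load_from, "config.json"), "w") as fopen:
    json.dump({"MODEL": {"name": "bassl"}, "neighbor_size": 8}, fopen)

# Base config that will be partially overridden, mirroring cfg above.
cfg = {"CKPT_PATH": ckpt_root, "LOAD_FROM": load_from, "MODEL": None}

# Load the saved finetuned config and override the relevant fields.
path = os.path.join(cfg["CKPT_PATH"], cfg["LOAD_FROM"], "config.json")
with open(path, "r") as fopen:
    finetuned_cfg = json.load(fopen)
cfg["MODEL"] = finetuned_cfg["MODEL"]

print(cfg["MODEL"])  # → {'name': 'bassl'}
```

The real function wraps the loaded dict in easydict.EasyDict so fields can be accessed with attribute syntax (finetuned_cfg.MODEL) instead of key lookups.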
For the input, our algorithm works on top of shots: you first need to divide a movie into a series of shots and extract three key-frames for each shot (refer to http://docs.movienet.site/movie-toolbox/tools/shot_detector).
Then, you feed the three key-frames of each shot as input to the network.
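As a rough sketch of that input layout: three key-frames per shot, stacked into one tensor. The image resolution and the flattening step are assumptions for illustration; the repository's dataloader defines the actual sizes and batching:

```python
import torch

num_shots, num_keyframes = 5, 3  # three key-frames per shot
C, H, W = 3, 224, 224            # assumed RGB frame size

# Stand-in for decoded key-frame images of one movie; in practice these
# come from the shot detector followed by frame extraction.
keyframes = torch.rand(num_shots, num_keyframes, C, H, W)

# Many video pipelines flatten shots and key-frames into a single batch
# dimension before the visual backbone (an assumed, illustrative step).
batch = keyframes.view(num_shots * num_keyframes, C, H, W)

print(batch.shape)  # → torch.Size([15, 3, 224, 224])
```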
Hello, I would also be interested in more details on how to run the code for inference starting from a fine-tuned model. I tried following @JonghwanMun's suggestions above but couldn't come up with working code.
Is it correct to init the cfg this way, and would "finetune" be the correct mode here?
cfg = init_hydra_config(mode="finetune")
apply_random_seed(cfg)
cfg = load_finetuned_config(cfg)
About the data, I have two questions:
1. Can the init_data_loader util function be used to build the input that trainer.predict() expects to receive?
2. When calling model(data), it seems that the expected shape is [64, 3, 7, 7]; is this the right behavior?
Thank you
Hello, I see FinetuningWrapper.load_from_checkpoint in main_utils.py, but I cannot find the implementation of load_from_checkpoint in finetune_wrapper.py. I wonder how it works, thanks.
@barry2025 load_from_checkpoint() is a function inherited from LightningModule of PyTorch Lightning; it initializes the parameters from the checkpoint given by checkpoint_path when constructing the FinetuningWrapper instance. Please refer to the PyTorch Lightning documentation for more details.
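Under the hood, restoring weights works roughly like plain PyTorch checkpointing. This is a minimal sketch of that mechanism, not Lightning's actual implementation; the toy nn.Linear stands in for FinetuningWrapper:

```python
import io

import torch
import torch.nn as nn

# A toy module standing in for the real FinetuningWrapper.
model = nn.Linear(4, 2)
nn.init.constant_(model.weight, 1.0)

# Saving: Lightning checkpoints store a 'state_dict' entry among others
# (hyperparameters, optimizer states, etc.).
buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save({"state_dict": model.state_dict()}, buffer)

# Loading: construct a fresh instance, then copy the saved parameters
# into it, which is essentially what load_from_checkpoint automates.
buffer.seek(0)
fresh = nn.Linear(4, 2)
checkpoint = torch.load(buffer)
fresh.load_state_dict(checkpoint["state_dict"])

print(torch.equal(fresh.weight, model.weight))  # → True
```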
Thanks! I never used pytorch lightning before, I'll try.
Hello, I read that you might be working on a demo on how to predict on a single video. I was able to create my own dataloader and call trainer.predict(), but the output is not binary (boundary or not boundary). Does this model support scene boundary prediction (if so, could you detail the steps? I just need to understand how I can make it work), or is it only a shot encoding model?
Thank you very much