kakaobrain / bassl

Apache License 2.0

Explanation for scene boundary prediction #5

Open BrunoSader opened 2 years ago

BrunoSader commented 2 years ago

Hello, I read that you might be working on a demo showing how to predict on a single video. I was able to create my own dataloader and call trainer.predict(), but the output is not binary (boundary or not boundary). Does this model support scene boundary prediction (if so, could you detail the steps? I just need to understand how I can make it work), or is it only a shot-encoding model?

Thank you very much

JonghwanMun commented 2 years ago

Simply apply softmax to generate probabilities, then threshold the value in the second dimension at 0.5 to obtain the binary prediction.

For example,

import torch.nn.functional as F

logits = model(input)              # shape: [num_shots, 2]
probs = F.softmax(logits, dim=-1)  # per-shot class probabilities
preds = probs[:, 1] > 0.5          # True where a scene boundary is predicted
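The same softmax-then-threshold logic can be illustrated self-contained, with made-up logits and plain Python standing in for torch tensors:

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits for 4 shots: [non-boundary score, boundary score] per shot.
logits = [[2.0, -1.0], [0.1, 1.5], [3.0, 0.5], [-0.5, 0.5]]

# Probability of class 1 (boundary), thresholded at 0.5.
preds = [softmax(row)[1] > 0.5 for row in logits]
print(preds)  # [False, True, False, True]
```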
BrunoSader commented 2 years ago

Thank you. I have some more questions, if you don't mind. The model I am using is the BaSSL model trained for 40 epochs; here is how I load it.

    cfg = init_hydra_config(mode="extract_shot")
    apply_random_seed(cfg)
    cfg = load_pretrained_config(cfg)

    # init model
    cfg, model = init_model(cfg)

    # init trainer
    cfg, trainer = init_trainer(cfg)

Is this right? And I don't understand what I am supposed to give it as an input. Do I just create a dataloader of tensors for each image in my movie? Thank you very much for your help 😄

JonghwanMun commented 2 years ago

For loading the BaSSL scene segmentation model fine-tuned for 40 epochs at inference time, you need to replace load_pretrained_config with a load_finetuned_config function, for example,

import json
import os

import easydict


def load_finetuned_config(cfg):
    ckpt_root = cfg.CKPT_PATH
    load_from = cfg.LOAD_FROM

    with open(os.path.join(ckpt_root, load_from, "config.json"), "r") as fopen:
        finetuned_cfg = json.load(fopen)
        finetuned_cfg = easydict.EasyDict(finetuned_cfg)

    # override configuration of pre-trained model
    cfg.MODEL = finetuned_cfg.MODEL
    cfg.PRETRAINED_LOAD_FROM = finetuned_cfg.PRETRAINED_LOAD_FROM

    cfg.TRAIN.USE_SINGLE_KEYFRAME = False
    cfg.MODEL.contextual_relation_network.params.trn.pooling_method = "center"

    # override neighbor size of an input sequence of shots
    sampling = finetuned_cfg.LOSS.sampling_method.name
    nsize = finetuned_cfg.LOSS.sampling_method.params[sampling]["neighbor_size"]
    cfg.LOSS.sampling_method.params["sbd"]["neighbor_size"] = nsize

    return cfg

Then, you also need to set the LOAD_FROM option to the path of the fine-tuned model. It is likely the same as the EXPR_NAME used during the finetuning stage.

For the input, our algorithm works on top of shots. You first need to divide a movie into a series of shots and extract three key-frames for each shot (refer to http://docs.movienet.site/movie-toolbox/tools/shot_detector). Then, feed the three key-frames of each shot as input to the network.
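A rough sketch of that input layout, three key-frames per shot (the frame loader and file names here are made-up placeholders, not from the repo):

```python
NUM_KEYFRAMES = 3  # BaSSL expects three key-frames per shot

def load_keyframe(shot_idx, frame_idx):
    """Placeholder for reading one key-frame image from disk."""
    return f"shot_{shot_idx:04d}_kf{frame_idx}.jpg"

num_shots = 5
batch = [
    [load_keyframe(s, k) for k in range(NUM_KEYFRAMES)]
    for s in range(num_shots)
]
print(len(batch), len(batch[0]))  # 5 shots, 3 key-frames each
```

In practice each placeholder would be an image tensor produced by the repo's transform pipeline, so the batch has shape [num_shots, 3, C, H, W].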

LFavano commented 2 years ago

Hello, I would also be interested in more details on how to run the code for inference starting from a fine-tuned model. I tried using @JonghwanMun's suggestion above but couldn't come up with working code.

Is it correct to init the cfg this way, and would "finetune" be the correct mode here?

cfg = init_hydra_config(mode="finetune")
apply_random_seed(cfg)
cfg = load_finetuned_config(cfg)

About the data, I have two questions:

Thank you

barry2025 commented 2 years ago

Hello, I see FinetuningWrapper.load_from_checkpoint in main_utils.py, but I cannot find the implementation of load_from_checkpoint in finetune_wrapper.py. I wonder how it works, thanks.

JonghwanMun commented 2 years ago

@barry2025 load_from_checkpoint() is a method inherited from PyTorch Lightning's LightningModule; it initializes the parameters from the checkpoint given by checkpoint_path when constructing the FinetuningWrapper instance. Please refer to the PyTorch Lightning documentation for more details.
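Conceptually, load_from_checkpoint is a classmethod-style constructor that builds the module and then restores its saved state. A stripped-down analogue of that pattern (not Lightning's actual implementation; the class and checkpoint here are invented for illustration):

```python
class TinyModel:
    """Toy stand-in for a LightningModule-like class."""

    def __init__(self, weight=0.0):
        self.weight = weight

    @classmethod
    def load_from_checkpoint(cls, checkpoint):
        # Construct an instance, then overwrite its parameters
        # with the values saved in the checkpoint dict.
        model = cls()
        model.weight = checkpoint["state_dict"]["weight"]
        return model

ckpt = {"state_dict": {"weight": 3.14}}
m = TinyModel.load_from_checkpoint(ckpt)
print(m.weight)  # 3.14
```

The real Lightning method additionally loads the file from checkpoint_path, restores hyperparameters, and maps tensors to the right device, but the constructor-plus-state-restore idea is the same.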

barry2025 commented 2 years ago

Thanks! I have never used PyTorch Lightning before; I'll try it.