Could you please provide more explanation on how to pretrain PKG?

Thank you for your interest! Please refer to https://github.com/salesforce/paprika/blob/cbefd714f3368733b1dc4dc3f2ee1e2ba69f57ed/datasets/build_knowledge/build_knowledge.py#L4 for the code to build the PKG. Specifically:

'segment_wikistep_sim_scores_ready' indicates whether the similarity scores between video segments and wikiHow steps have already been computed and saved to disk. If they are not ready, the function get_sim_scores() is called to compute them.
'nodes_formed' indicates whether the graph nodes of the PKG have been formed. As explained in the paper, graph nodes are created by clustering the wikiHow steps.
'edges_formed' indicates whether the graph edges of the PKG have been established. Once the function get_edges() is completed, the graph structure will be ready.
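
To make the role of the three flags concrete, here is a minimal sketch of how they could gate the stages. It is not the code from build_knowledge.py: the function bodies are placeholders, get_nodes() and build_pkg() are hypothetical names used only for illustration, and the `log` argument exists purely so the example can show which stages ran.

```python
from types import SimpleNamespace


def get_sim_scores(args, log):
    # Placeholder: the real function computes segment-to-step similarity
    # scores and saves them to disk.
    log.append("get_sim_scores")


def get_nodes(args, log):
    # Hypothetical name: stands in for the step that clusters wikiHow steps
    # into graph nodes.
    log.append("get_nodes")


def get_edges(args, log):
    # Placeholder: the real function establishes the graph edges.
    log.append("get_edges")


def build_pkg(args):
    """Run only the stages whose outputs are not marked as ready."""
    log = []
    if not args.segment_wikistep_sim_scores_ready:
        get_sim_scores(args, log)
    if not args.nodes_formed:
        get_nodes(args, log)
    if not args.edges_formed:
        get_edges(args, log)
    return log


# Similarity scores are already on disk, so only nodes and edges are built.
args = SimpleNamespace(
    segment_wikistep_sim_scores_ready=True,
    nodes_formed=False,
    edges_formed=False,
)
stages_run = build_pkg(args)
print(stages_run)
```

Setting a flag to true after a stage has finished once lets a later run skip that stage and reuse what is already on disk.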

The subsequent functions prefixed with 'pseudolabel*' extract different types of pseudo labels from the constructed PKG.

To save computation time and avoid using GPUs during the PKG construction process, we use the S3D model to extract features in advance and save them to disk. Instructions for feature extraction using S3D can be found here: https://github.com/salesforce/paprika#feature-extraction.
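
The general pattern of extracting once and reusing from disk can be sketched as follows. This is only an illustration under stated assumptions: fake_extractor() stands in for the S3D extraction step, the pickle file layout and the load_or_compute() helper are hypothetical, and cosine similarity is used as an example measure rather than as a statement of what the repository computes.

```python
import math
import os
import pickle
import tempfile


def load_or_compute(path, compute_fn):
    """Return cached data from `path`, computing and saving it on a miss."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute_fn()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


calls = []


def fake_extractor():
    # Hypothetical stand-in for the expensive, GPU-based extraction step.
    calls.append(1)
    return {
        "segment_0": [1.0, 0.0],
        "step_0": [1.0, 0.0],
        "step_1": [0.0, 1.0],
    }


cache_path = os.path.join(tempfile.mkdtemp(), "features.pkl")

# The first call runs the extractor; the second reads the saved file.
features = load_or_compute(cache_path, fake_extractor)
features_again = load_or_compute(cache_path, fake_extractor)

# With features on disk, similarity can be computed on the CPU.
sim_same = cosine(features["segment_0"], features["step_0"])
sim_orth = cosine(features["segment_0"], features["step_1"])
print(len(calls), sim_same, sim_orth)
```

Because the extractor runs only on a cache miss, the graph construction itself never needs to load the video model.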
