microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License

video only test for youcook #39

Closed: mhyeonsoo closed this issue 2 years ago

mhyeonsoo commented 2 years ago

Hi again,

Thanks for the support. I am now trying to feed only a video file as test input to the YouCook model, and I am having trouble modifying the __getitem__() function in the dataloader.

Most of the issues are in self._get_text(). It seems I have to change the lines that use the data dictionary so that they produce a default text input such as [CLS] or [SEP], but I am not sure how to do that.

Could you give me some guidance on modifying it for video-only input? Thanks,

ArrowLuo commented 2 years ago

Hi @mhyeonsoo, I don't fully understand your question. But if you just want the input text to be '[CLS] [SEP]', you can replace every self.max_words in _get_text with a local variable, e.g., max_length_, and set max_length_ to 2, as in the example below:

def _get_text(self, video_id, sub_id):

        # Local variable that takes the place of `self.max_words` from the original function.
        # Replace every `self.max_words` below with `max_length_`.
        max_length_ = 2     # just enough room for '[CLS] [SEP]'

        data_dict = self.data_dict[video_id]
        k = 1
        r_ind = [sub_id]

        starts = np.zeros(k)
        ends = np.zeros(k)
        # np.int64 is used instead of np.long, which was removed in NumPy 1.24.
        pairs_text = np.zeros((k, max_length_), dtype=np.int64)
        pairs_mask = np.zeros((k, max_length_), dtype=np.int64)
        pairs_segment = np.zeros((k, max_length_), dtype=np.int64)
        pairs_masked_text = np.zeros((k, max_length_), dtype=np.int64)
        pairs_token_labels = np.zeros((k, max_length_), dtype=np.int64)

        ....
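
For readers who want to see what a text input holding only '[CLS] [SEP]' might look like as arrays, here is a minimal standalone sketch. It is not code from the UniVL repository: the helper name build_empty_text_inputs and the ASSUMED_VOCAB lookup are hypothetical, and the ids 101 and 102 are the [CLS] and [SEP] ids of bert-base-uncased, assumed here only for illustration. In the real dataloader the ids would come from the tokenizer.

```python
import numpy as np

# Hypothetical stand-in for the tokenizer's vocabulary lookup. 101 and 102 are
# the [CLS] and [SEP] ids in bert-base-uncased, assumed here for illustration.
ASSUMED_VOCAB = {"[CLS]": 101, "[SEP]": 102}


def build_empty_text_inputs(k=1, max_length=2, vocab=ASSUMED_VOCAB):
    """Return (pairs_text, pairs_mask, pairs_segment) holding only '[CLS] [SEP]'."""
    if max_length < 2:
        raise ValueError("max_length must be at least 2 to hold [CLS] and [SEP]")

    pairs_text = np.zeros((k, max_length), dtype=np.int64)
    pairs_mask = np.zeros((k, max_length), dtype=np.int64)
    pairs_segment = np.zeros((k, max_length), dtype=np.int64)

    # The first two positions carry the special tokens; the rest stays padding.
    pairs_text[:, 0] = vocab["[CLS]"]
    pairs_text[:, 1] = vocab["[SEP]"]
    # The attention mask marks only the two real tokens as visible.
    pairs_mask[:, :2] = 1
    return pairs_text, pairs_mask, pairs_segment
```

With max_length set to 2, as suggested above, every position is occupied by a special token and no padding remains. A larger max_length would leave trailing zeros that the mask hides.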

Best~

mhyeonsoo commented 2 years ago

@ArrowLuo

Yeah, I think I understand now, together with the previous issue #16. Thanks!