Open jun0wanan opened 1 month ago
I have another question: I noticed that the demo uses the `LiveInfer` class. How is this class different from the one used before? Why was it separated into its own class? 😊
Looking forward to your reply, thank you!
Hi, `LiveInfer` is only used during inference; it is better suited to frame-by-frame streaming, where frames arrive one at a time. In contrast, training and evaluation forward all frames in parallel in a single pass.
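For intuition, here is a minimal sketch (not the repository's actual code) of the difference. It assumes `model` is the VideoLLM with the `visual_embed`/`joint_embed` methods discussed below, that it follows the usual causal-LM interface (`inputs_embeds`, `past_key_values`, `use_cache`), and that `frames` is a tensor of per-frame features:

```python
import torch

# --- Streaming inference (LiveInfer-style): one frame per step, KV cache reused.
past_key_values = None
for frame in frames:
    frame_embeds = model.visual_embed(frame.unsqueeze(0))
    out = model(
        inputs_embeds=frame_embeds,
        past_key_values=past_key_values,  # cache built up over previous frames
        use_cache=True,
    )
    past_key_values = out.past_key_values

# --- Training / evaluation: all frames embedded and forwarded in one call.
inputs_embeds = model.joint_embed(input_ids=input_ids, frames=frames)
out = model(inputs_embeds=inputs_embeds)
```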
Hi, here is the code:

```python
def joint_embed(
    self,
    input_ids: torch.Tensor = None,
    frames: torch.Tensor = None,
):
    # Text-only: no frames, so embed the token ids directly.
    if frames is None:
        return self.get_input_embeddings()(input_ids)
    # Vision-only: no token ids, so embed the frames directly.
    if input_ids is None:
        return self.visual_embed(frames)
    # Mixed: embed the text, then overwrite the v_placeholder positions
    # with the visual embeddings.
    inputs_embeds = self.get_input_embeddings()(input_ids.clamp(max=self.vocab_size-1))
    v_mask = input_ids == self.config.v_placeholder_id
    if v_mask.any():
        inputs_embeds[v_mask] = self.visual_embed(frames)
    return inputs_embeds
```
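For reference, my understanding of the three branches (the names `V_ID` and `hidden_size`, and the shapes, are hypothetical):

```python
# V_ID stands for config.v_placeholder_id; shapes are made up for illustration.
input_ids = torch.tensor([[1, 5, V_ID, V_ID, 7]])
frames = torch.randn(2, hidden_size)      # one feature row per placeholder

model.joint_embed(input_ids=input_ids, frames=frames)
# -> text embeddings, with the two V_ID positions overwritten by visual_embed(frames)

model.joint_embed(input_ids=input_ids)    # frames is None
# -> pure text embeddings; this is the branch evaluate.py falls into for me
```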
I found that when I run evaluate.py on its own, `frames` ends up being `None`, so the first `if` branch is taken. Is this correct? Shouldn't it avoid that branch?
Process: I ran evaluate.py directly with the model you provided; I just wanted to check the metrics :)
Looking forward to your reply, thank you!
Could you share the script you ran? It seems the frames are not being passed properly.