showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
https://arxiv.org/abs/2307.16715
MIT License

Installing requirements #2

Closed jjihwann closed 1 year ago

jjihwann commented 1 year ago

Hi! Thanks for your nice research and codes.

I'm trying to set up the environment to run your code, but installation fails because many entries in the requirements end with @ file:///~~

My terminal said: ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/croot/aiohttp_1670009560265/work'

How can I fix it?

In addition, for a reason similar to the first issue,

I changed the import section of video_extractor.py to:

import pdb
import torch as th
import math
import numpy as np
import torch
from run_on_video.video_loader import VideoLoader
from torch.utils.data import DataLoader
import argparse
from run_on_video.preprocessing import Preprocessing
import torch.nn.functional as F
from tqdm import tqdm
import os
import sys
from run_on_video import clip

QinghongLin commented 1 year ago

Hi @jjihwann

Thanks for your interest! I have updated the requirements and the related demo code. Could you retry and let me know how it goes? Thanks!
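For readers who hit the same `@ file:///` error with an older requirements file: those suffixes are local build paths recorded by `pip freeze` in a conda environment, and they do not exist on another machine. The following is a minimal sketch of one way to strip them; the helper name `strip_local_refs` is hypothetical and not part of the repository. Note that this drops the version pin for the affected packages.

```python
import re

# Matches the " @ file:///..." suffix that `pip freeze` writes for packages
# that were installed from a local build directory.
LOCAL_REF = re.compile(r"\s*@\s*file://\S*")


def strip_local_refs(lines):
    """Return requirement lines with local 'file://' references removed.

    Hypothetical helper for illustration; blank lines and comments are skipped.
    """
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cleaned.append(LOCAL_REF.sub("", line))
    return cleaned


example = [
    "aiohttp @ file:///croot/aiohttp_1670009560265/work",
    "numpy==1.24.2",
]
print(strip_local_refs(example))  # ['aiohttp', 'numpy==1.24.2']
```

If the original environment is still available, `pip list --format=freeze` is an alternative that writes `name==version` lines and so keeps the pins.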

jjihwann commented 1 year ago

Actually, I thought the requirements could be trimmed down, so I used:

conda create -n univtg python==3.8.1

pip install torch==1.12.1 gradio numpy==1.24.2 ffmpeg-python==0.2.0 \
torchvision==0.13.1 ftfy==6.1.1 regex==2022.10.31 tabulate==0.9.0 \
scipy==1.10.0 

and it seems to work well now.

Also, I don't know why, but --resume ./results/omni/model_best.ckpt does not work, so I modified the default value in config.py (line 96).
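The thread does not show config.py, so the following is only a sketch of the workaround described above, assuming the option is defined with `argparse`. The function `build_parser` and the help text are hypothetical; only the flag name and checkpoint path come from the comment.

```python
import argparse


def build_parser(default_ckpt="./results/omni/model_best.ckpt"):
    """Build a parser whose --resume option falls back to a default checkpoint.

    Hypothetical stand-in for the repository's config; not its actual code.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--resume",
        type=str,
        default=default_ckpt,
        help="path to the checkpoint to load",
    )
    return parser


# With no flag on the command line, the default value is used.
print(build_parser().parse_args([]).resume)

# An explicit flag still overrides the default.
print(build_parser().parse_args(["--resume", "other.ckpt"]).resume)
```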

Moreover, the code raises an error if:

  1. there is no ./videos folder
  2. ./videos/youtube.mp4 is empty (an mp4 file should exist before the first run)
  3. there is no ./tmp folder
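The three conditions above can be guarded against before launching the demo. This is a minimal sketch, not code from the repository: the function name `check_demo_layout` is hypothetical, and the folder and file names are the ones listed in the comment.

```python
import os


def check_demo_layout(root=".", video_name="youtube.mp4"):
    """Create ./videos and ./tmp if missing; return True if a non-empty video exists.

    Hypothetical helper for illustration only.
    """
    videos_dir = os.path.join(root, "videos")
    tmp_dir = os.path.join(root, "tmp")

    # exist_ok=True makes this safe to call when the folders already exist.
    os.makedirs(videos_dir, exist_ok=True)
    os.makedirs(tmp_dir, exist_ok=True)

    video_path = os.path.join(videos_dir, video_name)
    return os.path.isfile(video_path) and os.path.getsize(video_path) > 0


if __name__ == "__main__":
    if not check_demo_layout():
        print("Place a non-empty mp4 at ./videos/youtube.mp4 before the first run.")
```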

Also, it seems that the txt2clip() function in video_extractor.py should include the following line before it returns:

np.savez(os.path.join("./tmp", 'txt.npz'), features=text_feature)

Sorry for my poor grammar 😂
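As a sketch of what the suggested `np.savez` line does, the snippet below saves an array under the key `features` and reads it back. The wrapper `save_text_feature` and the placeholder array shape are assumptions made for illustration; the real `txt2clip()` is not shown in this thread.

```python
import os
import tempfile

import numpy as np


def save_text_feature(text_feature, save_dir):
    """Save a feature array as txt.npz under the key 'features' and return the path.

    Hypothetical wrapper around the np.savez call suggested above.
    """
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "txt.npz")
    np.savez(path, features=text_feature)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder array; the actual feature shape depends on the text encoder.
    feature = np.zeros((4, 512), dtype=np.float32)
    saved_path = save_text_feature(feature, tmp_dir)

    # The array is retrieved with the same keyword used when saving.
    with np.load(saved_path) as data:
        print(data["features"].shape)
```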

QinghongLin commented 1 year ago

Awesome! Thanks for your detailed suggestions. I have updated the requirements and added a video and the tmp folder to the repo.

As for the code, sorry for the bugs. I actually updated it yesterday, and it should now run smoothly. You can compare the two versions; I suggest replacing your current code with my updated version.

Please let me know whether it runs successfully or if you hit any other issues.

jjihwann commented 1 year ago

Excellent! I think it works well now!

But I still have a small question.

What is the main difference between "foreground" and "saliency"?

From your paper, I understood that "foreground" is a discrete value and "saliency" is continuous, but in the code both take continuous values.

They seem to play similar roles. Could you explain a bit more?

QinghongLin commented 1 year ago

Hi @jjihwann, you raise a good question. Yes, the foreground and saliency heads both predict continuous scores at inference time. During training, however, they receive different types of supervision: binary classification for the foreground head and contrastive learning for the saliency head. We introduce both so that we can combine the two kinds of supervision to improve the model. This is very similar to image-text matching and image-text contrastive learning in vision-language pretraining, e.g., https://arxiv.org/abs/2201.12086, https://arxiv.org/pdf/2107.07651.pdf.
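To make the two kinds of supervision concrete, here is a generic sketch of a binary cross-entropy loss and an InfoNCE-style contrastive loss. These are textbook forms written with NumPy, not the exact losses used in UniVTG; the function names and the temperature value are assumptions.

```python
import numpy as np


def binary_cross_entropy(logits, labels):
    """Mean binary cross-entropy over clips (generic form).

    logits: raw per-clip scores; labels: 1 for foreground clips, 0 otherwise.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    loss = labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps)
    return float(-np.mean(loss))


def contrastive_loss(similarities, positive_index, temperature=0.07):
    """InfoNCE-style loss for one query against several candidates (generic form).

    similarities: query-to-candidate scores; positive_index: the matching candidate.
    """
    scaled = similarities / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum())
    return float(-log_probs[positive_index])


logits = np.array([2.0, -1.0, 0.5])
labels = np.array([1.0, 0.0, 1.0])
print(binary_cross_entropy(logits, labels))

similarities = np.array([0.9, 0.1, 0.2])
print(contrastive_loss(similarities, positive_index=0))
```

The first loss judges each clip independently against a binary label, while the second only asks that the positive candidate score higher than the others, which is why both heads can end up producing continuous scores.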

In practice, though, you can use either the foreground head or the saliency head, or ensemble their scores for a more stable prediction. One difference is that the saliency score is more flexible to compute: you can calculate it between any video and any text with a dot product, whereas for the foreground prediction you need to feed a video-text pair through the model to get the corresponding score.
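The dot-product flexibility and the score ensemble described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the L2 normalisation, the equal ensemble weight, the feature dimensions, and the random placeholder features are all assumptions, and the foreground scores are stand-in values because they would come from a forward pass on a specific video-text pair.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def saliency_scores(clip_features, text_features):
    """Score every clip against every text with a dot product.

    clip_features: (num_clips, dim); text_features: (num_texts, dim).
    Returns an array of shape (num_clips, num_texts).
    """
    return l2_normalize(clip_features) @ l2_normalize(text_features).T


def ensemble(foreground, saliency, weight=0.5):
    """Weighted average of two per-clip score arrays (weight is an assumption)."""
    return weight * foreground + (1 - weight) * saliency


rng = np.random.default_rng(0)
clips = rng.normal(size=(6, 16))   # placeholder clip features
texts = rng.normal(size=(3, 16))   # placeholder text features

# Any text can be scored against any clip without another model forward pass.
scores = saliency_scores(clips, texts)
print(scores.shape)  # (6, 3)

# Stand-in foreground scores for the first text query.
foreground = rng.uniform(size=6)
print(ensemble(foreground, scores[:, 0]).shape)  # (6,)
```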

Hope my response can answer your question!

jjihwann commented 1 year ago

Thanks for your kind explanation. I understand completely now.

Thank you!

QinghongLin commented 1 year ago

Closing since the issues are resolved.