microsoft / VideoX

VideoX: a collection of video cross-modal models

The minimum GPU requirements to run the X-CLIP test video #96

EircYangQiXin closed this issue 1 year ago

EircYangQiXin commented 1 year ago

Can you tell me the minimum GPU requirements for running the X-CLIP test video? Sorry, my English is poor.

nbl97 commented 1 year ago

Thanks for your interest. CUDA memory usage depends on several factors, such as the model size, the number of frames, and the batch size. If you don't have much GPU memory, you can try reducing these hyperparameters.
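One practical way to follow this advice is to start with a large batch size and halve it until a batch runs without an out-of-memory error. The sketch below assumes a hypothetical callback `run_batch(batch_size)` that runs one inference step and raises an out-of-memory exception when the batch does not fit. It is not part of the X-CLIP code. In recent PyTorch versions, a CUDA OOM is raised as `torch.cuda.OutOfMemoryError`, which you could pass as `oom_errors`. Here `MemoryError` stands in so the example runs without a GPU.

```python
def find_fitting_batch_size(run_batch, start_batch_size, oom_errors=(MemoryError,)):
    """Halve the batch size until run_batch(batch_size) succeeds.

    run_batch is a hypothetical callback that runs one inference step and
    raises one of oom_errors if the batch does not fit in memory.
    """
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            run_batch(batch_size)
            return batch_size
        except oom_errors:
            batch_size //= 2  # too big: try half as many videos at once
    raise RuntimeError("even batch size 1 does not fit in memory")


if __name__ == "__main__":
    # Simulated device that can only hold 6 videos per batch.
    def fake_run_batch(batch_size):
        if batch_size > 6:
            raise MemoryError
    print(find_fitting_batch_size(fake_run_batch, 32))  # tries 32, 16, 8, then 4
```

Lowering the number of frames per video has a similar effect, since the activations the video encoder keeps grow with the number of frames it processes.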

EircYangQiXin commented 1 year ago


I understand. Can I use a pre-trained model to directly analyze an image into text? Sorry, I really don't understand machine learning, but I'm very interested in it. It looks like I'm going to spend a lot of time learning about this field.

This message was translated by AI. Please forgive me if anything sounds offensive.

EircYangQiXin commented 1 year ago

My GPUs are 2 x P100 PCIe 16 GB. Can they run this program at the minimum configuration?

nbl97 commented 1 year ago

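Hello, I don't quite understand what you mean by "analyze an image into text". X-CLIP is designed for video recognition, where each video is represented as a number of frames. The video encoder encodes the video input and the text encoder encodes the text input. By computing similarity in a shared semantic space, the model selects the most similar text description for each video, which completes the classification task.

As for hardware: if you only use the pre-trained model for testing/inference, 2 x P100 16G is fine. You can choose a batch size that suits your GPU memory to avoid running out of memory. The batch size is the number of videos processed at once; at test time, different batch sizes usually do not affect the results.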
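The classification step described above can be sketched with NumPy. This is a simplified illustration, not X-CLIP's actual implementation: it assumes the video and text encoders have already produced per-frame features and per-class text features of the same dimension. It also replaces X-CLIP's learned cross-frame modules with plain mean pooling over frames. Each video is assigned the class whose text embedding has the highest cosine similarity. Because each video is scored independently, the predictions do not depend on the batch size, which matches the remark about testing.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def classify_videos(frame_feats, text_feats, batch_size):
    """Pick the most similar text description for each video.

    frame_feats: (N, T, D) per-frame features for N videos of T frames
                 (assumed output of a video encoder).
    text_feats:  (C, D) features of C class descriptions
                 (assumed output of a text encoder).
    Returns an (N,) array of predicted class indices.
    """
    text_feats = l2_normalize(text_feats)
    preds = []
    for start in range(0, len(frame_feats), batch_size):
        batch = frame_feats[start:start + batch_size]
        # Simplification: mean-pool frames into one video embedding.
        video_emb = l2_normalize(batch.mean(axis=1))
        sims = video_emb @ text_feats.T  # (B, C) cosine similarities
        preds.append(sims.argmax(axis=1))
    return np.concatenate(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(10, 8, 16))  # 10 videos, 8 frames, 16-dim features
    texts = rng.normal(size=(5, 16))       # 5 class descriptions
    small = classify_videos(frames, texts, batch_size=2)
    large = classify_videos(frames, texts, batch_size=10)
    print(np.array_equal(small, large))    # batch size does not change predictions
```

A smaller `batch_size` only lowers peak memory, because fewer videos are encoded and compared at the same time.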

EircYangQiXin commented 1 year ago

My description was the problem, sorry. All I need is to be able to run testing and inference on my test server. Thanks for the reply!