invictus717 / MiCo

Explore the Limits of Omni-modal Pretraining at Scale
https://invictus717.github.io/MiCo/
Apache License 2.0

test result #6

Open yun189 opened 4 days ago

yun189 commented 4 days ago

I ran inference_demo.py and it printed ['a man riding skis down a snow covered slope. a man is speaking with background noise and breathing sounds.'], which describes example/test.jpg. Is that right?


load_from_pretrained: ./MiCo-g/ckpt/model_step_319989.pt
Please 'pip install xformers'
Please 'pip install xformers'
Please 'pip install xformers'
WARNING:model.bert:If you want to use BertForMaskedLM make sure config.is_decoder=False for bi-directional self-attention.
Unexpected keys []
missing_keys []
/home/liran/miniforge3/envs/MiCo_py39/lib/python3.9/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
  warnings.warn(
tensor([[0.1206], [0.0043]], device='cuda:0', grad_fn=)
tensor([0.7154, 0.0451], device='cuda:0', grad_fn=)
/home/liran/miniforge3/envs/MiCo_py39/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
['a man riding skis down a snow covered slope. a man is speaking with background noise and breathing sounds.']

yun189 commented 4 days ago

Can I run other experiments, such as testing the cross-modal or omni-cross abilities?

invictus717 commented 4 days ago

> I ran inference_demo.py and it printed ['a man riding skis down a snow covered slope. a man is speaking with background noise and breathing sounds.'], which describes example/test.jpg. Is that right?


Yes. You can check the caption against the image, audio, and video; the generated caption is accurate.
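The two UserWarning entries in the log (the torchvision antialias notice and the torch.utils.checkpoint requires_grad notice) are separate from the caption output. If they clutter the console, one option is to filter them with Python's standard `warnings` module before running the demo. This is a minimal sketch, not part of the MiCo repository; `silence_known_warnings` is a hypothetical helper name, and the message patterns are taken from the log text above.

```python
import warnings


def silence_known_warnings():
    """Ignore the two UserWarning messages seen in the inference log.

    Hypothetical helper: the patterns below are matched against the start
    of each warning message and are copied from the log in this thread.
    """
    warnings.filterwarnings(
        "ignore",
        message=r".*antialias.*",
        category=UserWarning,
    )
    warnings.filterwarnings(
        "ignore",
        message=r"None of the inputs have requires_grad=True.*",
        category=UserWarning,
    )


# Call once at the top of the script, before the model runs.
silence_known_warnings()
```

The log itself also notes that passing antialias=True to the resizing transform removes the first warning at its source.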

invictus717 commented 4 days ago

> Can I run other experiments, such as testing the cross-modal or omni-cross abilities?

Yes, you can use it for any experiment you want, since the model has been well pretrained.

yun189 commented 4 days ago

When testing other tasks, I ran into many problems. Will you make the details public?

invictus717 commented 4 days ago

How can I know which tasks you want to use the pretrained models for? I do not know what problems you ran into.

yun189 commented 3 days ago

For example, depth estimation: I do not know how to process the input data.

yun189 commented 2 days ago

When I try the depth task with image_input of shape [1, 1, 3, 224, 224], output = model.forward_depth_encoder(image_input) returns a tensor of shape [257, 1408]. What does this mean? How can I get a depth image from it? I do not know how to proceed.
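A note on the reported shape: [257, 1408] is consistent with the output of a ViT-style encoder, that is, one 1408-dimensional feature vector per token, where 257 = 16 × 16 patch tokens + 1 class token for a 224 × 224 input. Read that way, the output is a feature sequence rather than a depth map, and producing a depth image would need a depth decoder or head on top, which this thread does not describe. The sketch below only checks the arithmetic. The patch size of 14 and the leading class token are assumptions that happen to reproduce 257, not details confirmed by the repository, and `vit_token_layout` and `tokens_to_grid` are hypothetical helper names.

```python
def vit_token_layout(image_size=224, patch_size=14, has_cls_token=True):
    """Return (grid_side, num_tokens) for a square ViT-style input.

    patch_size=14 and has_cls_token=True are assumptions chosen because
    they reproduce the 257 tokens reported in the thread.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid_side = image_size // patch_size
    num_tokens = grid_side * grid_side + (1 if has_cls_token else 0)
    return grid_side, num_tokens


def tokens_to_grid(tokens, grid_side, has_cls_token=True):
    """Drop the (assumed) leading class token and arrange the remaining
    patch tokens row by row into a grid_side x grid_side nested list."""
    patches = tokens[1:] if has_cls_token else tokens
    if len(patches) != grid_side * grid_side:
        raise ValueError("token count does not match the grid size")
    return [patches[r * grid_side:(r + 1) * grid_side] for r in range(grid_side)]


grid_side, num_tokens = vit_token_layout()
print(grid_side, num_tokens)

# Stand-in tokens: integers instead of 1408-dimensional feature vectors.
dummy_tokens = list(range(num_tokens))
grid = tokens_to_grid(dummy_tokens, grid_side)
print(len(grid), len(grid[0]))
```

Under the same assumptions, the equivalent step on a PyTorch tensor of shape [257, 1408] would be `output[1:].reshape(16, 16, 1408)`, which gives a 16 × 16 spatial grid of features that a depth head could consume.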