Closed: copgdjj closed this issue 5 months ago.
Hi, thanks for the question. The `--texts` option only works with open-vocabulary detectors. I see you are using YOLOX, which is a standard COCO detector; for that, detection based on a text prompt is not supported.
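As a rough sketch of why the traceback in the report below ends in an AttributeError (this is an illustration, not the repository's actual code path): the GroundingDINO-based predict in masa/models/detectors/gdino_masa.py reads a `text` field from each `DetDataSample`, and that field is only attached upstream when an open-vocabulary config drives detection, so with a plain COCO detector the access fails.

```python
# Minimal sketch, assuming mmdet 3.x is installed; illustrates the failure mode,
# not MASA's actual code.
from mmdet.structures import DetDataSample

sample = DetDataSample()        # what a standard COCO-detector pipeline produces
print(hasattr(sample, 'text'))  # False: no text prompt was ever attached

try:
    _ = sample.text             # same access as text_prompts.append(data_samples.text)
except AttributeError as err:
    print(err)                  # 'DetDataSample' object has no attribute 'text'

# Open-vocabulary configs set the prompt on the data sample upstream, e.g.:
sample.text = 'girl'
print(sample.text)              # now the GroundingDINO predict path could read it
```

In other words, with the YOLOX detector config the prompt never reaches the data samples, so the GroundingDINO branch has nothing to read.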
Thank you!
```
python demo/video_demo_with_text.py E:\masa-main\short.mp4 --out E:\masa-main\shoutput.mp4 --det_config projects/mmdet_configs/yolox/yolox_x_8xb8-300e_coco.py --det_checkpoint E:\masa-main\saved_models\pretrain_weights\yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint E:\masa-main\saved_models\gdino_masa.pth --texts "girl" --score-thr 0.3

Loads checkpoint by local backend from path: E:\masa-main\saved_models\pretrain_weights\yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth
Loads checkpoint by local backend from path: E:\masa-main\saved_models\gdino_masa.pth
E:\masa-main\masa\apis\masa_inference.py:97: UserWarning: dataset_meta or class names are not saved in the checkpoint's meta data, use COCO classes by default.
  warnings.warn(
E:\masa-main\masa\apis\masa_inference.py:108: UserWarning: palette does not exist, random is used by default. You can also set the palette to customize.
  warnings.warn(
e:\masa-main.conda\Lib\site-packages\mmengine\visualization\visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
[ ] 0/799, elapsed: 0s, ETA:
e:\masa-main.conda\Lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "E:\masa-main\demo\video_demo_with_text.py", line 258, in <module>
    main()
  File "E:\masa-main\demo\video_demo_with_text.py", line 182, in main
    track_result = inference_masa(masa_model, frame, frame_id=frame_idx,
  File "E:\masa-main\masa\apis\masa_inference.py", line 259, in inference_masa
    result = model.test_step(data)[0]
  File "e:\masa-main.conda\Lib\site-packages\mmengine\model\base_model\base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "e:\masa-main.conda\Lib\site-packages\mmengine\model\base_model\base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "e:\masa-main.conda\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "e:\masa-main.conda\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\masa-main.conda\Lib\site-packages\mmdet\models\mot\base.py", line 110, in forward
    return self.predict(inputs, data_samples, **kwargs)
  File "E:\masa-main\masa\models\mot\masa.py", line 347, in predict
    img_data_sample = self.detector.predict(
  File "E:\masa-main\masa\models\detectors\gdino_masa.py", line 103, in predict
    text_prompts.append(data_samples.text)
AttributeError: 'DetDataSample' object has no attribute 'text'
```