xISSAx / Alpha-Co-Vision

A real-time video-caption-to-conversation bot that captures frames, generates captions, and creates conversational responses on a Large Language Model base to produce interactive video descriptions.
MIT License

input types 'tensor<1x577x1xf16>' and 'tensor<1xf32>' are not broadcast compatible #1

Open pjq opened 1 year ago

pjq commented 1 year ago

When I run the script, the camera just flashed for one second, and I saw this error.

loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x577x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    34207 abort      python3 main.py
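(Editor's note, a minimal sketch of the failure mode, not part of the original report: the shapes in the trace suggest a half-precision LayerNorm activation meeting a float32 epsilon constant. On CPU, PyTorch silently promotes the mix to float32, while the MPS graph compiler refuses to broadcast the two dtypes and aborts. The tensor shapes below are copied from the trace; the rest is illustrative.)

```python
import torch

# fp16 activation combined with an fp32 constant, as in the trace above
variance = torch.rand(1, 577, 1, dtype=torch.float16)  # fp16 activation
eps = torch.tensor([1e-5], dtype=torch.float32)        # fp32 constant

# On CPU, type promotion quietly yields float32; the MPS graph compiler
# instead rejects the mixed-dtype broadcast and aborts the process.
print((variance + eps).dtype)                    # torch.float32 (promoted)

# Casting to a common dtype up front avoids the mixed-precision op:
print((variance + eps.to(torch.float16)).dtype)  # torch.float16
```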
xISSAx commented 1 year ago

Heyy @pjq!!

Thank you for providing the information. The issue might be related to the PyTorch installation on your MacBook M1. Please follow the steps below to ensure proper GPU support for PyTorch on the MacBook M1:

Ensure you have Python 3.9.x and a supported version of the Conda package manager installed. Miniforge is recommended for M1 users; you can download it here: https://github.com/conda-forge/miniforge#miniforge3.

Or even better, follow the steps from Apple at https://developer.apple.com/metal/pytorch/ to ensure PyTorch runs correctly on the M1 architecture.

Create a new Conda environment for this project with Python 3.9.x by running the following command in your terminal:

conda create -n alpha-co-vision-env python=3.9

Activate the newly created environment:

conda activate alpha-co-vision-env

For more details and troubleshooting, please refer to the official PyTorch guide for installing on macOS M1 devices: https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/

Meanwhile, I'll be digging a bit deeper into this matter to figure out the root cause of the issue, and I will definitely get back to you asap! :)

Best, Varun

risetoday commented 1 year ago

Does using Python 3.10 have the same effect? I have tried to solve the problem using the methods above, but it still exists.

pjq commented 1 year ago

@risetoday I am not using MPS, so it works well. You can check my PR:

import torch
from transformers import BlipProcessor, BlipForConditionalGeneration

import config  # project settings


def init():
    global processor, model
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    if config.settings.enable_mps:
        # Half-precision model on Apple's Metal backend
        model = BlipForConditionalGeneration.from_pretrained(
            "Salesforce/blip-image-captioning-base", torch_dtype=torch.float16
        ).to("mps")
    else:
        # Full-precision model on CPU avoids the fp16/fp32 mismatch
        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
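(Editor's note: a self-contained variant of the toggle above, as a hedged sketch. The helper name is an assumption, not code from the project; the point is to pick device and dtype together so that processor outputs can later be cast to match the model and never mix float16 with float32 on MPS.)

```python
import torch

def pick_device_and_dtype(enable_mps: bool):
    """Choose device and dtype as a pair, so model weights and
    processor outputs stay in the same precision on every backend."""
    if enable_mps and torch.backends.mps.is_available():
        return "mps", torch.float16
    return "cpu", torch.float32

# With MPS disabled (or unavailable), everything stays float32 on CPU:
device, dtype = pick_device_and_dtype(enable_mps=False)
print(device, dtype)
```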
skynbe commented 1 year ago

Same issue for me with python 3.9.12.