khanrc / honeybee

Official implementation of project Honeybee (CVPR 2024)

Non-squared number of visual tokens #16

Closed Gutianpei closed 5 months ago

Gutianpei commented 5 months ago

Nice work!

It looks like Honeybee can only process inputs whose number of visual tokens is a perfect square. Have you tried padding non-square inputs? Or what preprocessing would you recommend for them? For long token sequences (e.g. [B, 3096, hidden_size]), neither padding nor cropping sounds optimal.
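To make the cost of padding or cropping concrete for the 3096-token example, here is a minimal sketch using only the standard library. The helper name `square_grid_gap` is hypothetical and not part of the Honeybee code; it only counts how many tokens would have to be added or dropped to reach the nearest perfect squares.

```python
import math


def square_grid_gap(num_tokens: int) -> tuple[int, int]:
    """Return (tokens_to_pad, tokens_to_crop) to reach the nearest perfect squares.

    Hypothetical helper for illustration only; not part of the Honeybee repo.
    """
    side = math.isqrt(num_tokens)        # floor of the square root
    crop = num_tokens - side * side      # tokens dropped to fit a side x side grid
    # If the count is already a perfect square, nothing needs to be padded.
    pad = 0 if crop == 0 else (side + 1) ** 2 - num_tokens
    return pad, crop


print(square_grid_gap(3096))  # (40, 71): pad up to 56 x 56 or crop down to 55 x 55
```

Either option changes the sequence, and neither says where in the 2D layout the padded or dropped tokens should go, which is why it is not obviously a good fit.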

khanrc commented 5 months ago

To clarify, are you referring to the number of input visual tokens to the projector (= output visual tokens from the vision model, called "visual features" in the paper), or the number of input visual tokens to the LLM (= output visual tokens from the projector)?

Gutianpei commented 5 months ago

Thanks for the reply. I'm talking about the number of input visual tokens to the projector (C-Abstractor). Since the Honeybee projector reshapes the visual features into a square grid, I'm curious whether you've run any experiments with non-square visual features, or whether you'd recommend any preprocessing (e.g. padding) for non-square input, since most of the time the visual features are non-square.

khanrc commented 5 months ago

We have not tried such experiments, but in principle the C-Abstractor can handle any 2D input, including non-square shapes. The current implementation assumes a square shape, so you only need to update the reshaping part (https://github.com/kakaobrain/honeybee/blob/main/honeybee/projectors.py#L108-L118).
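As a rough illustration of what updating the reshaping part could look like, the sketch below takes an explicit (height, width) grid instead of deriving a single side length from the token count. It is not the repository's code: the function names `tokens_to_grid` and `grid_to_tokens` are hypothetical, and it assumes the tokens are in row-major order, that any CLS token has already been removed, and that the true grid shape is known from the vision encoder. Whether other parts of the projector also assume a square grid is not checked here.

```python
import torch


def tokens_to_grid(x: torch.Tensor, grid_hw: tuple[int, int]) -> torch.Tensor:
    """Reshape [B, L, C] visual tokens into a [B, C, H, W] feature map.

    Hypothetical helper: assumes row-major token order and no CLS token.
    """
    b, num_tokens, c = x.shape
    h, w = grid_hw
    if h * w != num_tokens:
        raise ValueError(f"grid {h}x{w} does not match {num_tokens} tokens")
    # [B, L, C] -> [B, C, L] -> [B, C, H, W]
    return x.transpose(1, 2).reshape(b, c, h, w)


def grid_to_tokens(x: torch.Tensor) -> torch.Tensor:
    """Inverse of tokens_to_grid: [B, C, H, W] -> [B, H*W, C]."""
    return x.flatten(2).transpose(1, 2)


# 3096 tokens cannot form a square grid, but 43 x 72 = 3096 is one rectangular
# layout. This shape is only an example; use the grid your encoder produces.
features = torch.randn(1, 3096, 8)
grid = tokens_to_grid(features, (43, 72))
print(grid.shape)  # torch.Size([1, 8, 43, 72])
```

Passing the grid shape explicitly avoids guessing it from the token count, since a count such as 3096 has several valid factorizations and only the vision encoder knows which one is correct.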

Gutianpei commented 5 months ago

Good point, thanks!