RandomGitUser321 opened 1 week ago
Redid it to work with batched images:
Hey, thanks, you're right, and this was on my to-do list. I do want the captions output as a list when batching, though, so they can be used with nodes that understand string lists, for example:
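A minimal sketch of returning one caption per image as a string list, assuming ComfyUI's `OUTPUT_IS_LIST` node attribute. This is not necessarily how this repo implements it. `BatchCaptionNode` and `caption_image` are hypothetical names, `caption_image` stands in for the actual Florence-2 call, and a plain list of lists stands in for ComfyUI's `IMAGE` batch tensor so the example runs without torch.

```python
from typing import List, Sequence, Tuple


def caption_image(image: Sequence[float]) -> str:
    """Hypothetical stand-in for the Florence-2 call; returns a dummy caption."""
    return f"caption for image of length {len(image)}"


class BatchCaptionNode:
    """Hypothetical ComfyUI-style node that emits one caption per image in the batch."""

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("caption",)
    # One flag per output: True means ComfyUI treats this output as a list,
    # so downstream nodes that accept string lists receive each caption separately.
    OUTPUT_IS_LIST = (True,)
    FUNCTION = "caption_batch"
    CATEGORY = "examples"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",)}}

    def caption_batch(self, image: Sequence[Sequence[float]]) -> Tuple[List[str]]:
        # Caption each image in the batch separately, keeping batch order
        captions = [caption_image(img) for img in image]
        # ComfyUI expects a tuple with one entry per declared output
        return (captions,)
```

Returning a Python list with `OUTPUT_IS_LIST = (True,)` is what lets a list-aware downstream node receive each caption separately instead of one joined string.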
I've pushed this now.
Awesome, good idea! And thanks for getting Florence-2 working on Windows, by the way. I spent hours trying to get it running from a plain Python script and failed. I really should look at how you bypassed the damn flash-attention requirement, because it's become a big problem lately with a lot of the things I've been messing with (Lumina is another one you got working).
Oh my God, I am just loving this person named Kijai so much.
With Lumina it's pretty much necessary; it's more than twice as fast as SDP attention there. For these LLMs it seems unnecessary. The problem was that, due to a bug in the original code, it didn't even try to run without flash_attn; the bypass only fixes that.
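One widely used workaround for this kind of bug is to drop `flash_attn` from the import list that transformers scans in a model's remote code before loading it. The sketch below assumes that approach; it is not confirmed to be the exact bypass used here. `make_get_imports_without_flash_attn` is a hypothetical helper name, and the `modeling_florence2.py` filename check is an assumption about the remote-code file.

```python
import os
from typing import Callable, List, Union

ImportsGetter = Callable[[Union[str, os.PathLike]], List[str]]


def make_get_imports_without_flash_attn(original: ImportsGetter) -> ImportsGetter:
    """Hypothetical helper: wrap an import scanner so flash_attn is dropped for Florence-2."""

    def patched(filename: Union[str, os.PathLike]) -> List[str]:
        imports = original(filename)
        # Only touch the Florence-2 remote code (assumed filename); other modules keep their requirements
        if not str(filename).endswith("modeling_florence2.py"):
            return imports
        # Drop the optional dependency so loading doesn't fail when flash_attn isn't installed
        return [name for name in imports if name != "flash_attn"]

    return patched
```

With transformers installed, the wrapper would be applied by capturing the real `transformers.dynamic_module_utils.get_imports` first, then wrapping the `from_pretrained(..., trust_remote_code=True)` call in `unittest.mock.patch("transformers.dynamic_module_utils.get_imports", make_get_imports_without_flash_attn(get_imports))`. Filtering with a list comprehension instead of `list.remove` avoids a `ValueError` if a future version of the file no longer imports `flash_attn`.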
Updated: Ensured it works with batches of images as well
Add this to filter out the special tokens. It also keeps all the other functions working, since they rely on these tokens. This way, you get clean output that can be saved or used as a prompt.
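A minimal sketch of this kind of filtering, applied only to the caption text that gets returned, so the raw decoded string with its tokens can still go to whatever post-processing the other tasks need. The token set `</s>`, `<s>`, `<pad>` is an assumption; the tokenizer's `all_special_tokens` attribute lists the real ones for the model in use. `clean_caption` and `clean_captions` are hypothetical names.

```python
import re
from typing import Iterable, List

# Assumed special tokens; verify against tokenizer.all_special_tokens for the actual model.
DEFAULT_SPECIAL_TOKENS = ("</s>", "<s>", "<pad>")


def clean_caption(text: str, special_tokens: Iterable[str] = DEFAULT_SPECIAL_TOKENS) -> str:
    """Remove special tokens from one decoded caption and tidy the whitespace."""
    pattern = "|".join(re.escape(tok) for tok in special_tokens)
    if pattern:
        text = re.sub(pattern, "", text)
    # Collapse runs of whitespace (including gaps left by removed tokens) into single spaces
    return " ".join(text.split())


def clean_captions(texts: Iterable[str], special_tokens: Iterable[str] = DEFAULT_SPECIAL_TOKENS) -> List[str]:
    """Apply clean_caption to every caption in a batch, preserving order."""
    tokens = tuple(special_tokens)  # materialize once so a generator can be reused per caption
    return [clean_caption(t, tokens) for t in texts]
```

For example, `clean_caption("</s><s>A cat on a mat.</s><pad><pad>")` gives `"A cat on a mat."`. Note that collapsing whitespace also flattens newlines, which suits single-line prompts but may not suit every output.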
I tested all the other features and they all worked the same.
Also, at the very end:
Screenshots of the various modes working (one caption type was enough to show that captioning works):
![image](https://github.com/kijai/ComfyUI-Florence2/assets/27916165/651d59b9-b04f-46d9-8e46-e9fc878cfc1c)
I went back and redid part of it after finding that plain batch captioning was having issues. It should now work with all the options, including batches of images.