https://huggingface.co/microsoft/Florence-2-large
Supports all four Florence-2 models, with initial support for all output types.
TODO: