## Description

A few notable changes to Florence models:

- Adds a new (fake) `task_type`, `unstructured`, to allow the user to type whatever they want as a prompt
- Creates a v2 block that remaps `model_version` to `model_id` so that the user can search their fine-tuned Florence-2 models in app
- Tweaks the list of required model files to avoid always re-downloading Florence-2 models
- Adds `no_repeat_ngram_size=0` to our transformers generation args

### Why we set `no_repeat_ngram_size=0`

I found a very sneaky bug destroying the model's ability to generate valid JSON: this parameter was set to 3. With the parameter set to 3, once the model has generated a sequence of 3 tokens, it can never repeat that sequence. Valid JSON constantly repeats short structural token runs (quotes, colons, commas, repeated keys), so the constraint quietly corrupts the output.

Here's the generation for a receipt JSON before and after making this change:

Generation before:

Generation after:

Here's why: these are the tokens the model generated after I fixed the parameter. Notice the repeated sequences of tokens that the model was unable to generate before this change.
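The failure mode is easy to reproduce outside of transformers. Below is a minimal sketch (`banned_next_tokens` is a hypothetical helper written for illustration, not Florence or transformers code) of the ban that `no_repeat_ngram_size=3` imposes; Hugging Face's `NoRepeatNGramLogitsProcessor` works along the same lines. With a toy tokenization of a receipt's line items, by the third `{"` the model is no longer allowed to emit the `item` key:

```python
def banned_next_tokens(generated, n=3):
    """Tokens that would complete an n-gram already present in `generated`.

    This mimics the rule behind no_repeat_ngram_size=n: the last n-1
    generated tokens form a prefix, and any token that previously
    followed that same prefix is banned, so no n-gram ever repeats.
    """
    banned = set()
    if len(generated) < n:
        return banned
    prefix = tuple(generated[-(n - 1):])
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

# Toy token stream for a receipt with a repeated line-item structure:
tokens = ['[', '{"', 'item', '":', '1', '},',
          '{"', 'item', '":', '2', '},', '{"']

# The 3-gram ('},', '{"', 'item') already occurred once, so after the
# third '{"' the token 'item' is banned and the JSON key can never be
# generated again:
print(banned_next_tokens(tokens))  # {'item'}
```

Setting `no_repeat_ngram_size=0` disables the ban entirely, which is what structured outputs like JSON need.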
## Type of change
## How has this change been tested? Please provide a testcase or example of how you tested the change.

Tested and deployed thoroughly on localhost.
## Any specific deployment considerations

No.
## Docs