Open Ankush-lastmile opened 10 months ago
Note: This is the exact same thing as OutputDataWithStringValue, so we should probably just combine the two into DataWithStringValue so it can be used as both input and output.
There are now multiple model parsers that support non-text inputs: HuggingFace Image to Text and Automatic Speech Recognition, each in both RemoteInference and Transformers variants.
Both of these model parsers support file path inputs. Binary data inputs should be possible, but supporting them is still a todo.
Steps from here:
Additional Stretch Goals would be adding support for GPT-4v & Gemini Vision (pro)
Similar to how the aiconfig SDK supports typed output data (for example, OutputDataWithStringValue in schema.py), we need to add support for a similar structure for input. This can be as simple as a type with the same shape as OutputDataWithStringValue.
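A minimal sketch of what such an input type might look like, assuming it mirrors the kind/value shape of OutputDataWithStringValue. The name InputDataWithStringValue, the two kind values, and the use of a stdlib dataclass (instead of the model classes in schema.py) are assumptions for illustration, not the SDK's actual definition.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Assumption: the allowed kinds match those used for typed output data.
DataKind = Literal["file_uri", "base64"]


@dataclass(frozen=True)
class InputDataWithStringValue:
    """Hypothetical typed input: a string value plus a tag saying how to read it."""

    kind: DataKind
    value: str

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so validate explicitly.
        if self.kind not in get_args(DataKind):
            raise ValueError(f"unsupported kind: {self.kind!r}")
        if not isinstance(self.value, str):
            raise TypeError("value must be a string")


# A file path input, as the existing model parsers already accept.
image_input = InputDataWithStringValue(kind="file_uri", value="./cat.png")
```

If the input and output types end up identical, a single shared DataWithStringValue would replace both.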
Once that is done, we should clean up any existing one-off implementations of input data types. As of writing, the AutomaticSpeechRecognition Model Parser defines this ad hoc and enforces this type. Address the todos and clean up call sites and usages as well.
To clarify, it tries to load the input data and throws on incompatibility. With the introduction of types into the schema, this validation will be done at load() time. Instead of loading, simply check for the existence of input data.
Edit: The ASR model parser does not use this.
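The existence check could be sketched roughly as follows. This assumes the prompt input is available as a plain mapping with a "data" entry; that shape and the helper name require_input_data are hypothetical and not taken from the aiconfig SDK.

```python
from typing import Any, Mapping


def require_input_data(prompt_input: Mapping[str, Any]) -> Any:
    """Return the input data, raising only if it is missing.

    Type and format validation is assumed to have happened already when the
    config was loaded, so the parser does not try to load the data itself.
    """
    data = prompt_input.get("data")
    if data is None:
        raise ValueError("prompt input has no data")
    return data


# Hypothetical prompt input carrying a typed file path.
example = {"data": {"kind": "file_uri", "value": "./speech.wav"}}
audio = require_input_data(example)
```

The parser stays free of ad hoc type checks, and an incompatible input surfaces at load time instead of at inference time.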