Open shcheklein opened 1 week ago
That's a great idea!
It would help to identify an actual API: how to pass a prompt, how to swap ChatGPT for something else, etc.
Also, ideally this functionality should be implemented outside of the DC class while still being usable natively. Any ideas on how to implement this?
I'm asking because we might have multiple connectors like llm, and we cannot put everything in the DC class, which is already a fat class.
Another thought: it can be implemented using outlines, which seems to have decent support for multiple LLM models.
Besides Pydantic, it has structured output for simple types, which is nice. It would be great to run queries like "how many people in image" using visual models and get the results directly into a table.
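To make the "simple types straight into a table" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `typed_query`, `coerce` and `fake_vision_model` are illustration names, not part of outlines or any existing API. It also parses the reply after the fact, which is a simpler stand-in for the structured output that outlines provides.

```python
import re
from typing import Callable

# Hypothetical model signature: (question, image_path) -> raw text reply.
VisionModel = Callable[[str, str], str]


def coerce(raw: str, out_type: type):
    """Turn a free-text reply into a simple Python type (int or str here)."""
    if out_type is int:
        match = re.search(r"-?\d+", raw)
        if match is None:
            raise ValueError(f"no integer found in reply: {raw!r}")
        return int(match.group())
    return raw.strip()


def typed_query(model: VisionModel, question: str, path: str, out_type: type):
    """Ask the model one question about one file and coerce the answer."""
    return coerce(model(question, path), out_type)


def fake_vision_model(question: str, path: str) -> str:
    """Stand-in for a real visual model so the sketch runs offline."""
    canned = {"beach.jpg": "There are 3 people.", "empty_room.jpg": "0"}
    return canned[path]


files = ["beach.jpg", "empty_room.jpg"]
table = [
    {"file": f, "people": typed_query(fake_vision_model, "How many people are in the image?", f, int)}
    for f in files
]
```

Each row ends up with a plain `int` column, so the result can go into a table without any extra parsing step on the caller's side.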
Yes, I saw outlines, but I was considering it more as a wrapper for open models (vs. APIs like OpenAI, etc.). But yes, if they have full support for those, and also full support for different types of data (images, text, etc.), then we can and should use something like that.
Also, ideally this functionality should be implemented outside of the DC class while still being usable natively. Any ideas on how to implement this?
Agreed on DC being too overloaded. In this case it can be a function that we pass to gen, map as a start, I guess.
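A rough sketch of what "a function that we pass to map" could look like, kept entirely outside the DC class. The names `llm_udf`, `LLMClient` and `fake_client` are made up for illustration, and the built-in `map` stands in for the dataset's own map/gen method, whose real signature is not assumed here.

```python
from typing import Callable

# Hypothetical: any callable that takes a prompt string and returns the reply text.
# Swapping ChatGPT for another model means passing a different client.
LLMClient = Callable[[str], str]


def llm_udf(client: LLMClient, prompt_template: str) -> Callable[[str], str]:
    """Build a per-row function; the template must contain a {text} placeholder."""
    def udf(text: str) -> str:
        return client(prompt_template.format(text=text)).strip()
    return udf


def fake_client(prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return "positive" if "great" in prompt.lower() else "negative"


classify = llm_udf(fake_client, "Classify the sentiment of: {text}")

# The built-in map plays the role of the dataset's map method here.
rows = ["This is great", "This is broken"]
labels = list(map(classify, rows))
```

The prompt and the model are both plain arguments, so nothing LLM-specific has to live in the DC class itself.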
When I was suggesting the llm() approach, that was primarily a mental exercise. Can we completely, or to a certain degree, reimagine a classical dataframe-like API considering that we have LLMs? Just thinking in those terms is useful, I think. But overall, I agree: if there are no clear benefits / strong ideas, then we should do it at the lib level.
Come up with a higher-level LLM UDF.
When analyzing data via LLMs (text, images), step by step we have quite a lot of repetitive code like:
Which is decent enough already and not very complicated, but I wonder if we can make LLM maps/gens first-class citizens in the language: