simonw / llm-gemini

LLM plugin to access Google's Gemini family of models
Apache License 2.0

Handle attachment files larger than 20MB #19

Open simonw opened 2 weeks ago

simonw commented 2 weeks ago

The Gemini API requires that files larger than a certain size (I think 20MB) be uploaded to their files API rather than passed as inline base64.

This may be a little bit tricky to implement due to the need to remember the file ID for a specific upload - might call for an extra database table or maybe even a change to LLM core to support optional extra metadata for persisted attachment records.

simonw commented 1 week ago

This is tricky: the File API is actually needed when the whole request, ALL attachments included, adds up to more than 20MB:

Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.

I can still do that in the plugin, but I'll need to resolve attachment sizes in order to make that decision - ideally without loading them into memory first.
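A minimal sketch of resolving sizes without reading file contents, assuming each attachment is either a path on disk or bytes already in memory. The function names are hypothetical and not part of LLM or this plugin. URL attachments are left out: they would need something like an HTTP HEAD request. Whether the 20 MB limit is counted before or after base64 encoding is also an assumption to verify, so the encoded size is computed separately.

```python
import os
from typing import Optional


def attachment_size(path: Optional[str] = None, content: Optional[bytes] = None) -> int:
    """Return the raw size in bytes without reading a file from disk."""
    if content is not None:
        return len(content)  # already in memory, so this costs nothing extra
    if path is not None:
        return os.path.getsize(path)  # a stat() call only, the file is not read
    raise ValueError("need either a path or in-memory content")


def base64_size(raw_size: int) -> int:
    """Size after base64 encoding: 4 output bytes per 3 input bytes, padded."""
    return 4 * ((raw_size + 2) // 3)
```

If the limit does apply to the encoded request, `base64_size(attachment_size(...))` is the number to add up, which makes the effective raw-file budget roughly three quarters of the limit.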

At least there are no extra costs to worry about:

The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. The File API is available at no cost in all regions where the Gemini API is available.

simonw commented 1 week ago

I think the trick here will be calculating the size of all attachments plus the prompt and system prompt, then sorting the attachments by size and uploading the largest one in a loop until the amount of content left has dropped below the 20MB threshold.
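That loop could look roughly like this. It is a sketch only: `choose_uploads` and its arguments are hypothetical names, sizes are assumed to be known already, and "20 MB" is assumed to mean 20 * 1024 * 1024 bytes, which the docs may define differently.

```python
from typing import Dict, List

LIMIT_BYTES = 20 * 1024 * 1024  # assumption: 20 MB interpreted as 20 MiB


def choose_uploads(
    sizes: Dict[str, int], prompt_bytes: int, limit: int = LIMIT_BYTES
) -> List[str]:
    """Pick attachments to send via the File API, largest first.

    sizes maps an attachment identifier to its size in bytes.
    prompt_bytes is the combined size of the prompt and system prompt.
    """
    remaining = prompt_bytes + sum(sizes.values())
    to_upload: List[str] = []
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if remaining <= limit:
            break  # what is left now fits inline
        to_upload.append(name)
        remaining -= size
    return to_upload
```

Taking the largest file first keeps the number of uploads small, since each upload removes as much as possible from the inline total.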

There's another consideration here: presumably there's a performance advantage to uploading even a small file just once if it's going to be used in a lot of different prompts. But how to decide when to do that?

One possibility: for small files that weren't previously treated as uploads, automatically upload them the second time they are referenced in a prompt within X hours - as a very rough heuristic for detecting that they might be used again in the future.
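A rough sketch of that heuristic, keyed on a content hash. `ReuseTracker` is a hypothetical name, the 24 hour default stands in for the undecided X, and the in-memory dict would have to be replaced by whatever persistent storage the plugin ends up using.

```python
import time
from typing import Dict, Optional


class ReuseTracker:
    """Suggest an upload the second time a file is seen within a time window."""

    def __init__(self, window_hours: float = 24.0):  # 24 is a placeholder for X
        self.window_seconds = window_hours * 3600
        self._last_seen: Dict[str, float] = {}

    def should_upload(self, content_hash: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        previous = self._last_seen.get(content_hash)
        self._last_seen[content_hash] = now  # record this reference either way
        return previous is not None and now - previous <= self.window_seconds
```

The first reference always returns `False`, so a file used only once is never uploaded by this rule alone.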

Could also provide a sub-command:

llm gemini upload file.png

This will hash the file content and upload the file, stashing a record in the attachments table which can then be used to detect the file has been previously uploaded and reuse its Gemini file ID later on.
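For the hashing step, something like the following would avoid loading a large file into memory in one go. SHA-256 and the `file_sha256` name are assumptions for illustration; the thread does not say which hash would be used.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```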

This is a strong indicator that adding a mechanism for plugins to track extra data against attachments is going to be necessary - either with a JSON column or some kind of foreign key custom table.

Maybe this:

attachment_id  key          value
43             gemini-file  8f47c8e9-12d4-4b86-b6a3-65c8f32598bc

simonw commented 1 week ago

Or have the llm-gemini plugin create and migrate its own tables for this - which would set a good precedent for how other plugins could do this.
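A sketch of what a plugin-owned table might look like, using the standard library `sqlite3` module. The `gemini_files` table, its columns, and the helper names are all hypothetical; this is not how LLM or the plugin handles migrations today.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS gemini_files (
    content_hash   TEXT PRIMARY KEY,
    gemini_file_id TEXT NOT NULL,
    uploaded_at    REAL NOT NULL
)
"""


def ensure_schema(conn: sqlite3.Connection) -> None:
    """Create the table if missing; safe to call on every startup."""
    conn.execute(SCHEMA)


def remember_upload(
    conn: sqlite3.Connection, content_hash: str, gemini_file_id: str, uploaded_at: float
) -> None:
    # INSERT OR REPLACE overwrites the stale row when a file is uploaded again
    conn.execute(
        "INSERT OR REPLACE INTO gemini_files VALUES (?, ?, ?)",
        (content_hash, gemini_file_id, uploaded_at),
    )
    conn.commit()


def lookup_upload(conn: sqlite3.Connection, content_hash: str) -> Optional[str]:
    row = conn.execute(
        "SELECT gemini_file_id FROM gemini_files WHERE content_hash = ?",
        (content_hash,),
    ).fetchone()
    return row[0] if row else None
```

Keying on the content hash rather than an attachment row ID means the lookup still works for the same file referenced from a different path.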

Need to consider the Python library case though where a SQLite logs database isn't necessarily guaranteed.

That case will be tricky, because the execute prompt method in this plugin needs access to persistent storage in order to check if an attachment has previously been uploaded or not.

simonw commented 1 week ago

Might need some kind of abstraction in LLM core for persistent storage, which will soon also need to be both sync and async capable.

It probably shouldn't be 100% reliant on SQLite either, since I want LLM as a library to be useful in other contexts, e.g. for people who are integrating it with PostgreSQL or even a system with NoSQL storage of some kind.
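One shape such an abstraction could take is a small key/value protocol that any backend can satisfy. Everything here is hypothetical (`PluginStorage`, `MemoryStorage`, the `gemini-file:` key prefix), and only the sync side is shown; an async variant would presumably mirror it with `async def` methods.

```python
from typing import Dict, Optional, Protocol


class PluginStorage(Protocol):
    """Minimal interface a storage backend would need to provide."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class MemoryStorage:
    """Fallback for library use when no logs database is configured."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def file_id_for(storage: PluginStorage, content_hash: str) -> Optional[str]:
    """Look up a previously stored Gemini file ID, if any."""
    return storage.get(f"gemini-file:{content_hash}")
```

Because `Protocol` uses structural typing, a SQLite, PostgreSQL or NoSQL backend would only need matching `get` and `set` methods, with no shared base class.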

gerred commented 6 days ago

Note too that files only last in this free Files API store for a limited time (48 hours, per the docs quoted above), and the store is limited to 20GB for an entire project. So it also needs to be treated as an ephemeral store, with a lookup every time to check that a file is still there.
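A sketch of the freshness check this implies, assuming the plugin records an upload timestamp alongside each file ID. The 48 hour figure comes from the docs quoted earlier in the thread; the one hour safety margin and the function name are made up for illustration.

```python
FILE_TTL_SECONDS = 48 * 3600    # retention period quoted from the Gemini docs
SAFETY_MARGIN_SECONDS = 3600    # assumption: re-upload a little early


def is_still_usable(
    uploaded_at: float,
    now: float,
    ttl: float = FILE_TTL_SECONDS,
    margin: float = SAFETY_MARGIN_SECONDS,
) -> bool:
    """True if a stored file ID is young enough to reuse without re-uploading."""
    return now - uploaded_at < ttl - margin
```

A local check like this only avoids obviously stale IDs. The file could still have been deleted, so the caller would need to handle a failed reference by uploading again.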