simonw / files-to-prompt

Concatenate a directory full of files into a single prompt for use with LLMs

Pipe LLM response to files #14

Open · eddie opened 2 months ago

eddie commented 2 months ago

I've been leveraging your llm, strip-tags, and ttok tools and this is the perfect addition! Thank you @simonw!

I have been toying with the reverse of this, where we can pipe typical LLM responses back to the filesystem with a quick confirmation step. Would this functionality belong in the files-to-prompt tool or in its own utility? I'm also curious whether anyone knows of a CLI tool that already does this.

Many thanks!

irthomasthomas commented 2 months ago

It is easy enough to send the output to a file with > and >>, but I guess you're talking about something more intelligent, for instance updating a function definition in a file, or splitting one response into multiple files?

fry69 commented 2 months ago

@simonw's llm tool already stores every response inside a SQLite database by default, in case that isn't already known. Of course you can use his datasette tool to dig into that database.
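For illustration, a rough sketch of querying that database directly. The path and the schema (a `responses` table with a `response` column) are from memory and may differ between llm versions; `llm logs path` on the CLI prints the actual location:

```python
# Rough sketch, not authoritative: read llm's log database directly.
# The schema assumed here (a "responses" table with "id" and "response"
# columns) should be verified against your llm version before relying on it.
import sqlite3

db = sqlite3.connect("logs.db")  # substitute the output of `llm logs path`
for (response,) in db.execute(
    "SELECT response FROM responses ORDER BY id DESC LIMIT 5"
):
    print(response[:100])
```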

If you want to go really nuts with the responses from the LLM, have a look at e.g. LangChain or Instructor.

LangChain offers a broad spectrum of ways to persist responses, from in-memory and simple files to several databases. See langchain_community and langchain-postgres.
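As one example, a minimal sketch using langchain_community's file-backed chat history (class name and signature as I remember them; the API moves fast, so check the current docs):

```python
# Minimal sketch: persist a chat exchange to a JSON file via LangChain.
# FileChatMessageHistory and its signature are recalled from the
# langchain_community docs and may have changed since.
from langchain_community.chat_message_histories import FileChatMessageHistory

history = FileChatMessageHistory("chat_history.json")
history.add_user_message("Write hello world in C++")
history.add_ai_message("#include <iostream> ...")
```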

Instructor is more lightweight and focuses on forcing LLMs to return data in a structured form (it generates JSON schemas on the fly and adds them to the prompt, leveraging tool calling if available), so responses can easily be processed further. For TypeScript aficionados like me, there is also an official port, instructor-js. This is so brilliant: JSON (TypeScript) + prompt -> LLM -> JSON back.
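A minimal sketch of that pattern in Python (the model name and exact client setup here are assumptions; see the Instructor docs for the current API):

```python
# Sketch of the Instructor pattern: declare the shape you want with
# Pydantic and let Instructor coerce the LLM response into it.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class CodeFile(BaseModel):
    filename: str
    contents: str

client = instructor.from_openai(OpenAI())  # assumes OPENAI_API_KEY is set

result = client.chat.completions.create(
    model="gpt-4o-mini",  # model name is an arbitrary example
    response_model=CodeFile,
    messages=[{"role": "user", "content": "Write hello world in C++ as a single file."}],
)
print(result.filename, len(result.contents))
```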

eddie commented 2 months ago

Thanks for your input @irthomasthomas and @fry69. I was thinking of a Unix-like tool for parsing stdout responses, e.g.:

Some LLM response...

file.cpp:
```cpp
#include <iostream>
..
```

into corresponding files on the filesystem. What would be a typical approach for this? I was initially experimenting with some cobbled-together regexes. It looks like Instructor etc. is close, but I'm unsure how it would approach this case.
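A rough sketch of that regex approach might look like this. The `file.ext:` header convention is assumed from the example above; real LLM output is far less uniform:

```python
# Throwaway sketch: extract "filename:" + fenced code blocks from stdin
# and write each one to disk after a confirmation prompt. The header
# convention is an assumption; real responses vary a lot.
import re
import sys
from pathlib import Path

BLOCK_RE = re.compile(
    r"^(?P<name>[\w./-]+):\s*\n"   # a line like "file.cpp:"
    r"```\w*\n"                    # opening fence, optional language tag
    r"(?P<body>.*?)"               # file contents (non-greedy)
    r"^```",                       # closing fence at line start
    re.MULTILINE | re.DOTALL,
)

def extract_files(text: str) -> None:
    for match in BLOCK_RE.finditer(text):
        path = Path(match["name"])
        if input(f"Write {path}? [y/N] ").strip().lower() == "y":
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(match["body"])

if __name__ == "__main__":
    extract_files(sys.stdin.read())
```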

stoerr commented 1 month ago

@eddie: there is a command line tool, chatgptextractcodeblock, in my chatgpt tool suite that extracts the code block, though it's written in JavaScript.

BTW, you guys: files-to-prompt is a wonderfully lightweight idea for giving an LLM file input on the command line, especially if it's just a one-shot action. Just mentioning: if you want to explore that way of generating files / making changes systematically across many files, or even in a build process, you might want to have a look at my AI based code generation pipeline (source on GitHub). That's a command line tool that started out as basically files-to-prompt plus writing the LLM response to a file, but it has quite a few more features now.

That also uses my "put it into the AI's mouth" pattern to encapsulate the files in several chat messages, instead of just putting them into the prompt. That might limit prompt injection risks and other confusions.
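Schematically, the pattern looks something like this (a hand-waved sketch, not the tool's actual message format):

```python
# Hand-waved sketch of the "put it into the AI's mouth" pattern: each
# file arrives as its own chat turn instead of being inlined into one
# big prompt, so file contents read as data rather than instructions.
messages = [
    {"role": "system", "content": "You edit the files the user has supplied."},
    {"role": "user", "content": "Please retrieve src/main.cpp"},
    {"role": "assistant", "content": "src/main.cpp:\n#include <iostream>\n// ..."},
    {"role": "user", "content": "Now add error handling to main()."},
]
```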

Some of that might be implemented in llm or its plugins, or partially in files-to-prompt, by somebody, but it could possibly be too much for such general tools.