simonw / files-to-prompt

Concatenate a directory full of files into a single prompt for use with LLMs
Apache License 2.0
244 stars, 17 forks

Add an nbconvert plugin to convert jupyter notebooks to markdown or text prior #13

Open cmungall opened 3 months ago

cmungall commented 3 months ago

I frequently want to show LLMs my notebooks as examples of working code. My favorite, claude-3-opus, seems to have no issue with the .ipynb format (though I haven't investigated this rigorously), but it can still waste tokens, especially if there are a lot of extraneous images and formatting info.

One option would be to use nbconvert (https://nbconvert.readthedocs.io/) to convert notebooks to a format like Markdown (https://nbconvert.readthedocs.io/en/latest/config_options.html#exporter-options), which is presumably more digestible, at least for LLMs with shorter context lengths.

This would probably be an extra or plugin, to keep the core of files-to-prompt lean.

A future extension would be to also feed any generated image links into multimodal LLMs: https://github.com/simonw/llm/issues/331

fry69 commented 3 months ago

I added support for converting Jupyter notebooks on the fly to my Bun/TypeScript port. I'd love to hear whether this is useful.

Update: I added a crazy fast internal parser (100x faster than nbconvert) to the script; use it with --nbconvert internal. I tested it on a medium-sized folder of notebook files and could not spot any obvious problems in the output, but please report any problems you find with this new feature.

Background: it turns out Jupyter notebooks are just JSON, and if Bun/Node are good at anything, it's munching through that. The complete, minified script is still only ~6.1k bytes, btw.
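Because an .ipynb file is plain JSON, a converter of this kind needs little more than a JSON parser. The following is a minimal Python sketch of the idea, not the TypeScript port's actual code. It assumes the nbformat v4 layout (a top-level "cells" list whose entries carry "cell_type" and "source", where "source" may be a string or a list of lines) and deliberately drops every cell's outputs, since that is where embedded images and bulky display data live. The function name notebook_to_markdown is hypothetical.

```python
import json


def notebook_to_markdown(nb_json: str) -> str:
    """Convert an nbformat v4 notebook (as a JSON string) to Markdown.

    Outputs are skipped entirely, so embedded images and rich display
    data never reach the prompt. Assumes the v4 "cells" layout.
    """
    nb = json.loads(nb_json)
    # Language hint for code fences, taken from notebook metadata if present.
    lang = nb.get("metadata", {}).get("language_info", {}).get("name", "")
    fence = "`" * 3  # built at runtime to avoid a literal fence in this source
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        source = source.strip()
        if not source:
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            parts.append(f"{fence}{lang}\n{source}\n{fence}")
        else:
            # Markdown and raw cells are passed through unchanged.
            parts.append(source)
    return "\n\n".join(parts) + "\n"


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"language_info": {"name": "python"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "print(1 + 1)",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}]},
    ],
})
print(notebook_to_markdown(example))
```

Dropping outputs wholesale is the simplest choice for saving tokens; a fuller version might keep short text outputs and discard only image/* data.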

Some numbers:

Directory size:

$ du -sh ~/jupyter/
5,9M    /Users/fry/jupyter/

Convert with external nbconvert:

real    0m10,365s
user    0m6,246s
sys     0m0,836s

Convert with internal parser:

real    0m0,111s
user    0m0,067s
sys     0m0,029s
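The two sets of time readings above can be turned into a speedup figure. Here is a small Python sketch, assuming the readings use the XmY,Zs format shown (with a comma as the decimal separator, as the shell's locale produces here); the helper name seconds is hypothetical.

```python
import re


def seconds(reading: str) -> float:
    """Parse a bash time reading such as '0m10,365s' into seconds.

    Accepts ',' or '.' as the decimal separator, since the separator
    depends on the shell's locale.
    """
    m = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", reading.strip())
    if m is None:
        raise ValueError(f"unrecognised time reading: {reading!r}")
    minutes, secs, frac = m.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


# Readings copied from the measurements above.
external = {"real": "0m10,365s", "user": "0m6,246s"}
internal = {"real": "0m0,111s", "user": "0m0,067s"}

for key in ("real", "user"):
    ratio = seconds(external[key]) / seconds(internal[key])
    print(f"{key}: {ratio:.1f}x faster")
# real: 93.4x faster
# user: 93.2x faster
```

Both wall-clock and user time come out at roughly 93x, consistent with the rough "100x" figure quoted above.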