moodymudskipper / ask

ask R anything

Add context_plot()? #58

Closed samterfa closed 1 week ago

samterfa commented 2 weeks ago

Thanks for this phenomenal package. I thought a context_plot() function would be pretty cool to add, where you could ask a question about a plot you just generated, or how to improve it. This could be done using rstudioapi::savePlotAsImage() along with the API call, I imagine.
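
To illustrate, the RStudio side of this could be a rough sketch like the following (assumes an interactive RStudio session; the plot pane must contain a plot):

```r
# Rough sketch: save the currently displayed plot to a temp file so it
# can later be attached to an API request. Requires an interactive
# RStudio session with an active plot.
path <- tempfile(fileext = ".png")
rstudioapi::savePlotAsImage(path, format = "png", width = 800, height = 600)
```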

moodymudskipper commented 1 week ago

Thanks for your interest @samterfa !

Things have to be turned into natural language (in a wide sense) to be fed to models, so working from a plot image, or images in general, would be challenging, but maybe some OpenAI models do this kind of thing through the API. I know that through the ChatGPT UI we can ask for a variant of a picture, but it doesn't work all that well IIRC. Still, I'd like to integrate image-related features.

For your case (tell me if I misunderstood) we might do a bit better though:

library(ask)
library(ggplot2)
p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point()

# copy the code to your clipboard
ask("I would like one color per species, and have a subplot per species in a line", context_clipboard())

# or this way; this uses constructive and we lose the info about the name `iris`,
# but the result is still helpful
ask("I would like one color per species, and have a subplot per species in a line", context_objects(list(p = p)))

With context_clipboard() I get some explanations and this correct code:

library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length, color = Species)) +
  geom_point() +
  facet_wrap(~ Species, nrow = 1)

# Print the plot
print(p)
samterfa commented 1 week ago

The use case I'm thinking of would be to plot some data and send the plot image along with a question like "How might I fit the trend I'm seeing?".

OpenAI has vision-capable models that can process images along with text via the API. I was working on a proof-of-concept RStudio agent that was given a task and could suggest code to run; the code would run, a screenshot of the results would be sent back to the LLM, and it could suggest more code, all in a loop, automatically (a little scary, admittedly). This was helpful because the LLM could correct its own errors without me having to send them back to it.

example code

Anyway, the capability exists to send images to some of the models, so if it seems like a good idea, I'd be happy to help work on it.

moodymudskipper commented 1 week ago

Wow that's great! I like your idea too.

Let me copy your code sample for convenience:

    messages <-
      list(
        list(
          role = 'system',
          content = 'You are an R code generator. The user will run your code in an RStudio session and return to you a screenshot showing you the results of running your code. Only return R code.'
        ),
        list(
          role = 'user',
          content = list(
            list(
              type = 'text',
              text = initial_task
            ),
            list(
              type = 'image_url',
              image_url = list(
                url = glue::glue("data:image/jpeg;base64,{img}")
              )
            )
          )
        )
      )

So what we can do is attach an image as part of a user message. I've never done this, so I'd need to play around, but let's assume it "just works" with gpt-4o.
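
For what it's worth, encoding the image and sending the messages list above might look something like this (untested sketch; assumes the base64enc and httr2 packages, that OPENAI_API_KEY is set, and that `img` is encoded before `messages` is built):

```r
# Untested sketch: base64-encode a saved plot, then send the messages
# list from above to the OpenAI chat completions endpoint.
library(httr2)

img <- base64enc::base64encode("plot.png")  # must run before building `messages`

resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
  req_body_json(list(model = "gpt-4o", messages = messages)) |>
  req_perform() |>
  resp_body_json()
```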

On one hand we should have a new `image` input in every ask*() or follow_up*() function, where `image` is a path or a URL.

On the other hand, system messages can't contain images AFAICT (by asking ChatGPT), and the context_*() functions are for the moment designed to create system messages (big strings). So I don't think context_plot() is the way to go.

What about:

ask("How might I fit the trend I'm seeing?", image = capture_plot())

capture_plot() would take either a ggplot2 object or default to the last plot, a bit like ggsave(), but save to a temp file.
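
A minimal sketch of what I have in mind (hypothetical, the name and arguments may change):

```r
# Hypothetical sketch of capture_plot(): save a given ggplot object,
# or the last plot by default, to a temp PNG file and return its path.
capture_plot <- function(plot = ggplot2::last_plot(), width = 7, height = 5) {
  path <- tempfile(fileext = ".png")
  ggplot2::ggsave(path, plot = plot, width = width, height = height)
  path
}
```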

What do you think?

I think I'd rather take care of the implementation myself because I still have a bit of a mess to push and clean up, but if you can stay around for feedback, design ideas and tests, that would be great.

We also need an ask_image() function to return image output, when the model supports it, but that's tangential.

moodymudskipper commented 1 week ago

How does this look?

[Screenshot 2024-09-30 at 23:16:11]
samterfa commented 1 week ago

That looks pretty cool. Thanks for implementing!

I'll make one more plug for the original name if you ever refactor the context_*() functions to generate user messages instead of system ones. To me, ask('Explain this code to me.', context_clipboard()) is the same concept as ask('Explain this graph to me.', context_plot()). It could definitely get tricky if there were multiple contexts, such as the code that generates the plot and the plot itself, so context might have to become a list at that point.

Anyway, thanks, great project!

moodymudskipper commented 1 week ago

Thanks! You can combine contexts with the context() function. I should mention this in the README. I understand your point, especially since I want this to be an intuitive package, the user shouldn't worry much about technicalities. I will give it some thought but meanwhile let's see how this implementation works out in practice.
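
For instance, combining contexts could look like this (illustrative; the exact signature of context() may differ):

```r
# Illustrative sketch: combine the clipboard content and an object
# into a single context via context().
ask(
  "Explain this code and the plot it produces.",
  context(context_clipboard(), context_objects(list(p = p)))
)
```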

moodymudskipper commented 5 days ago

An argument in your favour: contextualising a PDF (or maybe a webpage with pictures, or an Excel file with plots or images) might mean extracting the text AND the pictures, and doing that with a single function and argument makes sense.