princeton-nlp / intercode

[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
https://intercode-benchmark.github.io/
MIT License

LangChain? #2

Closed radman-x closed 1 year ago

radman-x commented 1 year ago

You have LangChain listed in the requirements (i.e., environment.yml), but I cannot find any use of it in the code. Do you use it anywhere? Also, is there any way to integrate InterCode with LangChain? If so, please describe how you would do this.

john-b-yang commented 1 year ago

Hi @radman-x, thanks so much for your comment!

The langchain requirement in the environment.yml file is an artifact of some explorations we were doing in the earlier phases of the project to figure out how to deploy language agents on InterCode. Thanks for pointing this out - I will remove this dependency for now so as not to give readers the wrong impression.

We ended up not using langchain due to some limitations in the prompting schema. With that said, it should be possible to integrate InterCode with LangChain's custom tools pipeline. I think a rough sketch of this would look like:

from intercode.assets import bash_build_docker, bash_image_name
from intercode.envs import BashEnv

from langchain.tools import BaseTool

# Build the Bash task Docker image and spin up an InterCode environment
bash_build_docker()
env = BashEnv(bash_image_name, traj_dir="logs/test/", verbose=True)

class InterCodeBashTool(BaseTool):
    name: str = "InterCodeBash"
    description: str = "A bash simulator that can be used to test bash commands..."

    def _run(self, action: str) -> str:
        # Forward the agent's action to the InterCode environment
        obs, reward, done, info = env.step(action)
        if action == "submit":
            # On "submit", the reward reflects how well the task was solved
            return str(reward)
        return obs

# Create agent, do stuff with tool here...

Recreating an evaluation pipeline similar to those found in the experiments folder should also be possible. You would just have to provide a data_path argument to the BashEnv initialization (example here).
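For instance, a minimal sketch of that initialization might look like the following. The data_path value below is just an illustrative placeholder (substitute the task file you actually want to evaluate on), and I'm assuming the usual gym-style reset() call to load the first task instance:

from intercode.assets import bash_build_docker, bash_image_name
from intercode.envs import BashEnv

bash_build_docker()
env = BashEnv(
    bash_image_name,
    data_path="path/to/bash_tasks.json",  # placeholder path; see the linked example for a real one
    traj_dir="logs/test/",
    verbose=True,
)
env.reset()  # load the first task instance from data_path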

In my personal experience, defining custom prompting strategies in langchain required stitching together a lot of boilerplate, so I did not pursue it in favor of the more concise form found in the experiments folder. With that said, I think it shouldn't be too difficult to recreate.

Hope this helps!

john-b-yang commented 1 year ago

Closing this for now, but feel free to re-open if anything comes up!

radman-x commented 1 year ago

Thanks for your fast reply. I do have one other related question. Another alternative integration point would be to convert the InterCode agent into a LangChain agent. This would take advantage of LangChain tools like web searching, running Python code, etc., and abstract away from OpenAI, allowing any of LangChain's supported LLMs to be used. Do you think this makes any sense? If so, please describe how to implement it.

john-b-yang commented 1 year ago

Mm, this is a cool idea! The concept of adding a Bash or SQL environment as a tool for a langchain agent that could then use it alongside other tools (e.g., browsing, calculator, vector database retrieval) is something we thought about briefly too, but have not explored just yet.

I think this would be possible by wrapping an initialized InterCode environment in a langchain BaseTool class, as presented in the above code snippet. The IntercodeEnv class, along with both the BashEnv and SQLEnv environments, does not rely on the openai PyPI package, so there shouldn't be any openai-related conflict with langchain. Only the evaluation code in the experiments folder invokes openai functionality.
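As a very rough, untested sketch, wiring the tool from the earlier snippet into a standard langchain agent might look something like the below. This assumes the InterCodeBashTool class defined above and langchain's initialize_agent API; ChatOpenAI is just a placeholder here, and any langchain-supported chat model could be swapped in:

from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI

# Assumes InterCodeBashTool (and the underlying env) are defined as in the earlier snippet
tool = InterCodeBashTool()
llm = ChatOpenAI(temperature=0)  # placeholder LLM; swap in any langchain-supported model

agent = initialize_agent(
    tools=[tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

# Example task phrasing is hypothetical; the agent issues bash commands through the tool
agent.run("List all files larger than 1MB under /usr, then submit.")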

Regarding an implementation, I'm a bit busy for the next couple of days, but I'll take a shot at hacking together a runnable demonstration of how to use an InterCode env as a langchain tool this coming weekend and report back. I have reopened the issue for now to remind myself.

A quick aside to clear up potential confusion: the InterCode paper does not present an "agent" of any sort at the moment. Our evaluation just involves prompting the latest SOTA text-to-code LLMs with a strategy for interacting with the environment; our baselines do not introduce any new modeling innovations or agents. Hopefully this was not misrepresented in the original paper or code? If a contradictory statement is made somewhere, I'll remove it.

radman-x commented 1 year ago

I have only read the project web page and (very) briefly scanned the code. That web page does often refer to an “agent” - that is what I was thinking could be a LangChain agent. I guess the paper (which I have not yet read) would explain more clearly how this works - probably that this is a generic agent, outside of the InterCode environment? So I have not come across any misrepresentation so far.

I look forward to any rough implementation you might have time to make.

radman-x commented 1 year ago

Another advantage of LangChain is that an agent using InterCode could leverage advanced plan-and-execute techniques like BabyAGI, AutoGPT, etc. for autonomous coding.

radman-x commented 1 year ago

Any updates on implementing the runnable demo?

radman-x commented 1 year ago

A suggestion if this implementation is too time-consuming: maybe you could just have InterCode expose an API that mimics CodeBox from this repo:

https://github.com/shroominic/codeinterpreter-api

That would give it the awesome advantage of running LLM/LangChain-generated code privately on a local device, inside a Docker container in an InterCode environment, instead of in the cloud.
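For concreteness, here is a very rough sketch of the kind of wrapper I am imagining. The start/run/stop method names just mimic the general shape of a sandbox API rather than the actual CodeBox interface, and I am assuming BashEnv exposes a gym-style close():

from intercode.assets import bash_build_docker, bash_image_name
from intercode.envs import BashEnv

class LocalCodeBox:
    """Hypothetical CodeBox-style wrapper around a local InterCode Bash environment."""

    def start(self) -> None:
        # Build the Docker image and launch the containerized environment locally
        bash_build_docker()
        self.env = BashEnv(bash_image_name, traj_dir="logs/codebox/", verbose=False)

    def run(self, code: str) -> str:
        # Execute the generated code inside the local Docker container and return the output
        obs, reward, done, info = self.env.step(code)
        return obs

    def stop(self) -> None:
        # Tear down the container (assumes a gym-style close() on the env)
        self.env.close()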

What do you think?

john-b-yang commented 1 year ago

@radman-x Following up here. I'm not too familiar with codeinterpreter-api. From a quick glance, it looks like this would involve rewriting InterCode from the ground up according to a langchain abstraction. Is that the right idea?

If that's the case, I think this would be quite time-consuming, unless the abstraction is really straightforward(?)

I'm happy to help with any ongoing effort to make this happen, but I can't dedicate time to working on it on my own.

I will close this issue for now, but please feel free to re-open if you or anyone would like to work on it, thanks!

jmanhype commented 1 year ago

why follow up here just to close it?

john-b-yang commented 1 year ago

I don't have time to work on something of this scale.

If you'd like to work on it, feel free to reopen and I'm happy to provide feedback or guidance as necessary.

If multiple additional people demonstrate they would appreciate this, I'll reconsider.

jmanhype commented 1 year ago

gotcha