DOC: How langchain_experimental.rl_chain works

austinmw commented 11 months ago

Issue with current documentation:

Hi, can we get more documentation on langchain_experimental.rl_chain? I'm having trouble wrapping my head around how it works, and the documentation is sparse.

From the notebook intro, I originally thought it would tune the human-written prompt template and then output a new, improved template that it had found to work better. However, it seems to be doing something else.

Idea or request for content:

No response

dosubot[bot] commented 11 months ago

🤖

Hi @austinmw,

It's great to see you back on our repository. Hope you've been doing well!

Sure, I can provide some additional information on the langchain_experimental.rl_chain module.

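The langchain_experimental.rl_chain module in the LangChain framework implements reinforcement learning chains. It uses a Vowpal Wabbit (VW) contextual-bandit model as a learned policy. On each call, the policy chooses which of several candidate values to insert into a prompt variable, based on the context passed alongside them, and it is then updated from a score given to the resulting response. The prompt template itself is not rewritten. What improves over time is the choice of value for the dynamic part of the prompt.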
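A toy, stdlib-only sketch can make that loop concrete. Everything in it is hypothetical (`ToyPromptPicker`, `toy_score`, the meal options), and it is not the module's implementation: epsilon-greedy exploration over running-average scores stands in for VW's contextual-bandit learner. The template never changes; what gets learned is which candidate value to put into one of its variables for a given context.

```python
import random
from collections import defaultdict


class ToyPromptPicker:
    """Hypothetical stand-in for a learned policy: epsilon-greedy over mean scores."""

    def __init__(self, options, epsilon=0.2, seed=0):
        self.options = list(options)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (context, option) -> [total score, number of times scored]
        self.stats = defaultdict(lambda: [0.0, 0])

    def _mean(self, context, option):
        total, count = self.stats[(context, option)]
        return total / count if count else 0.0

    def best(self, context):
        # Exploit: the option with the highest mean score for this context (ties -> first).
        return max(self.options, key=lambda option: self._mean(context, option))

    def pick(self, context):
        # Explore a random option with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.options)
        return self.best(context)

    def learn(self, context, option, score):
        entry = self.stats[(context, option)]
        entry[0] += score
        entry[1] += 1


TEMPLATE = "Suggest {meal} to a {diet} customer in one friendly sentence."  # fixed


def toy_score(diet, meal):
    # Hypothetical scorer: each kind of customer likes exactly one meal.
    liked = {"vegetarian": "veggie curry", "meat lover": "beef stew"}
    return 1.0 if liked[diet] == meal else 0.0


picker = ToyPromptPicker(["beef stew", "veggie curry"], epsilon=0.2, seed=42)
for step in range(400):
    diet = "vegetarian" if step % 2 == 0 else "meat lover"
    meal = picker.pick(diet)
    prompt = TEMPLATE.format(meal=meal, diet=diet)  # the text an LLM would receive
    picker.learn(diet, meal, toy_score(diet, meal))

print(picker.best("vegetarian"), "|", picker.best("meat lover"))
```

After training, the picker favors a different meal for each context even though both use the same template. In rl_chain itself, candidate values are marked with `ToSelectFrom` and context with `BasedOn`.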

The core class in this module is RLChain, a generic base class that inherits from Chain. The concrete chain you instantiate is PickBest, which subclasses RLChain. RLChain has several attributes, including:

llm_chain: The underlying language model chain.
prompt: The template for the base prompt.
selection_scorer: Scorer for the selection. Can be set to None.
policy: The policy used by the chain to learn to populate a dynamic prompt.
auto_embed: Determines if embedding should be automatic. Default is False.
metrics: Tracker for metrics, can be set to None.

The methods are split between the chain and its policy object. RLChain provides:

update_with_delayed_score: Updates the learned policy with a score supplied after the call, for example when the selection scorer has been deactivated.
deactivate_selection_scorer: Turns off automatic scoring of responses (activate_selection_scorer turns it back on).
save_progress: Saves the current model through the policy.

The policy (VwPolicy by default) provides:

predict: Returns the policy's selection for an event.
learn: Updates the model from a scored event.
log: Logs the event.
save: Saves the model.
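One way to picture the division of labor between chain and policy is a toy sketch in plain Python. The class names (`ToyPolicy`, `CountingPolicy`, `ToyChain`) and the event dictionaries are hypothetical; only the method names mirror the ones listed above. It shows the delayed-score flow: with the scorer deactivated, a call only makes a selection, and learning happens later when a score is supplied. In this toy, learning and logging only happen once a score is known.

```python
from abc import ABC, abstractmethod


class ToyPolicy(ABC):
    """Hypothetical interface mirroring predict / learn / log / save."""

    @abstractmethod
    def predict(self, event): ...

    @abstractmethod
    def learn(self, event): ...

    @abstractmethod
    def log(self, event): ...

    @abstractmethod
    def save(self): ...


class CountingPolicy(ToyPolicy):
    """Toy policy: picks the option with the highest summed score so far."""

    def __init__(self):
        self.totals = {}
        self.logged = []
        self.saved = False

    def predict(self, event):
        return max(event["options"], key=lambda o: self.totals.get(o, 0.0))

    def learn(self, event):
        choice = event["selected"]
        self.totals[choice] = self.totals.get(choice, 0.0) + event["score"]

    def log(self, event):
        self.logged.append(event)

    def save(self):
        self.saved = True  # a real policy would persist its model here


class ToyChain:
    """Hypothetical chain: scores immediately unless the scorer is deactivated."""

    def __init__(self, policy, scorer):
        self.policy = policy
        self.scorer = scorer
        self.scorer_active = True

    def deactivate_selection_scorer(self):
        self.scorer_active = False

    def run(self, options):
        event = {"options": options}
        event["selected"] = self.policy.predict(event)
        if self.scorer_active:
            event["score"] = self.scorer(event["selected"])
            self.policy.learn(event)
            self.policy.log(event)
        return event  # plays the role of the chain response

    def update_with_delayed_score(self, score, response):
        response["score"] = score
        self.policy.learn(response)
        self.policy.log(response)


chain = ToyChain(CountingPolicy(), scorer=lambda choice: 0.0)
chain.deactivate_selection_scorer()  # scores will arrive later, e.g. from a user
response = chain.run(["A", "B"])     # tie on empty totals, so the first option wins
chain.update_with_delayed_score(1.0, response)
chain.policy.save()
```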

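Here is an example of how to initialize a chain. Because RLChain is an abstract base class, the example builds a PickBest chain with PickBest.from_llm, which passes these keyword arguments through to the RLChain constructor:

```python
from langchain_experimental import rl_chain

# Assumed to exist already: `llm` (any LangChain LLM), `PROMPT` (a PromptTemplate)
# and `feature_embedder` (an Embedder, e.g. rl_chain.PickBestFeatureEmbedder).
chain = rl_chain.PickBest.from_llm(
    llm=llm,
    prompt=PROMPT,
    feature_embedder=feature_embedder,
    model_save_dir="./",
    reset_model=False,
    vw_cmd=None,
    policy=rl_chain.VwPolicy,
    vw_logs=None,
    metrics_step=-1,
    metrics_window_size=-1,
)
```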

In this example:

feature_embedder: An instance of the Embedder class.
model_save_dir: The directory where the VW model is saved.
reset_model: Whether the model starts training from scratch.
vw_cmd: A list of command-line arguments for the VW model.
policy: The policy used by the chain.
vw_logs: The path for the VW logs.
metrics_step: The step size for the metrics tracker.
metrics_window_size: The window size for the metrics tracker.
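As a rough illustration of what a step and a window size mean for a metrics tracker, here is a hypothetical rolling-window tracker (`ToyRollingMetrics` is not part of the module, and its handling of non-positive steps is this sketch's own choice). It keeps the last `window_size` scores and records their mean every `step` decisions.

```python
from collections import deque


class ToyRollingMetrics:
    """Hypothetical tracker: rolling mean of recent scores, sampled every `step` decisions."""

    def __init__(self, step, window_size):
        self.step = step
        self.window = deque(maxlen=window_size)  # oldest scores fall out automatically
        self.decisions = 0
        self.history = []  # list of (decision count, rolling mean) snapshots

    def record(self, score):
        self.decisions += 1
        self.window.append(score)
        # Snapshot every `step` decisions; in this toy a non-positive step disables snapshots.
        if self.step > 0 and self.decisions % self.step == 0:
            self.history.append((self.decisions, sum(self.window) / len(self.window)))


tracker = ToyRollingMetrics(step=2, window_size=3)
for score in [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]:
    tracker.record(score)
print(tracker.history)
```

A small window reacts quickly to recent changes in response quality, while a plain running average over all decisions smooths them out.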

You can find more details in the source code of the RLChain class in the libs/experimental/langchain_experimental/rl_chain/base.py file.

Regarding the predict method: it is not defined on RLChain, nor on its Chain base class. It belongs to the policy. VwPolicy, the default policy for RLChain, defines predict, which takes an event, formats it as Vowpal Wabbit input, and returns the VW model's prediction.
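For a sense of what formatting an event for VW involves, here is a simplified sketch that writes one decision in VW's text format for contextual bandits with action-dependent features (the kind of input used with `--cb_explore_adf`): a `shared` line for the context, one line per candidate, and an optional `0:cost:probability` label on the chosen candidate. The helper `to_vw_cb_adf` is hypothetical, and the text rl_chain's feature embedder actually produces will differ in detail (namespace names, embeddings). The sketch assumes the common convention cost = -score, since VW minimizes cost.

```python
def to_vw_cb_adf(shared, actions, chosen=None, cost=None, prob=None):
    """Hypothetical helper: format one decision as VW cb_adf-style text.

    shared:  context features, e.g. {"User": "Tom"}
    actions: one dict of features per candidate, e.g. [{"Meal": "beef stew"}]
    chosen, cost, prob: optional label for the candidate that was actually used.
    """

    def features(d):
        # One "|Namespace value" group per key; spaces become "_" so each value is one token.
        return " ".join(f"|{ns} {str(v).replace(' ', '_')}" for ns, v in d.items())

    lines = ["shared " + features(shared)]
    for i, action in enumerate(actions):
        # Only the chosen candidate carries a 0:cost:probability label.
        label = f"0:{cost}:{prob} " if i == chosen else ""
        lines.append(label + features(action))
    return "\n".join(lines)


score = 1.0  # e.g. from a selection scorer or a delayed user score
text = to_vw_cb_adf(
    {"User": "Tom"},
    [{"Meal": "beef stew"}, {"Meal": "veggie curry"}],
    chosen=1,
    cost=-score,  # VW minimizes cost, so a high score becomes a low cost
    prob=0.5,
)
print(text)
```

Without a label (no `chosen`), the same text is what a policy would score at prediction time; with a label, it is what a learn step would consume.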

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

Sources

libs/experimental/langchain_experimental/rl_chain/base.py

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

169 commented 10 months ago

@austinmw You can see this notebook

austinmw commented 10 months ago

@169 I mentioned in my post that I've already read that notebook.
