Open viig99 opened 5 months ago
Hi @viig99, thank you so much for doing this. The problem looks very interesting. I come from a computer vision background, so I haven't worked much with LLMs. But I have a strong interest in RL and have written a few blog posts about it, which is why I'm interested in this. I'll first try to ramp up my knowledge of LLMs and everything you've mentioned above. If everything goes well, we can work on the implementation part. Thanks!
Update: I rewrote SteP in DSPy and am now tuning the agentic flows on WebArena examples.
GitHub URL: https://github.com/viig99/step_dspy Discord: https://discord.gg/yMXn29JAK7
Task Description
WebArena is a standalone, self-hostable web environment designed for building autonomous agents. It creates websites from four popular categories with functionality and data mimicking their real-world equivalents. To emulate human problem-solving, WebArena also embeds tools and knowledge resources as independent websites. WebArena introduces a benchmark for interpreting high-level realistic natural language commands into concrete web-based interactions. Annotated programs are provided to programmatically validate the functional correctness of each task.
Example agent behaviors can be seen in the video below.
Example benchmark tasks that agents need to handle:
Create a milestone for the upcoming task of adding a new branch for zsh comprehensive support starting on 5/1/2044 and ending in 20 days
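To make the date handling in the task above concrete, here is a minimal sketch of the arithmetic an agent has to get right before filling in the milestone form. It assumes the date is written month/day/year and that "ending in 20 days" means 20 calendar days after the start date. The helper name `milestone_dates` is hypothetical and is not part of WebArena.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def milestone_dates(start_text: str, duration_days: int) -> Tuple[date, date]:
    """Return (start, due) dates for a milestone.

    Assumption: start_text uses month/day/year, as in the task wording.
    """
    start = datetime.strptime(start_text, "%m/%d/%Y").date()
    due = start + timedelta(days=duration_days)
    return start, due


# Values taken from the example task: start 5/1/2044, ending in 20 days.
start, due = milestone_dates("5/1/2044", 20)
print("start:", start.isoformat())
print("due:  ", due.isoformat())
```

A task like this is only counted as solved if the resulting site state matches the annotation, so an off-by-one in the due date would fail validation even when every click is otherwise correct.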
Solution Proposition
Methods to try
We aim to reproduce the best-performing methods on the WebArena leaderboard, either:
Tree Search for Language Model Agents
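As a rough illustration of the tree-search idea, the sketch below runs a generic best-first search over states, always expanding the state a value function scores highest. It is not the paper's implementation: the toy integer environment, the function names (`best_first_search`, `propose_actions`, `value`), and the search budget are all assumptions made for illustration. In a real agent, `propose_actions` would sample candidate web actions from an LLM and `value` would be a model-based estimate of progress toward the task.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional, Tuple


def best_first_search(
    start: Hashable,
    propose_actions: Callable[[Hashable], Iterable[Tuple[str, Hashable]]],
    value: Callable[[Hashable], float],
    is_goal: Callable[[Hashable], bool],
    budget: int = 50,
) -> Optional[List[str]]:
    """Expand the highest-value state first; return the action path or None."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(-value(start), counter, start, [])]  # min-heap on negated value
    seen = {start}
    while frontier and budget > 0:
        _, _, state, path = heapq.heappop(frontier)
        budget -= 1  # each expansion costs one unit of search budget
        if is_goal(state):
            return path
        for action, next_state in propose_actions(state):
            if next_state in seen:
                continue
            seen.add(next_state)
            counter += 1
            heapq.heappush(
                frontier, (-value(next_state), counter, next_state, path + [action])
            )
    return None


# Toy stand-in for a web environment: reach the number 10 starting from 1.
TARGET = 10

def toy_actions(state: int) -> List[Tuple[str, int]]:
    return [("+1", state + 1), ("*2", state * 2)]

def toy_value(state: int) -> float:
    return -abs(TARGET - state)  # closer to the target scores higher

plan = best_first_search(1, toy_actions, toy_value, lambda s: s == TARGET)
print(plan)
```

The budget matters in practice because each expansion in a web environment means executing real actions and calling the model, so the search has to trade success rate against cost.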
Implementation ideas
We will use DSPy, a library that helps modularize prompts into flows and then optimize those flows.
Proposition: Next Steps
Mentorship Notes
I am an Applied ML Staff Engineer with over 13 years of experience in software engineering and machine learning. I love developing and optimizing complex systems and have a strong passion for mentoring and guiding others. Currently, I am working on the WebArena project and am eager to onboard like-minded learners who want to dive into agentic flows. I would love to work alongside, mentor, and guide folks toward building a workable demo.