nileshtrivedi / awesome-engineering

Curated ideas for world-class engineering projects with mentoring from experienced practitioners in the industry

Reimplementing Top-Performing Models for WebArena using DSPy #2

Open viig99 opened 5 months ago

viig99 commented 5 months ago

Task Description

WebArena is a standalone, self-hostable web environment designed for building autonomous agents. It creates websites from four popular categories with functionality and data mimicking their real-world equivalents. To emulate human problem-solving, WebArena also embeds tools and knowledge resources as independent websites. WebArena introduces a benchmark for interpreting high-level realistic natural language commands into concrete web-based interactions. Annotated programs are provided to programmatically validate the functional correctness of each task.
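To make "programmatically validate the functional correctness of each task" a bit more concrete, here is a minimal Python sketch. It is only an illustration: the `Task` dataclass, its fields, the `validate` function, and the example URLs are all hypothetical and are not WebArena's actual evaluation API. The only assumption is that a task pairs a natural-language intent with a checker that inspects the state the agent ends in.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical representation of the browser state after the agent finishes.
FinalState = Dict[str, str]


@dataclass
class Task:
    """Hypothetical benchmark task: an intent plus a functional checker."""
    intent: str                              # natural-language command given to the agent
    checker: Callable[[FinalState], bool]    # returns True if the goal was reached


def validate(task: Task, final_state: FinalState) -> bool:
    """Judge the outcome, not the exact click sequence the agent used."""
    return task.checker(final_state)


# Example task: success means ending on a URL whose path ends with /orders.
task = Task(
    intent="Open the order history page",
    checker=lambda state: state.get("url", "").endswith("/orders"),
)

passed = validate(task, {"url": "http://shop.example/orders"})
failed = validate(task, {"url": "http://shop.example/cart"})
```

Checking the end state rather than the action trace means two agents that take different paths to the same page are both scored as correct, which is the sense of "functional correctness" described above.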

Example agent behaviors can be seen in the linked video (Watch Video).

Example Benchmark tasks which agents need to handle:

Proposition: Next Steps

Mentorship Notes

I am an Applied ML Staff Engineer with over 13 years of experience in software engineering and machine learning. I love developing and optimizing complex systems and have a strong passion for mentoring and guiding others. Currently, I am working on the WebArena project and am eager to onboard like-minded learners who want to dive into agentic flows. I would love to work alongside, mentor, and guide folks to build a workable demo.

BalajiAI commented 5 months ago

Hi @viig99, thank you so much for doing this. The problem looks very interesting. I come from a computer vision background, so I haven't played much with LLMs. But I have a strong interest in RL and have written a few blog posts as well, which is why I'm interested in this. So I'll first try to ramp up my knowledge of LLMs and everything you've mentioned above. If everything goes well, we can work on the implementation part. Thanks!

viig99 commented 4 months ago

Update: rewrote SteP in DSPy and am now tuning the agentic flows on WebArena examples.

GitHub URL: https://github.com/viig99/step_dspy
Discord: https://discord.gg/yMXn29JAK7