microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

[Feature Request]: Web Surfer should choose what to add to context #1538

Open gee842 opened 5 months ago

gee842 commented 5 months ago

Is your feature request related to a problem? Please describe.

The problem is that viewports contain a lot of unwanted content. Many pages have low information density because of the additional HTML widgets (navigation menus, icon links, and similar) that they display. Here is an example:

=======================
        - [DORA Cybersecurity Strategy for application security, ASPM and digital resiliency](/whitepapers-resources/whitepaper-dora/)
        - [Phoenix Security Vulnerability Priority](/whitepapers-resources/data-driven-vulnerability-management-are-sla-slo-dead/)
        - [Regulation in Application & Cloud Security, ISO, NIST, NIS2](/vulnerability-management-in-application-cloud-security/)
    + [![](data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%2036%2037'%3E%3C/svg%3E)![](/media/Group-5542.png) FAQ](/faqs/)
    + [![](data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%2036%2037'%3E%3C/svg%3E)![](/media/Group-5543.png) Follow the thread](/followthethread)
    + [![](data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%2036%2037'%3E%3C/svg%3E)![](/media/Group-5544.png) Slack community ![](data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%2036%2037'%3E%3C/svg%3E)![](/media/Group-5545.png)](https://join.slack.com/t/appsecphx-community/shared_invite/zt-1iw7awp0k-Bdb1r85U8mitcxFOMVPphw)
* [About Us](#)
    + [Our Mission](/about-us/)
    + [Our Advisors & Investors](/about-us/)
    + [Partners](/about-us/)
    + [Industry Recognition](/about-us/)
    + [Our Leadership Team](/about-us/)
    + [Careers](/about-us/)
    + [Contact us](/contact/)
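Much of the noise in this example is image markup: inline SVG `data:` URIs and icon PNGs nested inside link labels. Below is a minimal sketch of a post-processing step that strips Markdown image syntax from viewport text while keeping link labels and indentation. `strip_images` and `IMAGE_RE` are hypothetical names, not part of autogen, and the regex assumes alt text without `]` and image targets without `)` (which holds for the percent-encoded `data:` URIs above).

```python
import re

# Markdown image syntax: ![alt](target). Assumes alt text contains no "]" and the
# target contains no ")" (true for the percent-encoded data: URIs in the example).
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_images(markdown: str) -> str:
    """Remove image markup from Markdown, keeping link text and indentation.

    Lines that consisted only of images are dropped entirely.
    """
    out = []
    for line in markdown.splitlines():
        stripped = IMAGE_RE.sub("", line)
        body = stripped.strip()
        if not body:
            if line.strip():
                continue  # the line held nothing but images
            out.append("")  # keep genuinely blank lines
            continue
        indent = stripped[: len(stripped) - len(stripped.lstrip())]
        body = re.sub(r"\s{2,}", " ", body)  # squeeze gaps left by removed images
        body = re.sub(r"\[\s+", "[", body)   # "[ FAQ]" -> "[FAQ]"
        body = re.sub(r"\s+\]", "]", body)   # "[Slack community ]" -> "[Slack community]"
        out.append(indent + body)
    return "\n".join(out)
```

Applied to the FAQ line above, this yields `+ [FAQ](/faqs/)` with the original indentation preserved. It only removes images; the navigation entries themselves remain, which is why some form of content selection is still needed.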

Describe the solution you'd like

I propose a mechanism that lets the web_surfer agent choose what to save to memory, possibly guided by a custom prompt, so that it can find the important parts of a page more efficiently. Alternatively, something like the "Reader Mode" found in browsers could filter out the clutter in the many cases where the main body of content can be identified.
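One way the prompt-guided option could look is sketched here, assuming the page text is split into blank-line-separated chunks and each chunk is passed to a judge that decides whether it is relevant to a prompt. `select_for_context`, `split_into_chunks`, `keyword_judge`, and the `Judge` type are hypothetical and not part of autogen. In a real agent the judge would wrap an LLM call driven by the custom prompt; the keyword-overlap judge below is only an offline stand-in so the sketch runs without a model.

```python
import re
from typing import Callable, List

# Hypothetical judge interface (not an autogen API): given the user's prompt and
# one chunk of page text, return True if the chunk should be kept in context.
# In a real agent this would wrap an LLM call driven by a custom prompt.
Judge = Callable[[str, str], bool]


def split_into_chunks(page_text: str) -> List[str]:
    """Split page text into blank-line-separated chunks, dropping empty ones."""
    return [c.strip() for c in re.split(r"\n\s*\n", page_text) if c.strip()]


def select_for_context(page_text: str, prompt: str, judge: Judge) -> str:
    """Keep only the chunks the judge considers relevant to the prompt."""
    kept = [chunk for chunk in split_into_chunks(page_text) if judge(prompt, chunk)]
    return "\n\n".join(kept)


def keyword_judge(prompt: str, chunk: str) -> bool:
    """Offline stand-in for an LLM judge: keep a chunk if it shares any word of
    four or more letters with the prompt (case-insensitive)."""
    prompt_words = {w.lower() for w in re.findall(r"[A-Za-z]{4,}", prompt)}
    return any(w.lower() in prompt_words for w in re.findall(r"[A-Za-z]{4,}", chunk))
```

For a page whose chunks are two navigation lists and one sentence about DORA, `select_for_context(page, "What does DORA require?", keyword_judge)` keeps only the DORA sentence. With an LLM judge, one call per chunk can get expensive on long pages, so chunk size is a trade-off between cost and precision.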

Additional context

No response

gee842 commented 5 months ago

I came across this library, which might be helpful: https://github.com/buriy/python-readability
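python-readability (published on PyPI as `readability-lxml`) is typically used as `Document(html).summary()`, which returns the HTML of the detected main content. To illustrate the underlying reader-mode idea without that dependency, here is a much simpler stdlib-only heuristic; it is not the library's algorithm. It collects the text of each block element, computes the fraction of that text that sits inside `<a>` tags (link density), and keeps only blocks that are long enough and not dominated by links. `reader_mode`, `BlockCollector`, the block-tag set, and the thresholds `max_link_density=0.5` and `min_chars=40` are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Block-level tags whose text is scored as one unit (an assumption; real
# reader-mode implementations use richer rules).
BLOCK_TAGS = {"p", "li", "div", "section", "article", "td", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collect (text, link_chars) pairs for each block of visible text."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        text = " ".join("".join(self._text).split())  # normalize whitespace
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()  # a new block starts; close the previous one
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script/style contents
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()


def reader_mode(html, max_link_density=0.5, min_chars=40):
    """Return the text of blocks that are long enough and not dominated by links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = [
        text
        for text, link_chars in parser.blocks
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

On a page like the one in this issue, each navigation `<li>` has a link density of 1.0 and is dropped, while long article paragraphs survive. Real readability implementations also weigh class and id names, punctuation, and parent containers, so this sketch will misjudge many pages; it only shows where such a filter could sit before the page is converted to Markdown.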