ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
Project collaboration #28

Open James4Ever0 opened 2 months ago

James4Ever0 commented 2 months ago

I have been working on the computer automation project linked below. The repo contains my thoughts, model architecture skeletons, and even some working prototypes.

The link:

If you don't mind, you could mention my project in the README. If you are interested in my research, you can check my notes.

You can also reach me by email.

James4Ever0 commented 2 months ago

Have you ever considered developing a terminal agent that sends all kinds of keystrokes and key combos in a non-blocking fashion, just like a human would? I have made some progress on this in Cybergod, which could help you with this goal.

A grayscale-augmented terminal gives high contrast to the red cursor, making it easier for the agent to locate.
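
To illustrate the idea, here is a rough sketch in plain Python. It is not the actual Cybergod code: the function names, the frame representation (rows of RGB tuples), and the red thresholds are all assumptions made up for this example. Everything on screen is converted to gray, the cursor cell is painted pure red, and the cursor is then found with a simple color check.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Frame = List[List[Pixel]]      # hypothetical frame format: rows of pixels


def to_grayscale(frame: Frame) -> Frame:
    """Replace every pixel with its luma value so that R == G == B."""
    out: Frame = []
    for row in frame:
        new_row = []
        for r, g, b in row:
            # Integer approximation of the ITU-R BT.601 luma weights.
            y = (299 * r + 587 * g + 114 * b) // 1000
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def draw_cursor(frame: Frame, row: int, col: int,
                color: Pixel = (255, 0, 0)) -> Frame:
    """Return a copy of the frame with the cursor cell painted red."""
    out = [list(r) for r in frame]
    out[row][col] = color
    return out


def find_cursor(frame: Frame) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first strongly red pixel, or None.

    The thresholds are arbitrary values chosen for this sketch.
    """
    for i, row in enumerate(frame):
        for j, (r, g, b) in enumerate(row):
            if r > 200 and g < 60 and b < 60:
                return (i, j)
    return None
```

In a colored terminal, red text could be mistaken for the cursor. After the grayscale step no other pixel can pass the red check, so the only match is the cursor itself.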
