niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
https://arxiv.org/abs/2402.07945

Project collaboration #28

Open James4Ever0 opened 2 months ago

James4Ever0 commented 2 months ago

I have been working on a computer automation project, linked below. The repo contains my thoughts, model architecture skeletons, and some working prototypes.

The link: https://github.com/james4ever0/agi_computer_control

If you don't mind, you could mention my project in the README. If you are interested in my research, you can check my notes.

You can also reach me by email.

James4Ever0 commented 2 months ago

Have you ever considered developing a terminal agent that sends all kinds of keystrokes and key combos, just like a human, in a non-blocking fashion? I have made some progress on this in Cybergod, which could help with that goal.
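To make the idea concrete, here is a minimal sketch of sending keystrokes without blocking, assuming the terminal runs inside tmux and the agent drives it through `tmux send-keys`. The function names and the `agent:0.0` target are hypothetical and are not taken from Cybergod.

```python
import subprocess


def build_send_keys_cmd(target, keys, literal=False):
    """Build the argv for `tmux send-keys` without running it.

    target:  a tmux pane target such as "agent:0.0" (hypothetical name).
    keys:    tmux key names ("C-c", "Enter", "Escape") or text chunks.
    literal: if True, pass -l so tmux types the strings as plain text
             instead of looking them up as key names.
    """
    cmd = ["tmux", "send-keys", "-t", target]
    if literal:
        cmd.append("-l")
    cmd.extend(keys)
    return cmd


def send_keys_nonblocking(target, keys, literal=False):
    """Start tmux in the background and return the process handle at once.

    The caller does not wait for tmux to finish, so the agent loop can
    keep observing the screen while the keys are delivered.
    """
    return subprocess.Popen(
        build_send_keys_cmd(target, keys, literal),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Example (requires a running tmux session, so it is left commented out):
# send_keys_nonblocking("agent:0.0", ["ls -la"], literal=True)
# send_keys_nonblocking("agent:0.0", ["Enter"])
# send_keys_nonblocking("agent:0.0", ["C-c"])
```

Splitting command construction from execution keeps the keystroke logic testable on machines without tmux.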

You can see the position of the cursor and the range of the selected text.

tmux_show_1
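As a rough illustration of how the cursor position could be read, the sketch below asks tmux for its `cursor_x` and `cursor_y` format variables through `tmux display-message -p`. This assumes a tmux-backed terminal; the helper names are hypothetical, and the selection range is left out of this sketch.

```python
import subprocess

# tmux format string: cursor column and row inside the pane, zero-based.
CURSOR_FORMAT = "#{cursor_x},#{cursor_y}"


def build_cursor_query_cmd(target):
    """Build the argv that prints the cursor position of a tmux pane."""
    return ["tmux", "display-message", "-p", "-t", target, CURSOR_FORMAT]


def parse_cursor(output):
    """Turn tmux output such as "12,3\n" into an (x, y) tuple of ints."""
    x, y = output.strip().split(",")
    return int(x), int(y)


def get_cursor(target):
    """Query tmux and return the cursor position (needs a live session)."""
    result = subprocess.run(
        build_cursor_query_cmd(target),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cursor(result.stdout)
```

Only `get_cursor` touches tmux; `parse_cursor` can be exercised on its own.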

You can also capture a screenshot of the terminal with cursor denoted in red.

vim_edit_tmux_screenshot

A grayscale-augmented terminal gives high contrast to the red cursor, making it easier for the agent to locate.

grayscale_dark_tmux
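The grayscale-plus-red-cursor idea can be sketched in plain Python on a small grid of RGB pixels. This is only an illustration under stated assumptions: the screenshot is already available as rows of `(r, g, b)` tuples, the cursor's pixel box is known, and the BT.601 luma weights are my choice rather than anything confirmed by the project.

```python
def to_gray(rgb):
    """Convert one RGB pixel to gray using BT.601 luma weights (assumed)."""
    r, g, b = rgb
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


def highlight_cursor(pixels, cursor_box, cursor_color=(255, 0, 0)):
    """Return a grayscale copy of `pixels` with the cursor box painted red.

    pixels:     list of rows, each row a list of (r, g, b) tuples.
    cursor_box: (left, top, right, bottom) in pixel coordinates,
                with right and bottom exclusive.
    """
    left, top, right, bottom = cursor_box
    out = []
    for row_idx, row in enumerate(pixels):
        new_row = []
        for col_idx, px in enumerate(row):
            inside = left <= col_idx < right and top <= row_idx < bottom
            new_row.append(cursor_color if inside else to_gray(px))
        out.append(new_row)
    return out


# A 2x2 toy "screenshot": the cursor occupies the top-right pixel.
frame = [
    [(0, 255, 0), (10, 20, 30)],
    [(255, 255, 255), (0, 0, 0)],
]
augmented = highlight_cursor(frame, cursor_box=(1, 0, 2, 1))
```

After this step every non-cursor pixel has equal R, G, and B values, so the red cursor is the only saturated color left in the frame.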