mnotgod96 / AppAgent

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
https://appagent-official.github.io/
MIT License
4.97k stars 538 forks

[macOS] suggestions for `get_screenshot` and `get_xml`, and issue with `traverse_tree` #2

Open cdxxotus opened 10 months ago

cdxxotus commented 10 months ago

Hi team, I am impressed by the features offered by your work. Congratulations.

After trying it on macOS 14.0 from a MacBook M2, I encountered two issues:

  1. I was unable to save and pull screenshots/XML with adb, so I changed the functions to this (it bypasses the error, but I am not sure whether it creates more issues than it solves):

    def get_screenshot(self, prefix, save_dir):
        cap_command = f"adb -s {self.device} shell screencap -p > {os.path.join(save_dir, prefix + '.png').replace(self.backslash, '/')}"
        result = execute_adb(cap_command)
        if result != "ERROR":
            return os.path.join(save_dir, prefix + ".png")
        return result

    def get_xml(self, prefix, save_dir):
        dump_command = f"adb -s {self.device} shell uiautomator dump > {os.path.join(save_dir, prefix + '.xml')}"
        result = execute_adb(dump_command)
        if result != "ERROR":
            return os.path.join(save_dir, prefix + ".xml")
        return result
  2. `traverse_tree` fails with the following error:
    Traceback (most recent call last):
      File "/Users/danielfebrero/Dev/AppAgent/scripts/step_recorder.py", line 94, in <module>
        traverse_tree(xml_path, clickable_list, "clickable", True)
      File "/Users/danielfebrero/Dev/AppAgent/scripts/and_controller.py", line 58, in traverse_tree
        for event, elem in ET.iterparse(xml_path, ['start', 'end']):
      File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1249, in iterator
        yield from pullparser.read_events()
      File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1320, in read_events
        raise event
      File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1292, in feed
        self._parser.feed(data)
    xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
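
A possible cause of this `ParseError`, offered as an assumption rather than a confirmed diagnosis: when `uiautomator dump` is called without a path, it writes the hierarchy to a file on the device and prints only a status line to stdout. Redirecting stdout to a local `.xml` file would then leave a file that does not start with XML. The sketch below shows one alternative, which is to stream both artifacts with `adb exec-out` and cut the XML out of the `uiautomator dump /dev/tty` output. `build_capture_commands` and `extract_hierarchy_xml` are hypothetical helper names, not part of AppAgent, and the device serial in the usage note is a placeholder.

```python
import os
import shlex


def build_capture_commands(device, save_dir, prefix):
    """Build shell command strings that stream a screenshot and a UI dump to the host.

    Hypothetical helper, not AppAgent code. `adb exec-out` forwards the
    device's stdout to the host without modification.
    """
    png_path = os.path.join(save_dir, prefix + ".png")
    # Quote the local path so spaces in directory names do not break the redirect.
    screencap_cmd = f"adb -s {device} exec-out screencap -p > {shlex.quote(png_path)}"
    # Dumping to /dev/tty makes uiautomator print the XML itself,
    # followed by a status line that still has to be removed.
    dump_cmd = f"adb -s {device} exec-out uiautomator dump /dev/tty"
    return screencap_cmd, dump_cmd


def extract_hierarchy_xml(raw_output):
    """Return only the <hierarchy>...</hierarchy> element from the dump output.

    Assumes the hierarchy has at least one child node, so that a closing
    </hierarchy> tag is present.
    """
    start = raw_output.find("<hierarchy")
    end = raw_output.rfind("</hierarchy>")
    if start == -1 or end == -1:
        raise ValueError("no <hierarchy> element found in dump output")
    return raw_output[start:end + len("</hierarchy>")]
```

Assuming the command runner returns the dump command's stdout as a string, that string could be passed through `extract_hierarchy_xml` and written to the local `.xml` path before `traverse_tree` is called.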

Any help is appreciated, I am excited by this work. Thanks a lot.

mnotgod96 commented 10 months ago

Hi Daniel,

According to the error traceback, it seems that the XML parser did not successfully read the file. Can you double-check that the XML file was dumped on your device and that the local copy is not empty? I have not experienced such an error during testing.
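
The check suggested above can be scripted. The following is a minimal sketch, assuming the dump has already been saved to a local path; `check_xml_dump` is a hypothetical helper name and is not part of AppAgent. It reports whether the file exists, whether it is empty, and, if parsing fails, what the file actually starts with.

```python
import os
import xml.etree.ElementTree as ET


def check_xml_dump(xml_path):
    """Return (ok, message) describing whether xml_path holds parseable XML."""
    if not os.path.isfile(xml_path):
        return False, "file does not exist"
    if os.path.getsize(xml_path) == 0:
        return False, "file is empty"
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError as exc:
        # Show the first bytes so non-XML content is easy to spot.
        with open(xml_path, "rb") as fh:
            head = fh.read(80)
        return False, f"not well-formed XML ({exc}); file starts with {head!r}"
    return True, f"parsed OK, root element <{root.tag}>"
```

Calling `check_xml_dump(xml_path)` just before `traverse_tree` would show whether the file contains a UI hierarchy or something else, such as a status message.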

Jiaxuan

mnotgod96 commented 10 months ago

Also, try enabling view attribute inspection in the developer options on your Android device to see if that helps.