tkkcc / gamebot

(WIP) android game automation with rust

design #1

Open tkkcc opened 5 months ago

tkkcc commented 5 months ago

GameBot is designed to work for years without human operation. GameBot should be self-upgradable, because mobile games iterate fast. GameBot relies on AccessibilityService and MediaProjection, uses hidden api via HiddenApiRefinePlugin, and gains permission via Shizuku and libsu.

plugin / hot update / hot fix / code push / dynamic loading / scripting

  1. plain DexClassLoader/PathClassLoader

    host apk --DexClassLoader/PathClassLoader--> plugin apk/jar/dex

    compile android library into dex and load via DexClassLoader. the drawback is that all dependencies must be bundled into host apk, library can't add new dependencies, and host apk almost can't use r8. library's resources can't be simply loaded, but kotlin i18n solutions can be used, like lyricist. fat jar solutions may help, like shadow

    compile android application into apk and load via PathClassLoader, this is what mediabox does. plugin can use new dependencies, but r8 is almost unusable.

  2. Shadow / Tinker

    hacky and complicated, maybe try BlackShadow

    above two solutions violate google play policy:

    An app distributed via Google Play may not modify, replace, or update itself using any method other than Google Play's update mechanism. Likewise, an app may not download executable code (such as dex, JAR, .so files) from a source other than Google Play. This restriction does not apply to code that runs in a virtual machine or an interpreter where either provides indirect access to Android APIs (such as JavaScript in a webview or browser). Apps or third-party code, like SDKs, with interpreted languages (JavaScript, Python, Lua, etc.) loaded at run time (for example, not packaged with the app) must not allow potential violations of Google Play policies.

  3. flutter

    shorebird

    Also Shorebird does not support changing native code (e.g. Java/Kotlin on Android or Objective-C/Swift on iOS). The tool will warn you during an attempted patch if you have changed native code.

  4. react native

    react-native-code-push

    Any product changes which touch native code (e.g. modifying your AppDelegate.m/MainActivity.java file, adding a new plugin) cannot be distributed via CodePush, and therefore, must be updated via the appropriate store(s).

    above two solutions are complicated as GameBot needs deep integration with android system api, so we need to export java/kotlin functions into dart and js. we also need to ship their runtime library. can use rich ui components. flutter doesn't support x86. react native doesn't support 120fps

  5. js embedded

    rhino (used in autojs and its forks), nodejs (used in autojs), boa, quickjs, deno

  6. python embedded

    jython, chaquopy, rustpython (compilable but runtime exception)

  7. lua embedded

    luajava, mlua

  8. wasm embedded

    wasmtime, wasmer

    above solutions are simpler and safer, but we need to export java/kotlin function into vm, if using compose instead of view, we need to implement some kind of serializable data structure. we need to transfer image byte into vm efficiently. first three solutions have zero compile time, but need to struggle with dynamic typing. for wasm, we can use rust, but have to drop support for 32 bit platforms.

    lib size for x86_64:

    | lib | size |
    | --- | --- |
    | jni-rs | 132K |
    | +android_logger | 892K |
    | +mlua(lua51,vendored) | 1.1M |
    | +mlua(lua54,vendored) | 1.2M |
    | +mlua(luajit,vendored) | 1.5M |
    | +mlua(luaujit) | 1.8+1.2M |
    | +mlua(luau) | 1.5+1.2M |
    | +wasmer | 4.5M |
    | +rustpython | 7.3M |
    | +boa | 4.7M |
    | +wasmtime | 5.7M |

  9. similar projects

    kotlin/java: FGA, granblue, RobotHelper
    js: hamibot, autojs6, autox, 自动精灵, EasyClick, 冰狐, aibote, 云控, aiwork
    lua: 懒人精灵, 节点精灵, 触动精灵, 触摸精灵, 积木编程, 鱼叉助手, 飞天助手, 叉叉助手, autolua
    python: ascript, aibote, 小派精灵
    VBScript/Q: 按键精灵
    rpa: 影刀, UiBot, ...

tkkcc commented 4 months ago

plugin update from internet

we use git protocol to do this for its atomic multi-file sync and free hosting. jgit is very small, total dex 1M, while git2 needs 6.1M * 2 3.3M * 4 with vendored ssl, and we have to disable ssl check. another lib to try is gitoxide

tkkcc commented 4 months ago

rootless support

should stick with shizuku and libsu. These two share common functionality (the shell user permission).

benefits:

  1. single interface for host and plugin development
  2. can use static screen resolution on runtime

drawbacks:

  1. no one likes to install two apps for one task, so some apps embed the activation function, like virtual machine apps.
tkkcc commented 4 months ago

wasm or lua

should stick with wasm, benefits:

  1. runtime and plugin all in rust
  2. modern language and promising ecosystem
  3. no one has done this before

drawbacks:

  1. need to compile when developing
  2. runtime and plugin size are bigger
  3. not open source to user, but not easy to hack
tkkcc commented 4 months ago

market / billing / online device limitation

  1. should support publishing to market and creating standalone app (custom package name and icon)
  2. like neovim and aur, use git repo as plugin distribution. use an index file as the market (simple toml file, pulled from our git); each plugin has repo, icon, author, description. plugin developers open a pull request to add a new plugin (but how to update description?). the server needs to get commit messages (with commit time) from repo, plus total and recent downloads like crates.io. but pulling updates from multiple git repos can be limited. should we ask plugin developers to push package info to our repo, like aur, crates and npm?
  3. monetization is based on cloud support: builtin payment support, billing based on device count and time.

different from ascript: totally free if the plugin doesn't use our server. free to make standalone app. 1% billing for using our service.

different from lanren: better local ocr and captcha bypass

  1. make and use plugins in multiple languages as we use wasm
  2. provide web debug tool like ascript and lanren. do we need to run wasm compiler in browser? or just embed lua, or make native debug tool
tkkcc commented 2 months ago

wasm vs native

  1. wasm can't use threads or sockets yet (wasi p1), and popular crates have no special support for wasi p2
  2. reqwest doesn't support wasi, neither do thread::spawn, tokio::net, mlua
  3. wasm can't be used on 32bit
  4. wasm needs to compile only once, native needs 2 ~ 4 times
  5. wasm's performance is lower, not free to use heavy computation library? x86_64 enable sse, aarch64 enable neon
  6. native with hot update is forbidden in google play
  7. wasm is more interesting

we choose the native way currently, for more freedom
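
The "2 ~ 4 times" compile cost of the native way can be made concrete with a small sketch. It assumes the count refers to the Android ABIs a plugin is built for; the function names `rust_target_for_abi` and `build_matrix` are hypothetical, while the ABI names and Rust target triples are the standard ones.

```rust
/// Map an Android ABI name (the directory name under `jniLibs/`) to the
/// Rust target triple a native plugin would be compiled for.
pub fn rust_target_for_abi(abi: &str) -> Option<&'static str> {
    match abi {
        "arm64-v8a" => Some("aarch64-linux-android"),
        "armeabi-v7a" => Some("armv7-linux-androideabi"),
        "x86_64" => Some("x86_64-linux-android"),
        "x86" => Some("i686-linux-android"),
        _ => None,
    }
}

/// One entry per required compile; unknown ABI names are skipped.
pub fn build_matrix(abis: &[&str]) -> Vec<&'static str> {
    abis.iter()
        .filter_map(|abi| rust_target_for_abi(abi))
        .collect()
}
```

With `build_matrix(&["arm64-v8a", "x86_64"])` the plugin is compiled twice (64-bit only); passing all four ABIs gives four compiles, while a wasm plugin would be compiled once for every device.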

tkkcc commented 2 months ago

GameBotDevTool

Attempt to implement dev tool for GameBot: support connecting to multiple devices, color/img picker in fetched screenshot, node inspection in fetched node tree, OCR and object detection test. So there must be a server on mobile, and the user uses a local front end to communicate. The front end can be gui or web browser. It must support win, linux and macos. Similar debug tools exist in https://ascript.cn/, https://www.nspirit.cn/, and http://lrappsoft.com/.
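
A minimal sketch of the color picker part, assuming the fetched screenshot arrives as raw RGBA8888 bytes with a row stride that may be larger than `width * 4`. The `Screenshot` type and `color_at` method are hypothetical names for illustration, not part of GameBot.

```rust
/// A screenshot as raw RGBA8888 bytes. `row_stride` is the number of bytes
/// per row and may exceed `width * 4` when rows are padded.
pub struct Screenshot<'a> {
    pub data: &'a [u8],
    pub width: usize,
    pub height: usize,
    pub row_stride: usize,
}

impl<'a> Screenshot<'a> {
    /// Return (r, g, b, a) at pixel (x, y), or None when the coordinate is
    /// outside the image or the buffer is too short.
    pub fn color_at(&self, x: usize, y: usize) -> Option<(u8, u8, u8, u8)> {
        if x >= self.width || y >= self.height {
            return None;
        }
        let offset = y * self.row_stride + x * 4;
        let px = self.data.get(offset..offset + 4)?;
        Some((px[0], px[1], px[2], px[3]))
    }
}
```

The front end would call `color_at` with the clicked coordinate and show the returned value; keeping the stride separate from the width avoids reading padding bytes as pixels.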

why not use tauri

  1. tauri v2 has many more features but is not ready, docs not complete
  2. distribution needs mac and win, needs online CI (github action). Or let user install tauri-cli and nodejs then build by themselves
  3. need to use typescript as rust front end frameworks are too young (lack modern designed components)
  4. user's webview runtime can affect appearance and capability

why not use flutter

  1. distribution need mac and win, need github action. Or let user install flutter toolkit

why not use kmp

  1. distribution need mac and win, ...
  2. big package size, more than 100M

why not use egui

  1. old-fashioned components like DearImGui and DearPyGui. can we use rerun components?
  2. no system font support(waiting cosmic-text), but we can use font-kit to solve

why not use slint

  1. font anti-aliasing unacceptable without qt backend, like on win
  2. widget style not complete

why not use iced

  1. no ime support(wait 0.14), so user can't search cjk text

why not use ribir

  1. font anti-aliasing broken
  2. default font fallback broken
  3. docs not complete

why not use relm4

  1. need gtk4 dev environment, so user can't simply cargo install. Or we bundle gtk4 libs, and distribution via CI
  2. heavily use declarative macro

web browser

  1. need to use typescript, and communicate by ourselves
  2. filesystem api ponyfill for firefox/safari
tkkcc commented 2 months ago

plugin as future or thread or process

future

plugin entry is a future, executed by tokio::spawn, so plugins run in the same or different threads, in the same process. we force plugin developers to use async rust, but it's not nice, as game bot doesn't need much concurrency. the ui part may need async runtime, or not.

same as the thread way, we need dynamic loading, and there is memory leak possibility.

thread

plugin entry is a blocking function, executed by thread::spawn or tokio::task::spawn_blocking, so plugin runs in a separate thread in the same process.

to make plugin cancelable (stop/restart by user or plugin itself), for sync api, we need to pass cancel_token and insert checks in critical api. for async api, plugin entry can use tokio::select! with cancel_token.

we must do dynamic loading on the rust side, using rust libloading or dlopen2, because android's System.load and System.loadLibrary can't reload a library without app process restart (so all plugins restart). on plugin reload/restart, there could be leaked resources, even leaked threads. the only clean way to update and reload a library is process restart; otherwise statics in the library will leak.
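
A minimal std-only sketch of the sync case, assuming the cancel token is a shared atomic flag. `CancelToken`, `Cancelled`, `take_screenshot`, `plugin_main` and `run_and_cancel` are hypothetical names; the sleep stands in for real host work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical cancel token shared between host and plugin thread.
#[derive(Clone, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Error returned by host api calls once the plugin has been cancelled.
#[derive(Debug, PartialEq)]
pub struct Cancelled;

/// Hypothetical blocking host api: the check is inserted before the work.
pub fn take_screenshot(token: &CancelToken) -> Result<(), Cancelled> {
    if token.is_cancelled() {
        return Err(Cancelled);
    }
    thread::sleep(Duration::from_millis(5)); // stand-in for real work
    Ok(())
}

/// Blocking plugin entry: loops until a host call reports cancellation,
/// then returns how many frames it processed.
pub fn plugin_main(token: CancelToken) -> u32 {
    let mut frames = 0;
    while take_screenshot(&token).is_ok() {
        frames += 1;
    }
    frames
}

/// Host side: run the plugin on its own thread and cancel it after `after`.
pub fn run_and_cancel(after: Duration) -> u32 {
    let token = CancelToken::default();
    let worker = {
        let token = token.clone();
        thread::spawn(move || plugin_main(token))
    };
    thread::sleep(after);
    token.cancel();
    worker.join().expect("plugin thread panicked")
}
```

The plugin only stops at the next checked api call, so a plugin stuck in its own long computation would not notice the cancel. That is the cost of cooperative cancellation compared to killing a process.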

need to compile with ndk, extra environment requirement for plugin development.

process

plugin runs in a separate process. we can run a command in kotlin or rust, or startActivity with a different process. but the last way consumes more memory (compose ui activity needs 100MB).

we don't need dynamic loading. we can easily start and stop process, and restart without memory leak possibility (assume we don't leak memory in our host).
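
A rough sketch of the start/stop part using only `std::process`, assuming the plugin is a standalone executable. `PluginProcess` is a hypothetical wrapper; how the host finds the plugin binary and talks to it is left out.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Hypothetical handle to a plugin running as a child process.
pub struct PluginProcess {
    child: Child,
}

impl PluginProcess {
    /// Spawn the plugin executable with the given arguments.
    pub fn start(program: &str, args: &[&str]) -> io::Result<Self> {
        let child = Command::new(program)
            .args(args)
            .stdin(Stdio::null())
            .spawn()?;
        Ok(Self { child })
    }

    /// True while the child has not exited yet.
    pub fn is_running(&mut self) -> bool {
        matches!(self.child.try_wait(), Ok(None))
    }

    /// Kill the child and reap it, so the OS reclaims all of its memory.
    pub fn stop(&mut self) -> io::Result<()> {
        self.child.kill()?;
        self.child.wait()?;
        Ok(())
    }
}
```

A restart is then `stop` followed by a fresh `start`. Nothing from the old plugin instance survives in the host address space, which is what rules out the static and thread leaks of the dynamic loading approach.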

plugin developers don't need to download ndk source, and cargo-ndk, only rust. need to compile with ndk.

we need more time for ipc: binder or socket or pipe or shared memory? tarpc with bincode over unix domain socket needs 50ms~100ms to transfer 2880x1620x4 bytes. ndk shared memory is only available since android 8.0
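
To put the ipc numbers in perspective, here is the arithmetic as a small sketch. It assumes an unpadded RGBA8888 frame and 1 MB = 10^6 bytes; `frame_bytes` and `throughput_mb_per_s` are hypothetical helper names.

```rust
/// Bytes needed for one RGBA8888 frame (4 bytes per pixel, no row padding).
pub fn frame_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// Effective throughput in MB/s (1 MB = 10^6 bytes) when `bytes` are
/// transferred in `millis` milliseconds.
pub fn throughput_mb_per_s(bytes: u64, millis: u64) -> f64 {
    (bytes as f64 / 1e6) / (millis as f64 / 1e3)
}
```

`frame_bytes(2880, 1620)` is 18,662,400 bytes, so 50ms~100ms per frame corresponds to roughly 187 to 373 MB/s over the socket. Each screenshot pays that cost before any matching can start, which is why shared memory is worth considering.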

host app may not need jni and rust, all in kotlin seems ok.

tkkcc commented 1 month ago

rethink screenshot

currently we use UiAutomation.takeScreenshot, it works with logical size and dpi change, performance is close to MediaProjection+ImageReader and better than looping screencap.

  1. Why not virtualDisplay+MediaCodec like scrcpy: need more time to encode and decode on ByteBuffer, lossless MediaFormat is not available on all devices. need preset bitrate.

  2. Why not virtualDisplay+MediaRecorder: it directly writes to output, need to decode frames

  3. Why not virtualDisplay+SurfaceTexture+glReadPixels: it's complicated. need to copy from gpu to cpu finally, as we don't know how to do template matching with GLES api.

  4. Why not virtualDisplay+ImageReader: createVirtualDisplay needs hidden api, to avoid foreground service notification and permission request. scrcpy uses two hidden methods and FakeContext to create virtualDisplay, while the UiAutomation way is simple and doesn't rely much on hidden api. there is a memory leak in surfaceflinger on some x86 emulators: avd <10, genymotion <10, but not on leidian 7&9. the leak is observable when using scrcpy on screens with fast changing content, like stopwatch.

  5. Why not UiAutomation: on genymotion 7, the cpu usage is higher than ImageReader way (15% vs 21%), as it's working more with software Bitmap, while ImageReader way only needs buffer copy (gpu2cpu + cpu2cpu). on genymotion 11~13 and my linux, the UiAutomation way has a fast memory leak (visible on host htop), but not on any other devices: avd 15 on linux, mumu 12, genymotion 13 on win, chinac 7. the leak happens on hardware Bitmap copy to software. the UiAutomation way is a bit slower than display projection.

UIAutomation vs display projection: latter is faster

```txt
oneplus ace2 14
02:41:11.453 D screenshot time 33ms 33ms
02:41:11.625 D screenshot time 41ms 40ms
02:41:11.814 D screenshot time 46ms 45ms
02:41:11.965 D screenshot time 31ms 30ms
02:41:12.133 D screenshot time 34ms 33ms
02:41:12.310 D screenshot time 31ms 30ms
02:41:12.466 D screenshot time 37ms 36ms
vs
02:42:57.466 E duration 6 direct: 1 size: 14192640
02:42:57.512 E duration 12 direct: 1 size: 14192640
02:42:57.519 E duration 6 direct: 1 size: 14192640
02:42:57.525 E duration 6 direct: 1 size: 14192640
02:42:57.531 E duration 5 direct: 1 size: 14192640
02:42:57.572 E duration 7 direct: 1 size: 14192640
02:42:57.617 E duration 12 direct: 1 size: 14192640
02:42:57.626 E duration 9 direct: 1 size: 14192640
02:42:57.631 E duration 5 direct: 1 size: 14192640
02:42:57.670 E duration 5 direct: 1 size: 14192640
02:42:57.711 E duration 7 direct: 1 size: 14192640

avd 7:
14:33:25.700 D screenshot time 14ms 13ms
14:33:25.776 D screenshot time 13ms 13ms
14:33:25.852 D screenshot time 13ms 13ms
14:33:25.932 D screenshot time 15ms 15ms
14:33:26.010 D screenshot time 15ms 15ms
14:33:26.092 D screenshot time 20ms 20ms
14:33:26.174 D screenshot time 19ms 19ms
vs
14:31:45.304 E duration 9 direct: 1 size: 10368000
14:31:45.345 E duration 8 direct: 1 size: 10368000
14:31:45.387 E duration 9 direct: 1 size: 10368000
14:31:45.396 E duration 9 direct: 1 size: 10368000
14:31:45.411 E duration 15 direct: 1 size: 10368000
14:31:45.455 E duration 10 direct: 1 size: 10368000
14:31:45.496 E duration 7 direct: 1 size: 10368000
14:31:45.504 E duration 8 direct: 1 size: 10368000
14:31:45.545 E duration 8 direct: 1 size: 10368000
14:31:45.584 E duration 6 direct: 1 size: 10368000
14:31:45.626 E duration 9 direct: 1 size: 10368000

genymotion 7:
02:47:28.587 D screenshot time 13ms 13ms
02:47:28.633 D screenshot time 15ms 15ms
02:47:28.678 D screenshot time 16ms 16ms
02:47:28.901 D screenshot time 191ms 191ms
02:47:28.947 D screenshot time 17ms 17ms
02:47:28.994 D screenshot time 17ms 17ms
02:47:29.043 D screenshot time 20ms 20ms
02:47:29.091 D screenshot time 18ms 18ms
02:47:29.143 D screenshot time 19ms 19ms
02:47:29.190 D screenshot time 18ms 17ms
02:47:29.234 D screenshot time 14ms 14ms
02:47:29.279 D screenshot time 16ms 15ms
02:47:29.325 D screenshot time 14ms 14ms
02:47:29.374 D screenshot time 20ms 19ms
vs
02:45:59.789 E duration 15 direct: 1 size: 3686400
02:45:59.801 E duration 12 direct: 1 size: 3686400
02:45:59.829 E duration 8 direct: 1 size: 3686400
02:45:59.843 E duration 8 direct: 1 size: 3686400
02:45:59.884 E duration 9 direct: 1 size: 3686400
02:45:59.913 E duration 10 direct: 1 size: 3686400
02:46:00.145 E duration 22 direct: 1 size: 3686400
02:46:00.160 E duration 15 direct: 1 size: 3686400
02:46:00.176 E duration 16 direct: 1 size: 3686400
02:46:00.187 E duration 11 direct: 1 size: 3686400
02:46:00.197 E duration 10 direct: 1 size: 3686400
02:46:00.205 E duration 8 direct: 1 size: 3686400
02:46:00.225 E duration 6 direct: 1 size: 3686400
02:46:00.241 E duration 6 direct: 1 size: 3686400
02:46:00.277 E duration 8 direct: 1 size: 3686400

leidian 9.0
14:36:59.657 D screenshot time 21ms 20ms
14:36:59.706 D screenshot time 19ms 19ms
14:36:59.753 D screenshot time 20ms 20ms
14:36:59.801 D screenshot time 20ms 19ms
14:36:59.846 D screenshot time 18ms 18ms
14:36:59.892 D screenshot time 18ms 18ms
14:36:59.939 D screenshot time 19ms 19ms
vs
14:37:39.690 E duration 17 direct: 1 size: 5760000
14:37:39.699 E duration 9 direct: 1 size: 5760000
14:37:39.723 E duration 18 direct: 1 size: 5760000
14:37:39.733 E duration 10 direct: 1 size: 5760000
14:37:39.740 E duration 7 direct: 1 size: 5760000
14:37:39.793 E duration 18 direct: 1 size: 5760000
14:37:39.802 E duration 9 direct: 1 size: 5760000
14:37:39.810 E duration 8 direct: 1 size: 5760000
14:37:39.819 E duration 9 direct: 1 size: 5760000
```
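
A quick way to summarize the oneplus ace2 samples from the log. This is a sketch under two assumptions: the "screenshot time" lines are the UiAutomation side and the "duration" lines are the display projection side, and the "duration" values are milliseconds. `mean_ms` and the constant names are hypothetical.

```rust
/// Arithmetic mean of latency samples in milliseconds; None for no samples.
pub fn mean_ms(samples: &[u32]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let sum: u32 = samples.iter().sum();
    Some(sum as f64 / samples.len() as f64)
}

/// First column of the "screenshot time" lines for oneplus ace2
/// (assumed to be the UiAutomation side).
pub const ONEPLUS_UIAUTOMATION_MS: [u32; 7] = [33, 41, 46, 31, 34, 31, 37];

/// "duration" values for oneplus ace2 (assumed to be the display
/// projection side, in milliseconds).
pub const ONEPLUS_PROJECTION_MS: [u32; 11] = [6, 12, 6, 6, 5, 7, 12, 9, 5, 5, 7];
```

On these samples the means come out at about 36.1 ms versus about 7.3 ms. The gap is much smaller on the emulators, where both ways mostly stay between 6 and 22 ms.
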
tkkcc commented 1 month ago

2 days for one bug

phenomenon:

trying the virtualDisplay+ImageReader way, but after fetching new screenshots for several minutes, no more arrive. what's even more magical is that if we allocate bitmap or bytebuffer during fetching, i can only get 1 or 2 screenshots.

attempts and thought:

  1. the more we allocate, the faster it stops. it seems that gc is stopped in current context. if we add gc requests, it can produce a few more screenshots.
  2. we are in a service context (shell or root) and binder connection callback. does this special context break gc?

final solution:

we are not holding a reference to the display, so it can be garbage collected and stops producing new screenshots. but it works for a while! the more we allocate, the sooner it gets collected.

we should recheck scrcpy's code. it holds the ref correctly. we are doing premature optimization!