ggml-qnn: refine ggml backend subsystem

zhouwg / kantv

workbench for learing&practising AI tech in real scenario on Android device, powered by GGML(Georgi Gerganov Machine Learning) and NCNN(Tencent NCNN) and FFmpeg

Apache License 2.0

96 stars 17 forks source link

This PR is equal to a PR in upstream GGML communtiy:

https://github.com/ggerganov/llama.cpp/pull/7641

Unfortunately, the PR in upstream was closed by the maintainer of ggml backend subsystem very quickly/immediately less then 1 minute after I summited this PR in upstream llama.cpp.

I totally disagree with what the maintainer of ggml backend subsystem said(because some special backends only need system memory):

"There are too many things wrong here to list. At the most basic level, this approach will not work because backends typically have a memory that is not accessible from other backends, and when switching to a different backend it is necessary to ensure that all the tensors required to evaluate the graph are available in the backend memory. This is the main job of ggml_backend_sched."

zhouwg / kantv

ggml-qnn: refine ggml backend subsystem #216