Closed: felladrin closed this issue 3 months ago.
Yes, thanks for the suggestion. I also thought about completely separating three interfaces and allowing the user to swap in their own object:

- `Wllama` ==> the main inference runtime
- `ModelManager` ==> manages the cache
- `ModelDownloader` ==> downloads models from the internet

This may introduce some breaking changes, so I may need to release it as a major (2.0) version.
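For illustration, here is a rough TypeScript sketch of how that split might look. The interface names follow the proposal above, but every method signature here is hypothetical and not the actual 2.0 API:

```ts
// Hypothetical sketch of the proposed split; method names are illustrative only.

// ModelDownloader ==> downloading models from the internet
interface ModelDownloader {
  download(url: string, opts?: { signal?: AbortSignal }): Promise<Blob>;
}

// ModelManager ==> managing the local cache of downloaded models
interface ModelManager {
  open(key: string): Promise<Blob | null>;
  write(key: string, data: Blob): Promise<void>;
  delete(key: string): Promise<void>;
  list(): Promise<string[]>;
}

// Wllama ==> the main inference runtime, composed from the two pieces above,
// either of which the user could swap with their own implementation.
interface WllamaDeps {
  modelManager?: ModelManager;
  modelDownloader?: ModelDownloader;
}
```

Swapping any of these with a user-provided object would then be a constructor-level concern rather than something patched onto an existing instance.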
I'd love to be able to set up my own Cache Manager for cases where I need to customize it.
Since the Cache Manager already has a settled signature, it would be good if we could pass our own implementation during Wllama initialization. So we could add a `cacheManager` option to `WllamaConfig`: https://github.com/ngxson/wllama/blob/667dd9192540ae15a806ef8b17d3fc1728018e4d/src/wllama.ts#L14
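For example, initialization could then look like the sketch below. This is only a sketch of the proposed API: it assumes `WllamaConfig` gains the suggested `cacheManager` field, and the asset paths and `MyCustomCacheManager` class are placeholders.

```ts
import { Wllama } from '@wllama/wllama';
// Hypothetical user-provided implementation of the existing CacheManager signature.
import { MyCustomCacheManager } from './my-cache-manager';

// Illustrative WASM asset paths; adjust to your own setup.
const WASM_PATHS = {
  'single-thread/wllama.wasm': '/wllama/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/esm/multi-thread/wllama.wasm',
};

// Proposed usage: pass the custom cache manager once, at construction time,
// so every internal consumer (downloader, remote blob, etc.) picks it up.
const wllama = new Wllama(WASM_PATHS, {
  cacheManager: new MyCustomCacheManager(),
});

await wllama.loadModelFromUrl('https://example.com/models/tiny-model.gguf');
```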
I see we can overwrite the `cacheManager` on a Wllama instance: https://github.com/ngxson/wllama/blob/667dd9192540ae15a806ef8b17d3fc1728018e4d/src/wllama.ts#L156-L157
But since the `cacheManager` is not passed down to `MultiDownloads` -> `GGUFRemoteBlob`, the overwritten `cacheManager` is never actually used.

Reasoning: iPads now access websites in Desktop Mode by default, so the user agent no longer reports them as mobile devices but as Mac machines instead. This has broken the following check, meaning models can no longer be loaded from the cache on iPads: https://github.com/ngxson/wllama/blob/667dd9192540ae15a806ef8b17d3fc1728018e4d/src/cache-manager.ts#L347-L355
If we could overwrite the `cacheManager`, a hotfix could be applied without having to wait for a new release of Wllama.
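As a concrete example of such a hotfix, a custom cache manager could store files through the browser's Cache Storage API and avoid user-agent sniffing altogether, so it behaves the same on iPadOS in Desktop Mode. This is only a sketch: the method names are an assumption about the settled Cache Manager signature, not the real interface.

```ts
// Hypothetical hotfix cache manager backed by the Cache Storage API.
// No user-agent checks, so iPads in Desktop Mode are treated like any other browser.
// Method names are assumptions for illustration, not wllama's actual CacheManager API.
export class CacheStorageCacheManager {
  private readonly cacheName = 'wllama-models';

  async open(key: string): Promise<Blob | null> {
    const cache = await caches.open(this.cacheName);
    const res = await cache.match(key);
    return res ? await res.blob() : null;
  }

  async write(key: string, data: Blob): Promise<void> {
    const cache = await caches.open(this.cacheName);
    await cache.put(key, new Response(data));
  }

  async delete(key: string): Promise<void> {
    const cache = await caches.open(this.cacheName);
    await cache.delete(key);
  }

  async clear(): Promise<void> {
    await caches.delete(this.cacheName);
  }
}
```

For this to help, though, the instance assigned as `cacheManager` has to actually reach `MultiDownloads` and `GGUFRemoteBlob`, which is the gap described above.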