Describe the bug
I had previously noticed that the cache had grown abnormally large, even though only a few files had been added to the directories I was searching. That was with the previous cache database, a format I was not familiar with.
Now that the cache is an SQLite database, I understand what probably happened. Adding, for instance, a custom adapter for djvu files changes the list of active adapters that is recorded in the database for a PDF file, even though the djvu adapter is not applicable to that file.
Shouldn’t this list only contain adapters that are applicable to the given file, to avoid recaching the file each time an irrelevant adapter is added, modified or disabled?
To Reproduce
Run rga --rga-cache-path=/tmp/throwaway_cache abc in a directory containing a single PDF file named xyz.pdf, then run rga --rga-cache-path=/tmp/throwaway_cache --rga-adapters=-ffmpeg abc, which disables the ffmpeg adapter (an adapter that doesn’t apply to PDF files).
Output
The preproc_cache table in /tmp/throwaway_cache/cache.sqlite3 contains two entries for xyz.pdf, one where the field active_adapters contains ffmpeg.v1 and one where it doesn’t, so the cache is twice as large as it needs to be.
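One way to check for the duplicate entries is to group the rows of preproc_cache by their active_adapters value. The sketch below uses only the table and field named above; the rest of the schema is not assumed, and the helper name adapter_lists is made up for illustration.

```python
import sqlite3


def adapter_lists(cache_db):
    """Return (active_adapters, row_count) pairs from the preproc_cache table."""
    con = sqlite3.connect(cache_db)
    try:
        # Only the table name and the active_adapters field are taken from
        # the report; no other columns are assumed to exist.
        return con.execute(
            "SELECT active_adapters, COUNT(*) FROM preproc_cache "
            "GROUP BY active_adapters ORDER BY active_adapters"
        ).fetchall()
    finally:
        con.close()


# Example call against the throwaway cache from the steps above:
# for adapters, count in adapter_lists("/tmp/throwaway_cache/cache.sqlite3"):
#     print(count, adapters)
```

After the two runs described above, this would be expected to return two distinct adapter lists, one with ffmpeg.v1 and one without.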
This should only happen for archives, where the list of adapters (even ones other than the adapter for the main file) can affect the result of preprocessing. If it happens for all files, it's a bug.
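The rule described here could look roughly like the sketch below. It is not rga's actual implementation: the Adapter class, its fields, the function adapters_for_cache_key and the adapter names pdf_adapter and archive_adapter are all hypothetical, and matching by file extension is a simplifying assumption. Only the name.vN form of the entries (as in ffmpeg.v1) comes from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    # Hypothetical model of an adapter, for illustration only.
    name: str
    version: int
    extensions: frozenset   # file extensions this adapter handles
    recurses: bool = False  # True for archive-like adapters that unpack inner files

    def key(self):
        # Same shape as the "ffmpeg.v1" entries seen in active_adapters.
        return f"{self.name}.v{self.version}"


def adapters_for_cache_key(filename, active):
    """Pick the adapters that should take part in a file's cache key."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    applicable = [a for a in active if ext in a.extensions]
    if any(a.recurses for a in applicable):
        # Archive: inner files may be handled by any active adapter,
        # so every active adapter can influence the preprocessed output.
        chosen = active
    else:
        # Plain file: only the adapters that actually apply matter.
        chosen = applicable
    return sorted(a.key() for a in chosen)
```

With a key built this way, disabling ffmpeg would leave the key for xyz.pdf unchanged, while the key for an archive would still change.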
Output of rga --version
ripgrep-all 1.0.0-alpha.5 (commit 16b2059).