Strontium closed this issue 2 months ago
Stump: Reads full file on SCAN and partial file on OPEN
The main reason Stump reads the full file on scan is to determine the actual page count. That operation involves iterating through each file in an archive, for example, to determine whether it is a valid page (i.e., an image file). The validity check uses the actual byte content and falls back to the extension, in an attempt to more accurately determine what is truly a valid page. The only feasible way to allow for partial reads would be:
This could be a configuration. I'll have to think on implementation details, though.
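The byte-content check with an extension fallback described above can be sketched roughly as follows. This is an illustrative sketch only: the function name, magic-byte table, and extension list are assumptions, not Stump's actual code.

```python
# Hypothetical sketch of a page-validity check: prefer content-based
# detection via magic bytes, fall back to the file extension.

MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def is_valid_page(name: str, header: bytes) -> bool:
    """Return True if the archive entry looks like an image page."""
    # Content-based check: compare the first bytes against known signatures.
    for magic in MAGIC_BYTES:
        if header.startswith(magic):
            return True
    # Fallback: trust the extension when the bytes are inconclusive.
    return any(name.lower().endswith(ext) for ext in IMAGE_EXTENSIONS)
```

Note the content check still requires reading at least the first few bytes of every entry, which is exactly what makes a fully remote scan expensive.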
Is it often that you would have an invalid file with an incorrect extension? What is the downside to having a potentially inaccurate page count? I think dropping that level of accuracy is a reasonable trade-off to improve both initial scan performance and, in my case, reduce remote traffic. If it is easier to implement, I'd still be happy if it were an optional feature and/or had some prerequisites (i.e., having the required metadata in the file). Thanks for considering it.
I can't speak to how often you would have a file inside an archive with the wrong/invalid extension. I'd hope it isn't often, and FWIW I haven't encountered the situation personally 😅
> What is the downside to having a potentially inaccurate page count?
Not the end of the world, just things to consider as part of the trade-off. I'll try to see what the general consensus is for this change in behavior before committing to it
https://github.com/stumpapp/stump/pull/353 will remove that magic-header method for determining content type during scans. Once that lands, the only reads during a scan (for ZIP/RAR files) should be when a ComicInfo.xml file is present.
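For a ZIP-based format like CBZ, reading only the metadata entry might look like the sketch below. The function name is hypothetical, not Stump's actual API; it shows that the standard ZIP machinery can decompress a single named entry without touching the page images.

```python
import io
import zipfile

def read_comic_info(path_or_file):
    """Return the ComicInfo.xml text from a CBZ/ZIP, or None if absent."""
    with zipfile.ZipFile(path_or_file) as zf:
        if "ComicInfo.xml" in zf.namelist():
            # Decompresses only this one entry, not the page images.
            return zf.read("ComicInfo.xml").decode("utf-8")
    return None

# Usage: build a small in-memory CBZ and read its metadata entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ComicInfo.xml",
                "<ComicInfo><PageCount>12</PageCount></ComicInfo>")
    zf.writestr("page001.jpg", b"\x00" * 1024)
info = read_comic_info(buf)
```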
Analysis jobs (still experimental) will still fully open files, but this is separate from scanning.
Completed and released as part of v0.0.4
Is your feature request related to a problem? Please describe. When using a file system that is remote to the Stump host, such as keeping files on a cloud storage provider, there may be both bandwidth and egress limits on data access. Currently, during the initial scan of a book, Stump appears to read the entire file. With cloud storage via an rclone mount, this scan causes the file to be downloaded in its entirety, which is inefficient and likely slow for large files. When opening a book for reading, Stump appears to request only the pages it needs, so rclone downloads only those parts of the file, reducing traffic and loading times.
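To illustrate why a reduced scan could be cheap: a ZIP archive's entry list lives in the central directory at the end of the file, so counting entries needs only a small tail read, not the page data. The sketch below (names and the counting wrapper are illustrative, standing in for a remote mount where every read costs bandwidth) builds an archive of roughly 1 MB of "pages" and shows that listing them reads only a few hundred bytes.

```python
import io
import zipfile

class CountingFile(io.BytesIO):
    """BytesIO wrapper that tracks how many bytes were actually read,
    standing in for a remote mount where reads cost bandwidth."""
    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk

# Build a ZIP with ten large "page" entries (~1 MB total).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(10):
        zf.writestr(f"page{i:03}.jpg", b"\xff\xd8\xff" + b"\x00" * 100_000)

archive = CountingFile(buf.getvalue())
with zipfile.ZipFile(archive) as zf:
    # Listing entries parses only the central directory at the end.
    page_count = len(zf.namelist())

# page_count == 10, while bytes_read stays far below the archive size.
```

This is why a metadata/listing-based page count would translate to only a partial download through rclone, at the cost of trusting extensions rather than byte content.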
Describe the solution you'd like Allow an option for reducing file access during the initial scan. Scan should be limited to:
Describe alternatives you've considered Stump being aware of remote storage and automatically limiting its file access appropriately.
Additional context In comparison to other applications in my testing with an rclone mount: