Closed kauffj closed 2 years ago
https://github.com/lbryio/lbry/issues/1171 is also related.
@sayplastic is working on this for his first PR.
@kauffj, the way I understand this task, the simplest working solution for now would be a check in the get API request handler that returns an API error if the requested file download would exceed the space allowed.
I'm planning on working off the lbryum-refactor branch, as per @eukreign's suggestion.
Please let me know if this works for you or if anything can be improved.
@eukreign should set specification here.
However, the intention of this setting is not to error if there is not enough space, but instead to keep space usage below this figure and only error if it cannot figure out how to do so.
I suspect the code for handling this will end up in the blob and/or file manager.
@kauffj I discussed this with @jackrobison at the standup today; there isn't an obvious way to determine which files/blobs can be deleted (too many edge cases to consider). It sounds like the ultimate solution is to generate blobs on the fly; this alone would cut storage in half across the board for everyone (but I'm not sure it's fair to assign something like this as a first trial task for @sayplastic, although it would definitely earn some very extra bonus points if he could do it).
I think a more straightforward solution would be to at least delete blobs when the corresponding file is no longer in the Downloads directory (because it was deleted, moved or renamed). This should still be behind a configuration flag, because some people just want to seed and don't necessarily care to have a copy of the combined file in their Downloads directory.
@eukreign then I would suggest not pursuing this and instead focusing on fixing whatever underlying architecture issues prevent this from being a reality. Please open those and mark this as blocked.
I don't really see much of the point in pursuing alternatives that only offer a small portion of the benefit compared to this, especially when this feature will clearly be required eventually anyway.
See https://github.com/lbryio/lbry/issues/1311 for example of users struggling with this.
Also came up at UNH hackathon, someone based their entire presentation on improving this.
While it downloads, you can keep track of the total disk space taken up, and when downloading something new, write the data to a file buffer in chunks of a certain size (the size can vary based on a user preference, written file size, etc.). Before writing each chunk to the file, simply check whether the chunk plus the space already used would exceed a set preference. If so, handle the error by cleaning the file up and letting the app know that the content exceeds the user's disk space setting.
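A minimal sketch of the chunked check described above, not SDK code. The names write_with_quota and QuotaExceeded, and the used_bytes/max_bytes parameters, are hypothetical and chosen only for illustration.

```python
import os


class QuotaExceeded(Exception):
    """Raised when writing the next chunk would pass the space limit."""


def write_with_quota(source, dest_path, used_bytes, max_bytes, chunk_size=64 * 1024):
    """Copy a readable binary stream to dest_path in chunks.

    used_bytes is the space already taken by earlier downloads (assumed to be
    tracked by the caller). Returns the new total of used bytes. If the limit
    would be exceeded, the partial file is removed and QuotaExceeded is raised.
    """
    written = 0
    try:
        with open(dest_path, "wb") as dest:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                # Check before writing, as the comment suggests.
                if used_bytes + written + len(chunk) > max_bytes:
                    raise QuotaExceeded(dest_path)
                dest.write(chunk)
                written += len(chunk)
    except QuotaExceeded:
        # The file is already closed here, so it is safe to clean it up.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return used_bytes + written
```

The caller would catch QuotaExceeded and report it to the app however it sees fit.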
There should also be two settings implemented with this in order to properly handle running out of disk space:
- Content files are temporary and get deleted as soon as the user stops viewing them. This specific functionality should be replaced once I am able to implement a method of streaming content rather than downloading it.
- Files are cached, and old files are removed when the maximum amount of disk space is exceeded. This cache can be implemented using a priority queue in which the file with the oldest date has the highest priority for removal. A hash table is used for checking whether a piece of content the user wants to view is already downloaded.
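The cache in the second setting could look roughly like the sketch below: a min-heap ordered by download date plus a dict for membership checks. The class DownloadCache and its fields are hypothetical, and the sketch only does the bookkeeping; actually deleting the evicted files is left to the caller.

```python
import heapq


class DownloadCache:
    """Tracks downloaded content and evicts the oldest entries first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._sizes = {}  # hash table: content id -> size in bytes
        self._heap = []   # min-heap of (downloaded_at, content id)

    def __contains__(self, content_id):
        # O(1) check for "is this content already downloaded?"
        return content_id in self._sizes

    def add(self, content_id, size, downloaded_at):
        """Record a download and return the ids evicted to make room."""
        if size > self.max_bytes:
            raise ValueError("content is larger than the whole cache")
        if content_id in self._sizes:
            return []
        evicted = []
        while self.used_bytes + size > self.max_bytes:
            # The entry with the oldest date has the highest removal priority.
            _, oldest = heapq.heappop(self._heap)
            self.used_bytes -= self._sizes.pop(oldest)
            evicted.append(oldest)
        self._sizes[content_id] = size
        heapq.heappush(self._heap, (downloaded_at, content_id))
        self.used_bytes += size
        return evicted
```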
@osilkin98 If you want to start on this general problem, I think the most useful feature to start with is a new daemon command which returns the total space used by files and blobs. Please see the last comment in this issue: https://github.com/lbryio/lbry/issues/1171#issuecomment-437423320
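As a rough idea of what such a command would compute, here is a sketch that sums file sizes under two directories. The function names and the returned dict shape are assumptions for illustration, not the daemon's actual API.

```python
import os


def directory_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def space_used(blob_dir, download_dir):
    """Report space used by blobs, by downloaded files, and in total."""
    blobs = directory_size(blob_dir)
    files = directory_size(download_dir)
    return {"blobs": blobs, "files": files, "total": blobs + files}
```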
@eukreign The daemon doesn't have to be and really shouldn't be responsible for that. I feel it's the user's responsibility to take care of unwanted files. Instead what could be done is the daemon could be given the space available as a command when downloading a new piece of content, which it ensures not to exceed. A callback function could be provided to the daemon to be used for when the amount of space exceeds a certain amount, which would just be defined by the user as a means to track files.
The callback function, really, could just be private to the app itself, but if one were at the command-line, they could provide their own.
@osilkin98 The daemon and the desktop app are running in different processes and communicating via HTTP, how would you do a callback function in that context?
Also, why is it the responsibility of the desktop app to figure out available disk space instead of the daemon? I believe the desktop app should be primarily concerned with UI and not with calculating available disk space.
@eukreign If the desktop app is just a user interface for the daemon, then the file tracking can be done by the daemon. We can avoid the problem of needing to know if the user is watching the content or not by keeping track of all the files downloaded by the daemon, and simply deleting the oldest content files when more space is needed. If the user is using an application and they're trying to download a piece of content, the assumption that they aren't currently using the oldest file downloaded is very reasonable.
@osilkin98 it gets more complicated because:
@eukreign You bring up valid points but here's the thing. What I'm talking about only applies if the user specifies that they want files to be treated as temporary.
related #586
I think we should strongly consider this a requirement for the public mobile app.
@eukreign we need to solve this before we release mobile app 1.0 and we need to release 1.0 yesterday
I don't think we can get this done by yesterday but I can work on this after we merge the asyncio branch.
The first version of this can be very basic. It is not a concern if the algorithm that chooses which blobs/files to keep is dumb.
this only has one issue in it. i'm closing it
Reopening - will attach related issues.
As commented in https://github.com/lbryio/lbry-desktop/issues/4634
This issue is partially taken care of by my library, lbrytools.
It basically inspects the top level directory of the subdirectories that hold the media files and blobfiles. If it crosses a limit in gigabytes, it will start cleaning up older files. It can delete media files (mp4, mkv, etc.), blobs, or both.
lbrytools.cleanup_space(main_dir="/home/user", size=1000, percent=90, what="media")
- users likely want to keep content they themselves published
Use a list of channels to never delete content from.
never_delete = [
"@lbry",
"@Odysee",
"@samtime",
"@RobBraxmanTech"
]
lbrytools.cleanup_space(main_dir="/home/user", never_delete=never_delete)
Probably another list for claims can be used; that is, these videos won't be deleted, regardless of author. This is currently not implemented in my tools.
- how do you determine what "old" means in terms of deleting old files?
Chronological order, by release_time, or by timestamp if the first is unavailable, as happens in older streams.
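The ordering rule can be sketched as a sort key with a fallback. This is not the lbrytools code; it assumes each claim is a flat dict with an optional release_time key and a timestamp key, which is a simplification made for the example.

```python
def claim_age_key(claim):
    """Sort key: release_time when present, otherwise timestamp."""
    release_time = claim.get("release_time")
    if release_time is not None:
        return int(release_time)
    # Older streams may lack release_time, so fall back to timestamp.
    return int(claim["timestamp"])


def oldest_first(claims):
    """Return the claims sorted from oldest to newest."""
    return sorted(claims, key=claim_age_key)
```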
- once you have definition for "old", do you delete all "old" content or just some of it to make sure there is space, and if you only delete some percentage of it, do you delete blobs first or files first until reaching that percent of available limit?
Since the media files can be recreated from the blobs, we should delete the media files first; if it fails to clear enough space, then the blobs should be deleted. To clear the most space, both should be deleted.
lbrytools.cleanup_space(..., what="media")
lbrytools.cleanup_space(..., what="blobs")
lbrytools.cleanup_space(..., what="both")
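The "media first, then blobs" order could be planned as in the sketch below. It only builds a list of deletions and touches no files; plan_cleanup and the item keys name, media_size and blob_size are hypothetical and not part of lbrytools.

```python
def plan_cleanup(items, bytes_to_free):
    """Return (name, kind) deletions, media files first and blobs second.

    items must already be sorted oldest first. Each item is a dict with
    'name', 'media_size' and 'blob_size' (sizes in bytes).
    """
    plan = []
    freed = 0
    # First pass removes media files, which can be recreated from blobs.
    # The second pass removes blobs only if the first did not free enough.
    for kind, size_key in (("media", "media_size"), ("blobs", "blob_size")):
        for item in items:
            if freed >= bytes_to_free:
                return plan
            if item[size_key] > 0:
                plan.append((item["name"], kind))
                freed += item[size_key]
    return plan
```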
- what should be configurable by the user in terms of space management strategies and what default space management strategies are best and most user friendly?
As much configuration as possible. At the moment I consider location of parent directory or partition, size in gigabytes, and percentage of use (90%). The cleanup will be done if the content goes above the percentage, and it should never cross the disk size, as we assume this is a physical limitation.
lbrytools.measure_usage(main_dir="/opt", size=1000, percent=90)
lbrytools.cleanup_space(main_dir="/opt", size=1000, percent=90, what="both")
By using different values of size and percentage we can test how this function works in many situations.
It seems to work okay, but more tests are probably needed for the case where many claims are downloaded and the disk suddenly becomes full.
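The size and percent threshold described above amounts to simple arithmetic, sketched here. This is an illustration and not how lbrytools computes it; in particular, treating size as GiB (1024**3 bytes) is an assumption, and usage_report is a hypothetical name.

```python
GIB = 1024 ** 3  # assumption: "size in gigabytes" is treated as GiB here


def usage_report(used_bytes, size_gb, percent):
    """Compare current usage against percent of the allowed size."""
    limit = size_gb * GIB * percent // 100
    return {
        "limit_bytes": limit,
        "above_limit": used_bytes > limit,
        # How much a cleanup would need to free to get back under the limit.
        "bytes_to_free": max(0, used_bytes - limit),
    }
```

With size=1000 and percent=90, cleanup would start once usage passes 900 GiB.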
Something like this would be a Component. Take a look at https://github.com/lbryio/lbry-sdk/blob/master/lbry/extras/daemon/components.py for some components we already have. These are started when the daemon starts up (see daemon.py).
The conf setting itself would go into https://github.com/lbryio/lbry-sdk/blob/master/lbry/conf.py. Then the app could expose that somewhere on the Settings page.
Some of your other code might be useful as scripts (in the scripts/ dir) but that's outside the scope of this particular issue.
I think this can be closed. The main part of monitoring and cleaning is done for blobs, which is safer since it lives inside the SDK data folder.
IMO, when you click download for real (it isn't the default anymore) and get a real file in the Downloads folder, it is now your file and you need to manage it using your OS features. It would be weird to delete files from the Downloads folder automatically. However, just showing the usage should be good, which makes me think #1171 should be updated to report total file sizes, as we currently do for total blob space.
That said, if we really want the same for downloaded files I think we should update desktop#4634 so it becomes a feature request for the full file case. I think the same applies to feature requests that are extras or need discussion, such as new eviction policies, pinning files, what to do when it is full during a download, etc.
what does "download for real" mean?
i mostly agree with you. more generally, we keep running into this problem of having two copies of every piece of data: the blobs, and the file. whats the general solution to that? should we stop storing blobs at all and just store the file (this is what torrent apps do). should we stop storing the files and only store the blobs, and you have to actively request to save a decrypted file to some external location that the SDK does not manage? something else?
doing the former means the SDK has a narrower scope. it just downloads and seeds content. doing the latter means the SDK is also a file viewer/player, or at least the app must be.
"download for real" means calling file_save
or setting save_files to true, which creates a normal file in the Downloads folder.
should we stop storing the files and only store the blobs, and you have to actively request to save a decrypted file to some external location that the SDK does not manage?
From my understanding, this is the current behavior, as save_files defaults to false. We are also able to stream from the blobs and the app plays from that.
As a user, particularly a mobile user, I want to be able to allocate a maximum amount of disk space used by the daemon.
The daemon should automatically manage my files and blobs to not exceed using this much space.
#1311 is related.
[ ] abstract blob and stream types
[ ] hooks for writing files that check available space and delete content as needed, setting for max size