lukechampine / user

A CLI renter for Sia
MIT License

Feature Request: Ability to specify redundancy #12

Open grigzy28 opened 5 years ago

grigzy28 commented 5 years ago

Beyond specifying chunk size with the -m option during upload, it would be useful to specify a redundancy requirement as well.

For example:

user upload -m 10 -r 3.0 test.zip

This would tell user to use 30 “active” and responding hosts out of the enabled contracts to upload this file to. Using only “active” and responding hosts would hopefully help prevent upload failures, because user currently uses all enabled hosts and fails the upload if a single host fails with any error. However, it should also fail the upload if there aren’t enough responding hosts among the enabled contracts to fulfill the minimum redundancy requirement.
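A rough sketch of the arithmetic behind this request, written in Python rather than taken from user's own code. The -r flag is only the proposal above, is_responding is a hypothetical stand-in for whatever check would decide that a host is "active", and taking the first responders in list order is an arbitrary choice made purely for illustration.

```python
import math

def required_hosts(min_shards, redundancy):
    """Hosts needed for a chunk size (-m) and a proposed redundancy (-r)."""
    # round() guards against float noise such as 10 * 1.1 == 11.000000000000002
    return math.ceil(round(min_shards * redundancy, 9))

def pick_responding(enabled_hosts, min_shards, redundancy, is_responding):
    """Return the first responding hosts, or raise if too few respond.

    is_responding is a hypothetical callback; a real check would have to
    contact the host.
    """
    need = required_hosts(min_shards, redundancy)
    responding = [h for h in enabled_hosts if is_responding(h)]
    if len(responding) < need:
        raise RuntimeError(
            f"only {len(responding)} of {need} required hosts are responding")
    return responding[:need]
```

With -m 10 and -r 3.0, required_hosts gives 30, matching the example above.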

lukechampine commented 5 years ago

I'm not strictly opposed to it, but it raises a difficult question, which is: how do you decide which contracts to use?

Currently, it's the user's responsibility to decide which contracts to enable or disable, because that's a nuanced decision to make. I wouldn't want the program to pick contracts at random, or pick the contracts with the lowest price, etc. without the user specifying that behavior explicitly.

One alternative would be a command like user bench which would benchmark each contract and print their stats. Then you could easily symlink the 30 cheapest contracts to a "contracts-enabled-30-cheapest" folder, and specify that folder when you run your upload command.
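To illustrate the symlink step, here is a minimal Python sketch. It assumes the prices are already known and passed in as a dictionary; user bench does not exist yet, so where the numbers come from is left open, and the folder and file names are hypothetical.

```python
import os

def link_cheapest(prices, src_dir, dst_dir, count):
    """Symlink the `count` cheapest contracts from src_dir into dst_dir.

    prices maps a contract file name to a price. The source of those prices
    (for example a benchmark command) is an assumption, not an existing feature.
    """
    os.makedirs(dst_dir, exist_ok=True)
    cheapest = sorted(prices, key=prices.get)[:count]
    for name in cheapest:
        target = os.path.abspath(os.path.join(src_dir, name))
        link = os.path.join(dst_dir, name)
        if not os.path.lexists(link):  # leave links from an earlier run alone
            os.symlink(target, link)
    return cheapest
```

Calling link_cheapest(prices, "contracts-available", "contracts-enabled-30-cheapest", 30) would then produce a folder of the kind described above, which stays under the user's explicit control.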

In other words, I like the idea of supporting different "contract sets," but so far I haven't found a convenient and intuitive way to implement that.

grigzy28 commented 5 years ago

I see your point that it is the end user's responsibility to specify which hosts to use, and that this control should remain in the end user's hands. Some kind of benchmark could help the user determine which of their formed contracts they would like to use.

Change of thought: I could still see the redundancy factor in play here. Say you had 50 hosts selected to upload to, with a chunk size of 10 and a minimum redundancy of 3.0. It would still upload to all 50 hosts (or all hosts that don't produce errors), and succeed as long as at least 30 hosts accepted the upload. It would still report the errored hosts so that the user can decide what to do with them, and it would error out completely if the minimum isn't reached or reachable.
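Roughly, the behavior I have in mind could be sketched like this in Python. It is not how user works today: upload is a hypothetical callable that raises on failure, and what data each host actually receives is deliberately left out.

```python
import math

def upload_to_all(hosts, min_shards, min_redundancy, upload):
    """Try every host; succeed only if enough of them accept the upload.

    upload is a hypothetical callable that raises an exception on failure.
    Returns (accepted, failures), where failures maps host -> error text.
    """
    need = math.ceil(round(min_shards * min_redundancy, 9))
    if len(hosts) < need:
        raise RuntimeError(
            f"minimum of {need} hosts is not reachable with {len(hosts)} hosts")
    accepted, failures = [], {}
    for host in hosts:
        try:
            upload(host)
            accepted.append(host)
        except Exception as err:
            failures[host] = str(err)  # reported back so the user can act on it
        # Stop early once the minimum can no longer be reached.
        if len(hosts) - len(failures) < need:
            break
    if len(accepted) < need:
        raise RuntimeError(
            f"only {len(accepted)} hosts accepted the upload, {need} required; "
            f"failed hosts: {sorted(failures)}")
    return accepted, failures
```

With 50 hosts, a chunk size of 10 and a minimum redundancy of 3.0, the upload survives up to 20 failing hosts and aborts on the 21st failure.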

lukechampine commented 5 years ago

Huh. That's an interesting idea. It's wasteful, since you'd necessarily upload the same shard to multiple hosts, but that might be tolerable.

If I understand correctly, the motivation for this feature is that if one of your hosts is offline, you can't upload files at all until you disable that contract. So with this feature, if you have 50 hosts, but only 45-ish are responding, you can upload a file with a 10-of-40 encoding without disabling any contracts, and it will work as long as at least 40 hosts are online (at the cost of uploading extra data).

In general, my feeling is that if a host is offline, you should disable its contract; why would you want a contract with a flaky host? But Sia is still immature, and most hosts are flaky to some degree. So, here's an alternative proposal: instead of using a 10-of-40 code, use the full 10-of-50 code, but continue the upload even if some of the hosts can't be reached. Later, you can try again, and maybe "fill in the gaps." But in the meantime, the file will still be available as long as there are 10 hosts online.
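To make the proposal concrete, here is a small Python sketch of the bookkeeping it implies. It is not user's actual code or metafile format: slots is a made-up mapping from host to "shard uploaded or not", and upload is a hypothetical callable that raises when a host can't be reached.

```python
def upload_available(slots, upload):
    """Attempt every unfilled shard slot once; leave unreachable hosts as gaps.

    slots maps host -> True/False (shard uploaded or not). upload is a
    hypothetical callable that raises on failure. Returns the hosts still missing.
    """
    for host, done in slots.items():
        if done:
            continue
        try:
            upload(host)
            slots[host] = True
        except Exception:
            pass  # keep the gap; a later run can retry this host
    return [h for h, done in slots.items() if not done]

def is_recoverable(slots, online, min_shards):
    """A k-of-n file is recoverable if at least k shard-holding hosts are online."""
    holders = [h for h, done in slots.items() if done and h in online]
    return len(holders) >= min_shards
```

Running upload_available a second time is the "fill in the gaps" step: slots that are already filled are skipped, and only the hosts that were unreachable earlier are retried.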

I think this would be a good thing to have. But it sounds kinda tricky to implement. Much of the uploading code assumes that all hosts are present, and most of the metafile-related code assumes that the metafile is fully-uploaded. So I don't know how invasive this change would be. I will investigate further, but in the meantime, we're stuck disabling bad contracts as needed.

grigzy28 commented 5 years ago

Okay, some kind of upload resume would be good: starting from the original count of 50 (45 of them good), with 5 new host contracts to finish the 10-of-50 encoding that was used for the initial upload. I guess we may not need a redundancy option if this was implemented.

Scenario thought: After the 45 are uploaded and you disable the bad contracts and form new ones, what if you have 10 new ones instead of just 5? We go back to the question of how user should choose contracts. The only thing I can think of in this case would be to error out and force the user to disable additional contracts until only 5 of the new ones are enabled. Unfortunately that could be very time consuming if they formed, say, 20 or 30: they would have to disable them and then re-enable them later in order to upload to those new hosts. Alternatively they could just keep using the same 50 working hosts for uploads, but then the new host contracts that could also be used would sit idle.

Possible option: Create a new option, contract staging (or some other name). Contract links would be removed from the enabled folder and put into a staging folder. That would still be one at a time, but there would then be an option to move all of the staging area's links back into the enabled folder at once. This saves the user some time, and also saves the user from having to remember which contracts were disabled before finishing the initial upload and need to be re-enabled after a successful upload.

Just some thoughts about the idea and processes involved.

lukechampine commented 5 years ago

When you upload with the original set of 50, each of those 50 will be recorded in the metafile. If you disable one of them, you won't be able to finish uploading; user will not automatically switch out an old contract for a new contract. Instead, you must explicitly migrate the metafile from one set of contracts to another. Currently migrate assumes that the metafile is already fully uploaded, but if we implement my idea above, migrate could simply modify the set of 50 hosts recorded in the metafile. Then resume the upload as normal.
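As a sketch of what that modified migrate might look like, in Python and under heavy assumptions: slots is a made-up host -> "shard uploaded or not" mapping standing in for the host set recorded in a metafile, not the real metafile format, and no data is actually moved.

```python
def migrate(slots, replacements):
    """Swap hosts recorded in a metafile-like mapping, keeping its size fixed.

    slots maps host -> True/False (shard uploaded or not). replacements maps
    old host -> new host. A new host starts with no shard, so a resumed
    upload will fill it.
    """
    migrated = {}
    for host, done in slots.items():
        if host in replacements:
            new_host = replacements[host]
            if new_host in slots or new_host in migrated:
                raise ValueError(f"{new_host} is already in the set")
            migrated[new_host] = False  # gap to be filled when the upload resumes
        else:
            migrated[host] = done
    return migrated
```

The set still has exactly as many hosts as the original encoding expects, so the user decides explicitly which new contract replaces which old one.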

grigzy28 commented 5 years ago

I understand how that would work, migrating invalid or unusable hosts to good ones. I just thought of something else: along with a minimum redundancy level, how about a maximum level?

Once the program reaches the maximum requested level, it would stop looking for more hosts to upload to. Not sure how this would work if you were using parallel uploads instead of sequential ones. Just a thought.

vargrant commented 4 years ago

If I understand the situation correctly, we need flexible user settings such as:

1) Minimum and maximum levels of redundancy.

2) Time of revision.

3) A rule for starting an emergency process to increase redundancy when some host is offline and the current redundancy is below the required minimum.

4) A rule for disabling/cancelling the contract with a host that has been offline for a long time, once its data has been fully re-uploaded to new hosts at the minimum level of redundancy.

5) Parallel uploads should be used only within the current level of redundancy, while the level of redundancy itself should increase step by step. For example, to upload one file to several hosts with 4x redundancy, the first target is 1x redundancy (no redundancy), and here we use parallel uploads. Once 1x redundancy has been fully reached on all hosts, we start increasing redundancy: the next target is 2x, and so on up to the minimum redundancy of 4x, and then up to the maximum redundancy, for example 10x. The more redundancy there is, the easier it is to reach maximum download speed, provided the user has a good internet connection.

6) When the user is managing contracts, for example has 100 contracts but needs to select only 50, the user needs some ratings to be able to select the required hosts, and this should depend on user settings defined in configuration files. Which options are possible? I think only two: cheapest cost (a flexible value, because the upload/download price can change after the space has been contracted) and fastest hosts (also a very flexible variable, because speed depends on the current moment; five minutes later a host can have a different speed). To rate hosts we can use https://siastats.info/hosts, and we can merge those ratings with our own rating tables, which can contain real measured upload/download speeds from contracts we have already used.
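The step-by-step schedule in point 5 could look roughly like the following Python sketch. It assumes whole-number redundancy levels and that "Nx" means N times the chunk size in hosts, as in the 10 and 3.0 example earlier in this thread. The function name and its parameters are made up for illustration.

```python
def redundancy_stages(min_shards, min_redundancy, max_redundancy):
    """Yield (level, total_hosts, required) for each whole-number step.

    required is True for steps up to the minimum redundancy and False for
    the optional steps between the minimum and the maximum.
    """
    if not 1 <= min_redundancy <= max_redundancy:
        raise ValueError("need 1 <= min_redundancy <= max_redundancy")
    for level in range(1, int(max_redundancy) + 1):
        yield level, min_shards * level, level <= min_redundancy

# Example: chunk size 10, minimum 4x, maximum 10x.
for level, hosts, required in redundancy_stages(10, 4, 10):
    kind = "required" if required else "optional"
    print(f"target {level}x: {hosts} hosts in total ({kind})")
```

Within each stage the uploads can run in parallel; the next stage starts only after the current target has been fully reached.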