wetube / bitcloud

Bitcloud Project
http://bitcloudproject.org
MIT License

CA creation #29

Closed JaviLib closed 10 years ago

JaviLib commented 10 years ago

The first thing we need to address is the creation of CAs. In order to avoid Sybil attacks, the creation must be costly.

1) What CA algorithm or library to use. http://en.wikipedia.org/wiki/Certificate_authority#Open_source_implementations

2) How to create a CA out of a mining process.

3) How to integrate that into our blockchain.

Dr-Syn commented 10 years ago

Issues 2 and 3 are the important ones. Issue 1 will be determinable after we figure out how to make the thing happen.

Dr-Syn commented 10 years ago

Also, consider that preventing Sybils is not going to be possible: a sufficiently motivated attacker will find a way.

What we want to do is tune the system such that an attacker who goes through the trouble of making Sybils will add more value to the network than is spent detecting and nuking the Sybils.

JaviLib commented 10 years ago

What we want to do is tune the system such that an attacker who goes through the trouble of making Sybils will add more value to the network than is spent detecting and nuking the Sybils.

1) I'd like to hear how that is possible. By imposing a reputation requirement in order to be able to generate new IDs?

2) For the mining algorithm I suggest a proof of work based on a non-standard SHA-variant algorithm, because ASICs have already been built for hashcash algorithms.

Some links:

https://en.bitcoin.it/wiki/Proof_of_work

According to Wikipedia, this is the list of known proof-of-work algorithms:

http://en.wikipedia.org/wiki/Proof-of-work_system#List_of_proof-of-work_functions

I suggest a fixed difficulty with a value we can estimate to be hard enough. The miner has to request the conditions for the proof of work from the node pool and work on it until it finds a solution. Then it submits the solution together with a generated public key (derived from its private key). From there on, the node can operate.
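
A minimal sketch of what that fixed-difficulty proof of work could look like, in Python; the challenge format, the difficulty value, and the use of SHA-256 are stand-ins, not decided parameters:

```python
# Sketch only: hashcash-style search bound to the new CA's public key.
# The thread suggests a non-standard SHA variant to sidestep existing ASICs;
# sha256 is just a stand-in here, and DIFFICULTY_BITS is a made-up value.
import hashlib
import os

DIFFICULTY_BITS = 20  # fixed difficulty the node pool would agree on

def mine_ca_proof(challenge: bytes, public_key: bytes) -> int:
    """Find a nonce so that H(challenge || public_key || nonce) has
    DIFFICULTY_BITS leading zero bits."""
    target = 1 << (256 - DIFFICULTY_BITS)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + public_key + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify_ca_proof(challenge: bytes, public_key: bytes, nonce: int) -> bool:
    digest = hashlib.sha256(challenge + public_key + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))

# The prospective CA asks the node pool for a challenge, mines against its
# freshly generated public key, then submits (public_key, nonce) to start operating.
challenge = os.urandom(32)
public_key = os.urandom(33)  # placeholder for a real public key
nonce = mine_ca_proof(challenge, public_key)
assert verify_ca_proof(challenge, public_key, nonce)
```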

After that, we can impose your idea of continual service in order to avoid revocation.

Any ideas are welcomed.

cbbcbail commented 10 years ago

We have to prevent Sybil attacks. (Making them harmless to the network would count as preventing them.)

ID hashing is a good idea, but it makes the time variable. One guy could start mining his ID and luckily find the solution in 3 seconds; it could take another guy 3 years. I realize that this is unlikely, but it would be technically possible. It all depends on how long it takes your computer to solve the algorithm. Another thing I dislike about the idea is that it gives an unnecessary advantage to those with faster processing capabilities. This might be unfair, as some people can't afford a faster processor and are therefore going to have to wait longer before they can generate their ID.

However, I don't know of any other way to do this. If we could find a better one, though, I'd be all for it, but as of right now, despite the disadvantages, it seems to be the best option.

JaviLib commented 10 years ago

@cbbcbail

I have another idea: requesting not only the acceptance of a CA, but also a certain amount of bandwidth that can be served, plus a maximum number of delegated IDs allowed for each node. For example:

Just a name, 10 Mbps, and 10 users could have difficulty X. A name, 100 Mbps, and 100 users could have difficulty X*100.

This is because normally whoever has a lot of CPU power also has a lot of bandwidth available.
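
A rough sketch of how the difficulty could scale with what the CA requests; the base value and the multiplicative scaling are assumptions for illustration only:

```python
# Sketch only: scale the proof-of-work cost with the resources the new CA
# asks to delegate. BASE_DIFFICULTY and the scaling factors are invented.
BASE_DIFFICULTY = 1_000_000  # hypothetical work for a bare name, 10 Mbps, 10 users

def ca_difficulty(bandwidth_mbps: float, max_users: int) -> int:
    """Use the 10 Mbps / 10 users request as the reference point X."""
    scale = (bandwidth_mbps / 10) * (max_users / 10)
    return int(BASE_DIFFICULTY * max(scale, 1))

print(ca_difficulty(10, 10))    # X        (the base case above)
print(ca_difficulty(100, 100))  # X * 100  (10x the bandwidth and 10x the users)
```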

Dr-Syn commented 10 years ago

@JavierRSobrino

Some kind of value is presented to the network by the action that authorizes the CA. Determine what this value is. Determine how much 'cost' a misbehaving CA is going to cause the network in a worst-case scenario.

Once those concepts are determined, the tuning can be accomplished--so the discussion here should be:

  1. What value can we bring to the network through this process?
  2. How hard is it going to be to detect and counter a misbehaving CA?

@cbbcbail

Prevention is not going to happen. Mitigation is a workable goal. Some harm will always be done by a motivated attacker, so the goal is to make that harm smaller than the amount of good that will be produced by their attempts to cause harm.

Dr-Syn commented 10 years ago

@JavierRSobrino

Go shopping for a few VPS instances, and you'll quickly find out that CPU power and available bandwidth are not at all related.

JaviLib commented 10 years ago

Go shopping for a few VPS instances, and you'll quickly find out that CPU power and available bandwidth are not at all related.

But discrimination by the number of possible user creations is still possible. So a CA able to create 100 users would be 10 times more expensive than one able to create 10.

Dr-Syn commented 10 years ago

@JavierRSobrino

I see what you're getting at, but you're approaching it from the wrong angle.

Bandwidth is not necessarily going to be the limiting factor for a CA, but it -would- be possible to limit the number of affiliated systems if they have to continually mine to maintain that number.

For instance, if a user is equivalent to a subdomain of the CA, then it may be possible to control the number of allowable subdomains they can issue. Going to need to dive into the SSL spec to see if that'll work.

JaviLib commented 10 years ago

@Eric

Some kind of value is presented to the network by the action that authorizes the CA. Determine what this value is. Determine how much 'cost' a misbehaving CA is going to cause the network in a worst-case scenario.

There are too many concepts that we haven't covered yet to determine that. We are constructing the protocol from the top down, I think. We can go back and determine the exact variables when we have a better overview of the particular goals.

Once those concepts are determined, the tuning can be accomplished--so the discussion here should be:

  1. What value can we bring to the network through this process?
  2. How hard is it going to be to detect and counter a misbehaving CA?
  3. The initial reputation for entering nodes and the reputation decreases applied in Sybil attacks.
  4. That's why I suggested the sync process of the node pool (which can be continuous or every X minutes). All statistics are stored in the node pool, and all nodes should be judging other random nodes. That is why I devised the whole system of laws, judges, and verdicts.

Dr-Syn commented 10 years ago

@JavierRSobrino

Ok, reporting of CA-untrusts through the nodepool sounds valid enough.

What other value can we make this process bring to the network as a whole other than churning hashes? Is there some kind of no-trust-required process we can make them complete?

JaviLib commented 10 years ago

Bandwidth is not necessarily going to be the limiting factor for a CA,

Perhaps I didn't explain myself clearly: the bandwidth I was talking about is the maximum bandwidth allowed for the users of the CA (not the bandwidth that a node serves to other users).

but it -would- be possible to limit the number of affiliated systems if they have to continually mine to maintain that number.

Do you mean mining with the CPU?

For instance, if a user is equivalent to a subdomain of the CA, then it may be possible to control the number of allowable subdomains they can issue. Going to need to dive into the SSL spec to see if that'll work.

That is what I was talking about. We don't need to know whether SSL allows that, as we can implement the limit in the node pool as a procedure separate from certificate signing.

Dr-Syn commented 10 years ago

@JavierRSobrino

MaxBandwidth for the users of the CA would be clumsy and probably not properly measurable. The latency from reporting of that through the nodepool would enable all manner of nonlinear behavior from people gaming the system. I don't think that's a valid way to go.

See, anything communicated through the nodepool is going to propagate relatively slowly--measured in the order of minutes, if not hours, because hosts can only synchronize their databases so fast.

If we find a way to limit the number of signatures a CA can keep active at any one time, that'd prevent bad behavior much more effectively.

Also, I just thought of a possibility for untrusted work that would be effective: validating escrows. The escrow validation requires a threshold number of systems to agree for the transaction to go through; having the CA pool participate in that validation mitigates many of the issues with trust that would otherwise arise, especially with potential conflicts of interest if a single entity runs storage nodes and a CA.
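
As a toy illustration of the threshold idea; the 2/3 fraction, the vote format, and the validator names are all invented for this sketch:

```python
# Sketch only: a transaction clears when a threshold of independent validators
# drawn from the CA pool agree that the service was actually delivered.
from dataclasses import dataclass

THRESHOLD = 2 / 3  # hypothetical fraction of validators that must agree

@dataclass
class EscrowVote:
    validator_id: str  # CA participating in the validation
    approves: bool     # whether it considers the service delivered

def escrow_clears(votes: list) -> bool:
    """Release the escrowed payment only if enough validators approve."""
    if not votes:
        return False
    approvals = sum(1 for v in votes if v.approves)
    return approvals / len(votes) >= THRESHOLD

# Example: 5 CAs validate, 4 approve -> the payment goes through.
votes = [EscrowVote(f"ca-{i}", i != 0) for i in range(5)]
print(escrow_clears(votes))  # True
```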

JaviLib commented 10 years ago

See, anything communicated through the nodepool is going to propagate relatively slowly--measured in the order of minutes, if not hours, because hosts can only synchronize their databases so fast.

Not as much of a problem as you may think, because we can impose pre-registering of anonymous public keys in the nodepool. Registered users need to mine their name (in the user interface) and are independent of any CA, so we don't have this problem.

If we find a way to limit the number of signatures a CA can keep active at any one time, that'd prevent bad behavior much more effectively.

You're right, because that will cause less overhead in the nodepool.

Also, I just thought of a possibility for untrusted work that would be effective: validating escrows. The escrow validation requires a threshold number of systems to agree for the transaction to go through; having the CA pool participate in that validation mitigates many of the issues with trust that would otherwise arise, especially with potential conflicts of interest if a single entity runs storage nodes and a CA.

In fact, there are many uses. If you read the law system I designed (the BCL in the bitcloud.org file), you'll see there are many, many uses. Each law would require this process.

cbbcbail commented 10 years ago

Right, I think some of these other approaches you two have been mentioning would work better than CPU/GPU mining for IDs. I especially like Javier's original suggestion, and I think you two have already worked through and solved a few flaws.

I also agree that max bandwidth would be difficult and clumsy for the system to deal with. I also think that an "untrust process" would be a very good idea if we can figure out a way to implement it.

As for preventing critical attacks: if you think the system will always be vulnerable to potentially lethal attacks, then what is the point in even pursuing the creation of such a system? We definitely need to ensure that it causes the least amount of damage possible, but hopefully we can take this to the point where attacks have seemingly no effect, thus actually eliminating the problem.

Dr-Syn commented 10 years ago

@cbbcbail

You seem to forget that the web as implemented has zero protections against Sybils. Anything is better than the current situation.

If you spend some time learning how security works, you will quickly see that it is simply not possible to make a completely secure system. I, personally, have a motto--"Semper alia via"--expressing that "There is always another way".

So aiming to -prevent- attacks of all kinds is not going to work. At best you will fail; at worst you will make something completely unusable.

A realistic goal is to mitigate attacks--to make the attacks less effective than they would otherwise be. The goal I have in mind here is that those who are attempting such attacks add more value to the network than their attacks take away.

An 'untrust' process is as easy as putting a button on the CA's interface to say "do not trust this CA anymore"--there's perhaps some discussion as to whether a blacklist or a whitelist format for CA trust would be more appropriate, but in either case, it's a very simple effort for the CA's manager.

@JavierRSobrino

I'm going to have to sit down and reread the whole law section over again, I suppose.

cbbcbail commented 10 years ago

I am very well aware that it is impossible to make a completely secure system. This is a fundamental part of network security. I do, however, think that a hole in a security system that is not only known but also carries the possibility of severe consequences is a real and serious problem.

The current Internet works in a way in which Sybil attacks could not harm the network as a whole. So your reference to the fact that the Internet presents no guards against them is irrelevant.

Preventing the attacks that we can prevent is very necessary. We obviously cannot stop what we can't foresee, but we can stop what we know can happen. If there is a fundamental flaw in the proposed system making it susceptible to critically lethal attacks, then there is simply no point in its creation.

I know how the CA distrusting process works. I am familiar with using the mouse to click on interactive buttons. This is not what I'm referring to. I'm referring to whether or not we could set it up so that it would mitigate the issues with ID mining and still keep the user creation process minimal.

JaviLib commented 10 years ago

I also agree that max bandwidth would be difficult and clumsy for the system to deal with. I also think that an "untrust process" would be a very good idea if we can figure out a way to implement it.

The 'untrust process' should be the same as the 'trust process', because we are a reputation system. For this, we must define a Node Pool where all statistics and verdicts are stored, and how it is synced. See #30.

The difficult part is how to do an effective sync process. We should not rely on a Bitcoin-like sync process because it is slow. Therefore we should open another issue to think about how the sync process must happen.

JaviLib commented 10 years ago

An 'untrust' process is as easy as putting a button on the CA's interface to say "do not trust this CA anymore"--there's perhaps some discussion as to whether a blacklist or a whitelist format for CA trust would be more appropriate, but in either case, it's a very simple effort for the CA's manager.

We should not disturb users. Forcing users to press a button will make the system unusable when it becomes big. Also, users should not have to worry about maintaining the network. Let the nodes do the work; they are paid for doing so! My suggestion is automatic trust/untrust between nodes based on a law system, with judges and verdicts. Yes, we should give nodes the freedom to decide how to judge other nodes, and we present a default way in the official implementation.
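
Purely as an illustration of what automatic judging might look like; the node names, the verdict format, and the reputation arithmetic below are invented, not part of any agreed design:

```python
# Sketch only: each node periodically judges a few random peers against a 'law'
# and publishes a verdict to the node pool.
import random
from collections import defaultdict

reputation = defaultdict(int)  # node id -> accumulated reputation
verdicts = []                  # what would be synced through the node pool

def judge(judging_node, peers, misbehaving):
    """Pick random peers, apply a trivial misbehaviour check, record verdicts."""
    for peer in random.sample(peers, k=min(3, len(peers))):
        guilty = peer in misbehaving
        verdicts.append({"judge": judging_node, "accused": peer, "guilty": guilty})
        reputation[peer] += -10 if guilty else 1

judge("node-x", ["node-a", "node-b", "node-c", "node-d"], misbehaving={"node-c"})
print(dict(reputation))
```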

JaviLib commented 10 years ago

@cbbcbail

I agree with you, but we should make a distinction:

Why? Because registered users, those with a name, should not be dependent on any CA, in order to avoid censorship.

If a major CA grid or publisher decides to limit entrance in the system to the users of another major CA, then you have there a serious problem: censorship.

Registered user names should be completely independent of any CA, and CAs revoking registered users' certificates should be severely penalized. This is why I invented the law of service.

Dr-Syn commented 10 years ago

@JavierRSobrino

The reason I was insisting on all users credentialing through a CA is so that the CA's standard revocation lists would be pushed to the user when the user checks a foreign system's certs against their CA's blacklist/revocation lists. (Remember, in the original concept you wanted the 'moderators' to be a paid position; this would be an appropriate implementation of that concept.)

Reputation of users above and beyond what it takes for them to get hooked to a CA in the first place is unimportant.

Having a button that says "I don't want to talk to this guy anymore" is no big deal; if it becomes an actual burden, then I'm fairly sure that some form of plugin will be written to automatically 'press the button'--or they can move to a CA that actually handles the kind of revocations that they want.
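
As a rough sketch of that mechanism, with invented field formats: the client checks a foreign system's cert against the revocation list pushed by the user's own CA before talking to it:

```python
# Sketch only: reject any cert the user's CA has revoked or blacklisted.
def is_trusted(foreign_cert_fingerprint: str, ca_revocation_list: set) -> bool:
    return foreign_cert_fingerprint not in ca_revocation_list

revoked = {"ab:cd:ef:01", "12:34:56:78"}   # pushed from the user's CA
print(is_trusted("ab:cd:ef:01", revoked))  # False -> do not talk to this system
print(is_trusted("99:88:77:66", revoked))  # True
```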

@cbbcbail

There is something about this that you don't appear to understand, and I'm not entirely sure what it is.

JaviLib commented 10 years ago

@Eric

So your idea is that users must find a place to register?

What about content embedded on webpages accessible without having to register?

Dr-Syn commented 10 years ago

@JavierRSobrino

Registering with a CA is an essential step to ensuring that all connections remain encrypted. Full and consistent encryption is a necessary attribute for a number of reasons.

JaviLib commented 10 years ago

Registering with a CA is an essential step to ensuring that all connections remain encrypted. Full and consistent encryption is a necessary attribute for a number of reasons.

I know that, but I thought we were discussing the separation of registered users (mined names) from unregistered (assigned from an existing CA, for example the CA of the gateway).

Dr-Syn commented 10 years ago

@JavierRSobrino

I'm not seeing any advantage to the system from having independently mined names vs. having all users go through one or more CAs.

JaviLib commented 10 years ago

I'm not seeing any advantage to the system from having independently mined names vs. having all users go through one or more CAs.

  1. No need to register with a third party.
  2. No need to sell nicknames (example: "Eric").
  3. No need to change CAs and wait for approval.
  4. Independence from any CA. For example, your name will not be associated with "Disney" if you registered with them.
  5. No discrimination based on where you registered (a node may want to censor users from the "YouPorn" CA).
  6. The same nickname and credentials can be used across the entire net.
Dr-Syn commented 10 years ago
  1. Malicious users are harder to hold accountable if there is no authority to leverage.
  2. No need to sell 'em with CAs, either.
  3. That's entirely dependent on the CA's policies
  4. You can register with multiple CAs; nobody's stopping you.
  5. That's a feature that allows for private or semi-private grids
  6. Same thing can be done with CA-requisite; your credentials would just authenticate you to your system; the CAs would sign off or revoke their endorsement of your certificate independently of your sign-on.
JaviLib commented 10 years ago
  1. How is that? You can revoke both users and other CAs.
  2. How does a user register a nickname if CAs are providing the certificates for the users?
  3. It is a form of censorship.
  4. And everybody will know what CAs you are attached to.
  5. You can have private grids with mined nicknames too.
  6. Only if the CA you are accessing trusts the CAs you registered with. Users are not guilty of whatever the CA they registered with is doing. If that CA becomes malicious, you become censored in many other places too.
Dr-Syn commented 10 years ago

@JavierRSobrino

  1. Individual user revocation is gopher-hunting: hit one, another pops up. CAs are easier to handle b/c they have a vested interest in remaining available, so they're more likely to police matters on their end.
  2. User has a machine-generated UUID, a human-readable non-unique nickname, and certs. CAs sign certs associated w/ the UUID, which allows the user to pick and choose a human-readable nickname as they like (see the sketch after this list).
  3. So? If you don't like that CA or grid's policies, go to one that supports what you want to do. Overall net remains uncensored; let the nannies have their 'safe' enclaves. Get more buy-in that way.
  4. Yep. And that's why I supported your notion of multiple profiles, so you can have your porn profile and your disney profile and keep them separate.
  5. Much more state to keep track of; it'll make the nodepool clumsier to deal with.
  6. Which is why you'd dump that CA/profile and chalk it up to lessons learned.
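
A minimal sketch of the identity layout from point 2, with invented field names; the signature entries are placeholders for whatever certificate machinery ends up being used:

```python
# Sketch only: the machine-generated UUID is the real identifier; the nickname
# is free-form and non-unique; each CA endorses (signs) the UUID independently.
import uuid
from dataclasses import dataclass, field

@dataclass
class UserIdentity:
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    nickname: str = ""                                  # human-readable, not unique
    ca_signatures: dict = field(default_factory=dict)   # CA id -> signature over uid

alice = UserIdentity(nickname="JafarsOlderBrother")
alice.ca_signatures["disney.grid"] = "sig-placeholder"  # endorsement, revocable by that CA
```
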
JaviLib commented 10 years ago

@Eric

I would support your idea of users having to register if you provide:

  1. A way to register human-readable nicknames which is difficult. We don't want nicknames to be easy to grab. Some kind of mining difficulty or reputation must limit the number of nicknames that a CA can provide.
  2. For 1) my idea was to set the difficulty as a function of the nickname length. For example, on a regular desktop: an estimated 1 hour of CPU work for a 10-character name, 1 day for 7 characters, 1 week for 5, 1 year for 3, and 100 years for 1 character (a rough sketch follows this list).
  3. How to manage unregistered users.
  4. How to measure bandwidth for all of them.
  5. How to mitigate bandwidth-measurement cheats, so that payers trust the Bitcloud escrow system.
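
A rough sketch of the length-based schedule from point 2, using only the example figures above; none of these numbers are tuned:

```python
# Sketch only: target CPU time (hours, on a "regular desktop") to mine a
# nickname. Lengths in between fall into the nearest shorter-name (more
# expensive) tier.
TARGET_HOURS = {
    10: 1,             # ~1 hour for a 10-character name
    7: 24,             # ~1 day for 7 characters
    5: 24 * 7,         # ~1 week for 5 characters
    3: 24 * 365,       # ~1 year for 3 characters
    1: 24 * 365 * 100, # ~100 years for a single character
}

def nickname_target_hours(name: str) -> int:
    """Shorter names are scarcer, so they must cost more work to claim."""
    lengths = sorted(TARGET_HOURS)  # [1, 3, 5, 7, 10]
    n = len(name)
    if n >= lengths[-1]:
        return TARGET_HOURS[lengths[-1]]
    # nearest defined length at or below n (the more expensive tier)
    tier = max((l for l in lengths if l <= n), default=lengths[0])
    return TARGET_HOURS[tier]

print(nickname_target_hours("eric"))        # 4 chars -> priced like a 3-char name
print(nickname_target_hours("longername"))  # 10 chars -> ~1 hour
```
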
Dr-Syn commented 10 years ago

@JavierRSobrino

I'm a bit puzzled here on why you'd need to have difficulty in registering a human-readable nickname. A unique ID string is enough to identify them for the CA and any other resources that need to keep accounts separate.

As for issue 3, all users are 'registered' but some are 'registered' with an anonymized profile. Some CAs--which will, naturally, tend to be less trusted in some places than others--will be more than happy to support that functionality.

Measuring bandwidth is irrelevant to users; what we can actually charge for is resources fetched--and that's handled via unique encryption keys to unlock received content, mediated by the CAs that are handling the escrow.

JaviLib commented 10 years ago

I'm a bit puzzled here on why you'd need to have difficulty in registering a human-readable nickname. A unique ID string is enough to identify them for the CA and any other resources that need to keep accounts separate.

For having names globally available. Even if you don't plan to have names globally available for users, there must be a way to easily identify a CA. Disney may want to be identified as the publisher "disney", not as an unreadable random ID. So, for example, a user might be disney:eric or something like that. Anyway, my original idea is that "eric" could be registered as a global name too, and that is why I introduced mined nicknames.

And having a nickname globally registered has some advantages, for example, a common profile across the entire net.

As for issue 3, all users are 'registered' but some are 'registered' with an anonymized profile. Some CAs--which will, naturally, tend to be less trusted in some places than others--will be more than happy to support that functionality.

Measuring bandwidth is irrelevant to users; what we can actually charge for is resources fetched--and that's handled via unique encryption keys to unlock received content, mediated by the CAs that are handling the escrow.

I wasn't talking about users. For example, a publisher pays a node for X bandwidth transmitted. How do you demonstrate that the CA is not cheating by creating users of its own and faking the statistics? Remember that in the original idea, users of the CA itself don't count for bandwidth measurement.

Dr-Syn commented 10 years ago

@JavierRSobrino

Why would you want to have nicknames globally available?

If I'm only hanging out in disney.grid, then why should my preferred JafarsOlderBrother handle not be available to people hanging out in persianimmigrants.grid?

Stick to unique identifiers with a freeform 'name' field. Much less hassle.

Seeking to have a human-readable name across the whole net has, up to now, exactly one solution: DNS. So in that case, you'd be identified as jafarsolderbrother.disney.grid everywhere, and that's not really amenable to the kind of compartmentalization that this project could take advantage of.

Second, if a CA creates users and keeps grabbing content, then they're paying the whole grid for the privilege of doing so, aren't they? So they're paying to be more prominent. If they want to blow their budget doing that, then what business is it of ours?

JaviLib commented 10 years ago

@Dr-Syn (sorry, the other Eric):

But we want to make life easier for the users, don't we? We should provide a way of globally searching content. We need a form of name resolution to tell users at least the unique name of the publisher. And it is not hard at all: the same program that mines the ID can also mine the name.

Dr-Syn commented 10 years ago

@JavierRSobrino

That's why I want them to be using the various CAs. This means they can have whatever name their CA allows on that profile, and the users don't have to necessarily wait for several days before they can participate--but at the same time, by making the CAs responsible for their users, the likelihood of a bunch of rogue users showing up and causing problems is significantly lessened.

Having the ability to have multiple profiles across CAs also helps the user, as they're able to keep different parts of their online life separate, just as I keep my Dr-Syn stuff separate from my other things.

Also, 'globally searching content' isn't what the original proposal called for: as I recall, the proposal called for the middlemen to act as curators, showing the users a set of content that was appropriate for their milieu.

Not that there's anything stopping someone from building a search engine on top of bitcloud, mind, but that would be just another service, rather than something we have to build in.

JaviLib commented 10 years ago

On 30/01/14 23:40, Eric wrote:

Also, 'globally searching content' isn't what the original proposal called for: as I recall, the proposal called for the middlemen to act as curators, showing the users a set of content that was appropriate for their milieu.

Not exactly:

There is going to be an unmoderated area in which the user can search for anything. And there are going to be users who just search the content of their publishers. Both options were in the original proposal. It is even in the white paper that Kyle wrote some time ago.

Not that there's anything stopping someone from building a search engine on top of bitcloud, mind, but that would be just another service, rather than something we have to build in.

We are the search engine, because we want to make things easy for users. The search engine should be integrated into Bitcloud itself; otherwise you force external tools to do very complicated things.

Dr-Syn commented 10 years ago

@JavierRSobrino

What you're saying there is more appropriate for a semi-decentralized web host, rather than a decentralized network of hosts.

You're thinking in far, far too limited a manner. That kind of proposal does not scale much above a few hundred people.

And no, we are -not- the search engine. We are the protocol. Search engines are an application. It is their job to do complicated things. It is our job to put mechanisms in place for them to do those things.

JaviLib commented 10 years ago

What you're saying there is more appropriate for a semi-decentralized web host, rather than a decentralized network of hosts.

But on the actual Internet there are NICs registering names, aren't there? Names are a fundamental property of any network, be it local or global. Even more, global names are more needed than local names. We are a decentralized content provider. Didn't you read the white paper? The low-level stuff is IP, or the mesh network that we can base our future versions on.

Also remember that because of the nature of our protocol, we are not using the actual DNS system. We must implement our own.

You're thinking in far, far too limited a manner. That kind of proposal does not scale much above a few hundred people.

Why is that?

And no, we are -not- the search engine. We are the protocol. Search engines are an application. It is their job to do complicated things. It is our job to put mechanisms in place for them to do those things.

That could go into the wetube application, but there is no reason to do so if we can implement it in Bitcloud as a commodity. But why do we do escrow? Isn't that another application?

I think you are trying to separate things too radically into OSI layers and are fixated on that. Look at Ethereum: they also do name resolution and storage, and even worse, they also have their own currency. You can't approach this project well if you are fixed on the status quo defined by OSI.

Dr-Syn commented 10 years ago

@JavierRSobrino

Names are registered in the actual internet in a hierarchical fashion: ICANN issues TLDs to entities who then sell domain names under those TLDs to other entities. When an entity buys one of those TLDs, they can issue as many subdomain names as their little hearts choose.

Ultimately, every single one of those names is traceable up to a central authority: ICANN.

Even worse, right now, you have an internet name that is traceable up to ICANN. It's not going to be a human-friendly name unless you've taken special means for it to be so--it's going to be, essentially, a serial identifier from your ISP underneath one of your ISP's domains.

Trying to take a centralized hierarchy and then mapping it to a decentralized one is problematic; it's best to leave the centralization out of it and go with something that works better with decentralization.

That means that the least problematic way of addressing names is for everyone to have a UUID--not human-friendly--and give them the option of having whatever human-readable name they like on top of it. If you want a DNS-alike, then that can be linked in to the CA packages, as they're already in the right place to handle that kind of work.

We do not want to build search routines into the base protocol because search algorithms change, frequently very rapidly, over time. We do not want to change the rev of the base protocol every time a better search algorithm comes out. It violates the principle of modularity.

Even in operating systems, search functions are not built into the base filesystem itself; they are invoked as separate programs to examine the filesystem in various ways. This allows for different search providers to provide services.

Further, 'search' is a complex issue that is not appropriate for implementing in this context anyway--the scope of that is a whole other project.

Pretty much, if a separate company exists that does the thing that you're considering rolling into the protocol, it shouldn't be rolled in.

Escrow is a somewhat special case. If you look at the bitcloud as a CDN with payment hooks built in, one of those hooks that has to be implemented in order for payments to work is escrow--unless there's a better method of trustlessly conveying funds.

I am not privy to ethereum's methodologies, but I'd be willing to bet that they have several sub-projects under the ethereum banner that handle the disparate parts of their project.

And the OSI model is descriptive, not prescriptive: just because the 'status quo' can be described in terms of that model (badly, mind you) does not mean it's not a suitable model for describing other systems.

cbbcbail commented 10 years ago

I have to agree with @Dr-Syn in this case; his proposal, I think, would be better suited than yours, @JavierRSobrino.

Just weighing in with my opinion, as I missed the whole argument.