namecoin / namecoin-core

Namecoin full node + wallet based on the current Bitcoin Core codebase.
https://www.namecoin.org/
MIT License

Value size limit #53

Closed xloem closed 6 years ago

xloem commented 8 years ago

I think this issue must be well known, but I didn't see a bug for it.

As I understand it, values can currently be created up to 1023 bytes but updated only up to 520 bytes, which means that if you create a value larger than 520 bytes you will be unable to keep it!

Perhaps documentation and code should be updated to reflect a hard maximum of 520 bytes for now.
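As a rough illustration of the proposed hard maximum, here is a minimal sketch of a client-side check. It assumes the 520-byte limit applies to the encoded bytes of the value rather than to its character count; `MAX_VALUE_BYTES` and `fits_value_limit` are hypothetical names, not part of Namecoin Core.

```python
# Hypothetical constant: the 520-byte ceiling discussed in this thread,
# assumed here to apply to the UTF-8 encoded value.
MAX_VALUE_BYTES = 520


def fits_value_limit(value: str, limit: int = MAX_VALUE_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `value` is at most `limit` bytes."""
    return len(value.encode("utf-8")) <= limit


if __name__ == "__main__":
    print(fits_value_limit("a" * 520))  # exactly at the limit
    print(fits_value_limit("a" * 521))  # one byte over the limit
    print(fits_value_limit("é" * 300))  # 300 characters, but 600 bytes in UTF-8
```

Checking bytes instead of characters matters for non-ASCII values, which can exceed the limit with far fewer than 520 characters.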

I imagine it is technically possible to expand the bitcoin protocol in such a way that values larger than 520 bytes are allowed.

JeremyRand commented 8 years ago

My understanding is that the cli and rpc commands for assigning values to names will produce an error if the value is longer than 520 bytes. So the only way you can lose a name due to this is if you're constructing a raw transaction yourself.

I haven't tested this in Namecoin Core. Is my understanding incorrect for either cli or rpc?

xloem commented 8 years ago

Sorry, you are correct, that is not an issue. Values are properly limited to 520 bytes in the user APIs.

The issue is either updating remaining documentation, or enhancing the code to allow larger values to match what is advertised.

JeremyRand commented 8 years ago

Yes, documentation is lacking at the moment. We intend to switch the namecoin.org site to Jekyll, and move all documentation there, which should make it easier to manage.

Increasing the value size limit might lead to scalability problems as Namecoin's user base grows. At the moment I don't see an urgent need to increase it, but maybe I'm missing a use case.

xloem commented 8 years ago

Well, it seems like it would be nice to be able to fit a PGP key in there. But maybe I misunderstand the proper use.

I'm writing a script right now to store hashes of my backups in Namecoin, so I can verify later that they have not changed. I currently spread the data across four to eight separate updates, because 520 bytes is not enough room for a human-readable hash plus a log of the command-line arguments of the couple of commands that were run.

domob1812 commented 8 years ago

@JeremyRand already answered the core points. Let me just add that it is almost "trivial" (on a technical level) to increase the limit, which is done, for instance, in Huntercoin (a Namecoin fork). The only difficulty is that this would be a hardfork.

Furthermore, I agree that I don't really see a point in raising the limit right now. Even if we do and allow some more applications at the cost of more chain bloat, there will always be particular applications that are "just prevented" by the current limit (whatever it is). IMHO the correct way for storing keys or securing larger data is to store only the hash (or a single "root hash") and keep the data somewhere else (DHT, cloud storage, key server, backup disk, whatever suits your application the best).

JeremyRand commented 8 years ago

Also worth pointing out that there are scalability improvements proposed (which haven't passed peer review yet) which, if accepted, might make it easier to safely increase the value limit without hurting scalability much. Segregated Name Values is the main one I'm thinking of. In any event, I agree with @domob1812 that no matter what we set the maximum to, there will always be use cases which are broken by it. (Some clown proposed using Namecoin as a filesystem, obviously we're not going to do that.) The same thing is happening for Bitcoin with their block size debate.

xloem commented 8 years ago

If it's ever reasonable to migrate all clients, which I suppose could be done by implementing the feature but not enabling it until everybody's upgraded, it would be efficient to pick a value size which precisely meets but does not exceed the largest supported use case. I think it would be really helpful to make values large enough to include PGP keys. PGP signatures would be even better.

If some "largest supported use case" were agreed upon for the project, it would basically clear up the stance on the issue and what to expect.

Namecoin is very valuable for storing verification and identification information because it has nearly as much PoW security as Bitcoin due to merged mining. Side storage (DHT etc.) can never match such guarantees.

I made https://github.com/xloem/uvfy as an example of this. It makes simple JSON documents storing a record of a command execution or the hash of a file. But even the simplest documents are too large to store in a single Namecoin value.

JeremyRand commented 8 years ago

Since I think it's relevant, I'm going to quote myself from Reddit earlier today, addressing an unrelated suggestion for a use case that used a lot of storage. Sorry if parts lack context.

To elaborate on /u/rya_nc 's point on scalability, when you're considering a Namecoin use case, you should calculate what block size would be needed if your use case became widespread.

Let's do some math. Let's say a name_update transaction for a d/ name that contains ns and tls records is on the order of 1000 bytes. (This won't be very far off.) Let's say that 90% of users only rekey when renewing the name, while 10% of users have a security issue that requires them to rekey twice in a renewal period. And let's say that we have 100,000,000 users (roughly equivalent to ICANN usage levels), with a 30,000 block renewal period (i.e. you have 6000 blocks leeway before your name expires). So, this results in a block size of:

100,000,000 users * (0.9 * 1 + 0.1 * 2) transactions per user per renewal period * 1000 bytes per transaction * 1 renewal period per 30,000 blocks = 3,666,666 bytes per block, i.e. 3.7 MB block size.
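The same back-of-napkin calculation can be written as a small sketch so that the inputs are easy to vary. The function name `bytes_per_block` is hypothetical; the baseline figures are the ones stated above, and the 2000-byte row is only an illustrative what-if, not a number from this discussion.

```python
def bytes_per_block(users, tx_per_user_per_period, bytes_per_tx, blocks_per_period):
    """Average bytes of name transactions that each block must carry."""
    return users * tx_per_user_per_period * bytes_per_tx / blocks_per_period


# 90% of users rekey once per renewal period, 10% rekey twice.
tx_per_user = 0.9 * 1 + 0.1 * 2

# Baseline from the text: 100,000,000 users, 1000-byte transactions,
# 30,000-block renewal period.
estimate = bytes_per_block(100_000_000, tx_per_user, 1000, 30_000)

if __name__ == "__main__":
    print(f"{estimate:,.0f} bytes per block (about {estimate / 1e6:.1f} MB)")

    # Illustrative what-if: the required block size scales linearly
    # with the transaction size.
    doubled = bytes_per_block(100_000_000, tx_per_user, 2000, 30_000)
    print(f"{doubled:,.0f} bytes per block (about {doubled / 1e6:.1f} MB)")
```

Because the estimate is linear in every input, doubling the typical transaction size (for example by doubling the value size) doubles the block size needed at the same usage level.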

I've seen some suggested use cases which end up being many tens of gigabytes. I won't say with certainty that a block size of tens of gigabytes is impossible forever, but it's definitely not going to work with today's technology. Blockstream has reportedly done experiments suggesting that things start breaking around 3 MB on today's network infrastructure, meaning that the current d/ use case (under the back of napkin values I listed above) is pretty close to the upper limit of what's workable.

Basically, the current use cases that we're supporting are already larger than the current Bitcoin block size of 1 MB if they reach widespread usage. They are probably slightly larger than the 3 MB limit that Blockstream research suggests will cause network issues. (So yes, it seems likely to me that Namecoin will have to increase the block size from its current limit eventually.) I would be hesitant to encourage larger use cases than what we support right now unless Bitcoin demonstrates that a larger block size is safe.

One could argue that we could just remove the value limit completely and do what Bitcoin does, i.e. there's a script limit of 10 kilobytes, and people are instead primarily restricted by the block size limit. The problem I see here is that it could encourage a Ponzi-like situation where people buy into Namecoin because they want to use lots of storage, and then once Namecoin becomes more popular all those use cases become prohibitively expensive.

Also, it's worth pointing out that on a technical level, I think it's possible to increase the size limit without changing consensus rules. Doing this would be considered antisocial behavior unless it were widely agreed upon that it was desirable, and I suspect that most miners would refuse to mine such operations.

When you talk about the PoW security of side storage (whether it's a DHT, web server, offline backup, whatever), you're conflating two things: unforgeability and uncensorability. Storing a hash in the blockchain gives you identical unforgeability to storing the full data. Storing the fingerprint of a key that signs the side data gives you good unforgeability except against replay attacks (though there may be ways around that). Uncensorability is harder to achieve, and certainly a DHT is trivial to censor, but depending on your application you may be able to store the data on a Tor hidden service, or an offline backup, or a public keyserver, etc. You won't be able to match the uncensorability of a blockchain with side storage, but it's worth pointing out that most applications don't actually need the uncensorability of a blockchain. Blockchains are used for consensus applications because the entire consensus system breaks if any of the data is censored for any users of the system, even for a brief period. That's why consensus applications put up with the horrible scalability of blockchains -- because there's no other way to make the system work at all. Very few applications are subject to that issue -- keyserver-like applications definitely aren't. DHTs are among the worst possible solutions in many cases (it's amazing how the Blockstore devs think a DHT will magically solve everything, because it won't), but the best choice is dependent on the application's needs, the threat model, the user base size, the update rate, and the amount of data you're willing to stuff into blocks.
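The unforgeability point can be illustrated with a minimal sketch. It assumes the on-chain value holds a SHA-256 hex digest of the data and that the data itself comes from some untrusted side storage; `matches_onchain_hash` is a hypothetical helper, and how the digest is read from the blockchain is deliberately left out.

```python
import hashlib
import hmac


def matches_onchain_hash(data: bytes, onchain_hex_digest: str) -> bool:
    """Check data fetched from side storage against a digest stored on-chain."""
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; the digest is normalized to lowercase first.
    return hmac.compare_digest(actual, onchain_hex_digest.lower())


if __name__ == "__main__":
    document = b"contents of a backup archive"
    # Stand-in for the digest that would have been published in a name value.
    published = hashlib.sha256(document).hexdigest()

    print(matches_onchain_hash(document, published))                # unmodified data
    print(matches_onchain_hash(document + b"tampered", published))  # modified data
```

Whoever serves the data cannot alter it without the check failing, which is the sense in which the hash gives the same unforgeability as storing the full data. The sketch does nothing for uncensorability: if the side storage withholds the data, there is nothing to verify.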

xloem commented 8 years ago

That's a nice analysis. Note that Namecoin can already be used for arbitrary storage via successive name updates, so the limit is more a matter of convenience than a technical one.
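A minimal sketch of what "successive name updates" amounts to, assuming a 520-byte limit per value. `split_into_values` is a hypothetical helper; submitting each chunk as an update and reassembling them in order is left to the application.

```python
def split_into_values(payload: bytes, limit: int = 520) -> list[bytes]:
    """Split a payload into consecutive chunks of at most `limit` bytes each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


if __name__ == "__main__":
    payload = bytes(range(256)) * 5  # 1280 bytes of sample data
    chunks = split_into_values(payload)
    print([len(chunk) for chunk in chunks])

    # Concatenating the chunks in order recovers the original payload.
    assert b"".join(chunks) == payload
```

Each chunk costs a separate transaction, which is the inconvenience that the current limit imposes on large payloads.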

It sounds to me like a fee system properly balanced to reward miners for the actual resources used by the transactions would be the best solution. The Ponzi problem could be removed by warning users in advance that large transactions will become astronomically expensive if the technology they invested in takes off.

A third value of blockchains is permanence. Data cannot be lost or forgotten. And it is timestamped so the past cannot be forged. With other storage systems if they are compromised in the future they can be modified to appear to have been that way in the past. I suppose these views are pretty much "uncensorability" and "unforgeability" that you already mentioned though.

I'd say uncensorability is pretty important for a keyserver? If I can hide a real key from you, I can make reliable encryption with a contact of yours impossible unless you visit them whenever their key changes.

JeremyRand commented 8 years ago

> That's a nice analysis. Note that Namecoin can already be used for arbitrary storage via successive name updates, so the limit is more a matter of convenience than a technical one.

Yep. You could also do that in Bitcoin using OP_RETURN or multisig outputs. This is successfully discouraged by social pressure and the fact that it's inconvenient. It's not a matter of actively preventing someone from storing a lot of data if they're determined to do so, it's a matter of making it sufficiently inconvenient that most people will find a better way that doesn't cause issues for anyone.

> It sounds to me like a fee system properly balanced to reward miners for the actual resources used by the transactions would be the best solution.

It's worth considering that miners aren't the only stakeholders here. Every full node has to download and store that data forever, and miners may have more resources than average users who want the security of a full node. Scalability improvements to the underlying system might make this less of a big deal. For example, if a system is proven to be secure where nodes can drop old data if they don't have a need for it, then we can definitely experiment with such market-based systems, because if it goes wrong, the damage is only present for a limited time. Whether such a system is actually secure is debatable, and some people may not want their data disappearing eventually anyway. It's hard to please everyone.

> The Ponzi problem could be removed by warning users in advance that large transactions will become astronomically expensive if the technology they invested in takes off.

Have you seen how many media outlets advertised Bitcoin as "with no fees" as recently as a year ago? We unfortunately don't have the ability to force the media to behave responsibly, and it never fails to surprise me how many people invest heavily in things without having done any basic research. Being able to accurately blame investors' losses on their lack of basic research skills and say "I told you so" doesn't make up for the bad press that might result (which comes back to media behaving irresponsibly).

> A third value of blockchains is permanence. Data cannot be lost or forgotten. And it is timestamped so the past cannot be forged. With other storage systems if they are compromised in the future they can be modified to appear to have been that way in the past. I suppose these views are pretty much "uncensorability" and "unforgeability" that you already mentioned though.

Yes, permanence can be divided into permanent uncensorability and permanent unforgeability. Storing hashes in the blockchain achieves the latter. To paraphrase Greg Maxwell, the use cases for permanently uncensorable storage are unbounded, and it's not feasible to support all of these use cases. We're doing our best to support as much as we can, and I think it's likely that we'll support more use cases in a few years from now than we do at the moment. As much as I'd love the laws of math to change for our convenience, this involves difficult engineering problems with fairly heavy constraints imposed by things we don't have control over (e.g. the way networks behave under heavy load). The good news is that a lot of very smart people are working on this (most of them working on Bitcoin rather than specifically Namecoin, but we get to inherit their work for free).

> I'd say uncensorability is pretty important for a keyserver? If I can hide a real key from you, I can make reliable encryption with a contact of yours impossible unless you visit them whenever their key changes.

Uncensorability isn't really a boolean thing. Imagine a scenario where a temporary network glitch at a keyserver causes 5% of users to be unable to retrieve 3 people's keys for a single 60-minute period. This is a censorship event, but it's unlikely that anyone would notice at all, and any people who did notice would probably only be slightly inconvenienced. If this happened in a consensus system rather than a keyserver, the result would be a chainfork that would grind 100% of the system's usefulness to a halt for 100% of users, and probably a lot of double-spend fraud would occur. Consensus systems require much better uncensorability than a keyserver system requires. Would it be cool to have a keyserver system that has the same uncensorability that a consensus system has? Sure it would. But it's certainly not a hard requirement for the system to be useful, and doing this has scalability effects that need careful engineering and analysis to be considered a good idea.

samurai321 commented 8 years ago

> That's a nice analysis. Note that Namecoin can already be used for arbitrary storage via successive name updates, so the limit is more a matter of convenience than a technical one.

This is what I was thinking about the other day while reading the NMC forum. I could argue that increasing the data limit to 1 kB could actually decrease blockchain bloat. That was especially true when the Onename guys were using multiple name registrations just to store a simple ID profile.

I am with JeremyRand in that Namecoin is not for data storage. No actual data should be stored on-chain in any case, only a reference to the data in question, like a hash and meta-properties such as name, version, date, size, filetype or a helper URL to download the data. And those should fit into a single name.
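A reference record of this kind could look like the following minimal sketch. The JSON field names, the `build_reference_value` helper and the example URL are all hypothetical and not an established Namecoin format; the 520-byte limit is the one discussed in this thread.

```python
import hashlib
import json

MAX_VALUE_BYTES = 520  # value size limit discussed in this thread


def build_reference_value(data: bytes, name: str, version: str, url: str) -> str:
    """Build a compact JSON record that references `data` without containing it."""
    record = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "name": name,
        "version": version,
        "size": len(data),
        "url": url,
    }
    # Compact separators keep the encoded record as small as possible.
    value = json.dumps(record, separators=(",", ":"), sort_keys=True)
    encoded_length = len(value.encode("utf-8"))
    if encoded_length > MAX_VALUE_BYTES:
        raise ValueError(f"record is {encoded_length} bytes, limit is {MAX_VALUE_BYTES}")
    return value


if __name__ == "__main__":
    installer = b"\x00" * 4096  # stand-in for a real installer file
    value = build_reference_value(
        installer, "example-installer", "1.0-rc1", "https://example.org/installer.exe"
    )
    print(len(value.encode("utf-8")), "bytes")
    print(value)
```

A SHA-256 hex digest takes 64 bytes, so a record with a handful of short metadata fields stays well below the limit regardless of how large the referenced file is.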

Nowadays the problem is not downloading some file, since there are thousands of ways of publishing data, but confirming that this specific file is actually the file produced by a trusted entity at a point in time. For example, announcing the hash of the latest RC Namecoin software installer, verified by the signature of Phelix.

JeremyRand commented 8 years ago

> This is what I was thinking about the other day while reading the NMC forum. I could argue that increasing the data limit to 1 kB could actually decrease blockchain bloat. That was especially true when the Onename guys were using multiple name registrations just to store a simple ID profile.

Increasing the value size limit to 1 kilobyte would decrease blockchain bloat under a set of assumptions. These assumptions include that (1) many users want to store more than 520 bytes in a value, (2) those users are willing to pay double the name fee in order to decrease their miner tx fee, and (3) those users are willing to go to extra bother to write software that merges multiple values together. Maybe some other assumptions that I'm not thinking of. Assumption (1) is dependent on there not being good ways to look up external data based on authentication data that's under 520 bytes, which we can directly influence by writing software. Assumption (2) is probably false if the user is a miner (since miners can mine their own transactions for free), and is also probably false if the USD value of the name fee increases relative to the miners' cost of processing transactions. Assumption (3) seems false empirically because I don't see many users doing this right now; the "import" field is rarely used, and when it is used it seems to be for the purpose of permission management. (My analysis of assumption (3) is anecdotal, I have no hard evidence of this. Yes, Onename is an exception; they did merge multiple values on a regular basis. That said, they also did a bunch of other hostile/harmful things that no one else seems to be doing, and they're no longer using Namecoin... so it's not clear to me whether they get counted as representative here.)

> I am with JeremyRand in that Namecoin is not for data storage. No actual data should be stored on-chain in any case, only a reference to the data in question, like a hash and meta-properties such as name, version, date, size, filetype or a helper URL to download the data. And those should fit into a single name.

To be clear, I'm not against data storage in principle, I just think that most of the data storage use cases I've heard are completely impractical with current technology. If Namecoin is still around in a decade, and the practicality arguments have changed by then, then I'm willing to reconsider my position. (So yes, I agree with you assuming current technology.)

There are a few cases of data storage that seem plausibly reasonable to me; these include splitting values using "import" (because that allows selective delegation of permissions to modify a value, which actually is a useful security feature), and storing dehydrated TLS certificates in the blockchain (something on the order of 220 bytes or less; this isn't a lot of data, it's not updated often, it's much more secure against replay attacks than using DS records, and it decreases .bit page load time enough that I think it may be worth it).

I could be persuaded against the above use cases, or in favor of other use cases, by mathematical models of how much storage and bandwidth they will use if widely deployed. 3 MB block size seems to be a commonly cited limit (by people who know what they're talking about) at which things start breaking. SegValues doesn't improve the block size issue, but it does open the possibility of optimizing long-term storage cost.

> Nowadays the problem is not downloading some file, since there are thousands of ways of publishing data, but confirming that this specific file is actually the file produced by a trusted entity at a point in time. For example, announcing the hash of the latest RC Namecoin software installer, verified by the signature of Phelix.

Yes, publishing data is a relatively easy problem to solve, and it's not the focus of Namecoin. For things like software signatures, I think a reasonable solution would be to store a public key hash in the blockchain (which allows verifying signatures), and possibly storing revocation signatures in the blockchain as well (since revocation arguably benefits from the high replication inherent in a blockchain, more so than standard signatures). Revocations are rare enough that I don't see them being a scalability issue (then again, imagine what would happen in the case of Heartbleed, where everyone needs to revoke their keys at once... that actually could be a problem, and I haven't modeled it carefully).

(I definitely enjoy thinking and reasoning about this topic, it's a fun diversion from the nitty-gritty battling against TLS implementation details that I spend most of my Namecoin dev time doing these days. :) Cheers.)

domob1812 commented 6 years ago

Closing as obsolete. Feel free to reopen (or open a new issue) to start a fresh discussion on increasing the value size limit.