hivejs / hive

:rocket: A real-time collaboration platform :zap:
http://hivejs.org
Other
50 stars 6 forks source link

don't use binary data type #108

Closed marcelklehr closed 8 years ago

marcelklehr commented 8 years ago

Use strings with size

MrTin commented 8 years ago

Using strings seems more reasonable!

What do you want to accomplish with a limit?

I don't think you will gain anything by setting a limit on the column, only reason I would see is if you need a limit. I'm pretty sure the allocation of space is efficient and self-optimized by the db engine, at least in Postgres and newer versions of Mysql.

From http://stackoverflow.com/questions/1962310/importance-of-varchar-length-in-mysql-table

1

No, in the sense that if the values you're storing in that column are always (say) less than 50 characters, declaring the column as varchar(50) or varchar(200) has the same performance.

2

There's one possible performance impact: in MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.

3

I've also seem some JDBC drivers that allocate enough space for the maximum size when setting up buffers to retrieve rows. Needless to say, this causes much angst and gnashing of teeth when some clown has just done varchar(50000) just in case someone has a really large last name :-)

Both cases, seem non-applicable to what we are doing here, right?

marcelklehr commented 8 years ago

Interesting, I was under the impression the database beta.hivejs.org runs on was cutting off data from documents when they became too large. I considered setting a size to manually increase that upper limit. Might be worth trying it again. Thank you for the research!

marcelklehr commented 8 years ago

Ah. See http://www.electrictoolbox.com/maximum-length-mysql-text-field-types/

MrTin commented 8 years ago

Seven years old though :) MyISAM has pretty much been replaced with InnoDB as an engine for example, and InnoDB should be default.

What is the requirement though? Is each document holding all changes or just one edit and it's full document? How much can one expect the document and changes columns to grow into?

I'm not sure I fully understand how the underlying models work and how you persist each edit (if that's what you do) :-)

marcelklehr commented 8 years ago

Oh, you're right. These pages list the same constraint, although they're also not exactly young: https://stackoverflow.com/questions/13932750/tinytext-text-mediumtext-and-longtext-maximum-storage-sizes and https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html but in the latter that might apply to NDB only. It's probably easiest, if I try it out myself.

Yes, edits are persisted as snapshots. A snapshot holds the entire document contents at the point in time when the snapshot was made, plus the changes that lead to this particular snapshot. So, MEDIUMTEXT's 16M should be fine. PostgreSQL doesn't seem to have a constraint on TEXT size.

marcelklehr commented 8 years ago

MariaDB's XtraDB also handles 1.5M inside a TEXT column just fine.

MrTin commented 8 years ago

Cool 👍 🎉