pmwkaa / sophia

Modern transactional key-value/row storage library.
http://sophia.systems

Sophia or Sphia? #72

Open johncmouser opened 9 years ago

johncmouser commented 9 years ago

The current names are slightly confusing. Is "sphia" the community's name, while "sophia" is only for the main repo? Why is it sphia.org when the <title> reads "sophia - a modern embedded key-value database"?

I think that it would make sense to start stabilizing on one name while we still can. Maybe move to "sphia" entirely. Or keep "sophia" for everything and move to sophia-db.org.

Not that big of a deal, just a little confusing at times. If you do choose to stabilize, my vote is on keeping everything "sophia".

reqshark commented 9 years ago

i like the shorter sphia name equally.

it's just as easy for me to read with or without the o

pmwkaa commented 9 years ago

Yes, it is Sophia. I chose sphia.org because at the time the other domains were taken and I didn't like the -db suffix, since it felt like "yet another database" to me. But now it seems a better domain name could be appropriate. Thanks, I'll think about what I can do)

19h commented 9 years ago

I actually like both Sophia and sphia. In fact, you could use sophiadb.tld and forward to sphia.org; it's short, memorable and I like it very much. It's actually quite accessible and I never had to google for the actual website.

reqshark commented 9 years ago

@KenanSulayman +1, yeah my brain's read/write perf on both keys Sophia and sphia is a close call.

but there was a good point made in the other ongoing package management issue, ref: #60.

19h commented 9 years ago

Fair enough. I'd go with sphia then — @pmwkaa already paid for the domain and "sphia" is far more distinctive than sophia. Additionally, sphia allows for much better targeting by search engines.

tl;dr

Google Trends: sophia vs sphia

johncmouser commented 9 years ago

I agree with @KenanSulayman. "sphia" is a much more distinctive name and better for SEO as well.

mknight-tag commented 9 years ago

It is far too early to mangle the name Sophia for the sake of SEO. If the project is good, people will use it and Google will find it. The author is correct in that this is not yet another database. Sophia has incredible potential as a knowledge base. Take a look at the CouchDB architecture to get an idea of what is possible with a document database, except Sophia does away with all of the Erlang infrastructure.

As an aside, the separate front-end project sphia is of very poor quality and we believe the name collision will eventually cause confusion for users of this project. It would be best for the project brand to use the author's name for this project, and not worry about the domain name, that can always be changed later but is just a marketing detail.

Sophia means Goddess of Wisdom - a suggested tagline:

Sophia - You already know.

hardc0d3 commented 9 years ago

Hi, +1 to keep sphia and sp_* Best Regards

pmwkaa commented 9 years ago

Thank you, that's really inspiring.

mknight-tag commented 9 years ago

Here's some info that may be of interest to you when considering Sophia scalability.

Google recently announced its Cloud BigTable. Like most other Google infrastructure, it is based on the JVM. This is what will ultimately keep Google from dominating search on mobile - the JVM is simply too inefficient.

Take a look at this paper that analyzes the power requirements of various implementations of some crypto algorithms:

How much crypto in one microJoule?

The takeaway is that anything running on the JVM is automatically 3 million times slower than a straight C implementation. When scaling to the cloud, this means an efficient C implementation can do on one machine what a JVM-based one will need an entire datacenter to do the equivalent amount of work. And when running on a mobile device, it means it will run much, much longer on the same battery.

Cloud BigTable is advertising 10K queries per second - with a hardware-based hashing function, a Sophia node would eat this for breakfast.

pmwkaa commented 9 years ago

Sounds true. I'm thinking about adding encryption to Sophia in the future (probably AES, using the Intel AES-NI extensions when available). Any thoughts?
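The "when available" part of this idea implies a runtime capability check with a software fallback. A minimal sketch of that check follows, assuming a Linux x86 host where `/proc/cpuinfo` lists CPU features on a `flags` line. The function names and the backend labels are hypothetical and are not part of Sophia.

```python
def cpu_has_aes(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Only the x86 'flags' line is inspected; other fields are ignored.
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def detect_aes_ni(path: str = "/proc/cpuinfo") -> bool:
    """Best-effort check; returns False when the file cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return cpu_has_aes(f.read())
    except OSError:
        return False


def pick_cipher_backend(has_aes_ni: bool) -> str:
    # Hypothetical backend labels, for illustration only.
    return "aes-hardware" if has_aes_ni else "aes-software"
```

A caller would run `pick_cipher_backend(detect_aes_ni())` once at startup and keep the result for the lifetime of the process.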

mknight-tag commented 9 years ago

We are working on hardware-based encryption based on a different algorithm, so we would not be using any AES. We are also developing a concept we call 'bulk randomization', where all data that goes to memory is obfuscated with a hardware randomizer. The advantage of bulk randomization is that your applications do not have to be encryption aware, that is, they are not involved in key management.

The randomization seeds would be associated with an element at a lower layer, so the application would transparently receive the un-obfuscated data. Here is an example image being run through the randomization function:

[Images: eia_1956-gray, out_rand]

This would also be useful as a communication link over which encryption-unaware traffic could be tunneled.

Security-wise, this method is equivalent to a one-time pad, and with a hardware randomizer, scalable to Gigabit speeds.
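The layering described in this comment, where a lower layer holds the seed and the application only ever sees plain data, can be sketched in software. This is only an illustration under loud assumptions: SHA-256 in counter mode stands in for the hardware randomizer, and `ObfuscatedStore` is a hypothetical name. A hash-derived keystream is not a one-time pad, which would need a truly random pad as long as the data that is never reused.

```python
import hashlib


def keystream(seed: bytes, length: int) -> bytes:
    """Stand-in for a hardware randomizer: SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def randomize(data: bytes, seed: bytes) -> bytes:
    """XOR data with the seed's keystream; applying it twice restores the input."""
    ks = keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


class ObfuscatedStore:
    """Hypothetical lower layer: it owns the seed, so callers never handle keys."""

    def __init__(self, seed: bytes):
        self._seed = seed
        self._blocks = {}

    def _block_seed(self, name: str) -> bytes:
        # Derive a separate keystream per block so no keystream is shared.
        return self._seed + b"/" + name.encode("utf-8")

    def put(self, name: str, data: bytes) -> None:
        self._blocks[name] = randomize(data, self._block_seed(name))

    def get(self, name: str) -> bytes:
        return randomize(self._blocks[name], self._block_seed(name))

    def raw(self, name: str) -> bytes:
        """What actually sits in memory: the obfuscated bytes."""
        return self._blocks[name]
```

The application calls only `put` and `get`; `raw` exists to show that the stored bytes differ from what the application wrote.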

pmwkaa commented 9 years ago

That sounds interesting, great idea.

Have you considered using the ideas behind secret splitting? It is a closely related field: a secret is shared between N parts, all parts are needed to recover the original, etc.

Please correct me if I'm wrong. As I understand it, to be equivalent to a one-time pad (in the ideal sense), the random source must have the highest level of entropy, which is hard to achieve with any periodic algorithm. Otherwise it will be only as resistant to cryptanalysis as the random generator is. So the hardware must initially be seeded with some sort of unique key, much as stream ciphers are. Is this true?
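The secret-splitting idea raised in this comment (N parts, all of them needed to recover the original) has a simple XOR-based form. The sketch below is one common construction, not something proposed in the thread; `split_secret` and `combine_shares` are illustrative names.

```python
import secrets
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> List[bytes]:
    """Split `secret` into n shares; all n are required to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are pure randomness of the same length as the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last share is the secret XORed with every random share.
    last = secret
    for share in shares:
        last = xor_bytes(last, share)
    shares.append(last)
    return shares


def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the secret."""
    result = bytes(len(shares[0]))
    for share in shares:
        result = xor_bytes(result, share)
    return result
```

For example, `combine_shares(split_secret(b"top secret", 4))` returns the original bytes, while any three of the four shares XOR to a value that carries no information about the secret.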

mknight-tag commented 9 years ago

Yes, we have already developed an RBG, MKRAND with entropy of 1.0 H. It is available in the NixOS package repo.

There is a Cryptol model that you can use to examine and experiment with the mechanism.

The advantage of using this method for database key generation is that you can do time-relative calculations with it, since starting from a known seed, the random keys will unfold deterministically - we call this a timeline. Our current work is converging on a design with three interacting timelines, each with a different quality with respect to time.
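To make the "timeline" notion concrete: if every key is derived deterministically from a known seed, the position of a key in the sequence can be recovered by replaying it, and positions can be subtracted. The sketch below uses a SHA-256 hash chain purely as a stand-in. It does not reproduce MKRAND's mechanism, and `timeline`, `tick_of` and `ticks_between` are hypothetical names.

```python
import hashlib
from typing import Iterator, Optional


def timeline(seed: bytes) -> Iterator[bytes]:
    """Yield an endless, reproducible key sequence from a known seed."""
    state = seed
    while True:
        state = hashlib.sha256(state).digest()
        yield state


def tick_of(seed: bytes, key: bytes, max_ticks: int) -> Optional[int]:
    """Replay the timeline to find the position (tick) at which `key` appears."""
    for tick, candidate in zip(range(max_ticks), timeline(seed)):
        if candidate == key:
            return tick
    return None


def ticks_between(seed: bytes, key_a: bytes, key_b: bytes,
                  max_ticks: int) -> Optional[int]:
    """Number of ticks from key_a to key_b, or None if either is not found."""
    a = tick_of(seed, key_a, max_ticks)
    b = tick_of(seed, key_b, max_ticks)
    if a is None or b is None:
        return None
    return b - a
```

Replaying from the seed costs time linear in the tick count; a real design would presumably need a cheaper way to locate a key on its timeline.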

If you looked at Google's document, they can't compute time ranges against keys generated with their RBG, which is a big problem for a database. Also since MKRAND produces uniform entropy, there are no hot spots.

Regarding secret-splitting, we have done no work on that yet, as that is more crypto-related.

haneefmubarak commented 9 years ago

@pmwkaa just an FYI - AES-NI has been available fairly widely since 2010-ish.

mknight-tag commented 9 years ago

The problem isn't as straightforward as just using AES-NI - key management then becomes the issue. Presumably the goal is to encrypt the data that is stored in Sophia, but the private keys associated with the content have to be managed as well - an integrated solution demands that Sophia be involved in managing (and protecting) the keys as well.

To expand on the previously mentioned issue of key-splitting, we hold the opinion that security policy should be separate from security mechanism. Splitting keys ultimately is similar to passing out multiple root accounts - the better engineered solution is to solve the encryption issue at the hardware level, and then delegate policy (multi-user, access roles, fine-grained permissions, etc) to a higher layer - ultimately this is the entire reason for the OS. If the operating system has access only to the metadata, and the data itself is encrypted or randomized, then it would be safe to delegate policy to the OS.

Then there is storage.

A database has a more intimate relationship to the underlying storage than most software, even the OS. On mobile and increasingly in the enterprise, Flash memory is where the information will end up. Due to the characteristics of flash memory, there is much behind-the-scenes activity that violates assumptions which turn out to be significant when it comes to security.

A recently released study, "Security Analysis of Android Factory Resets" (pdf), indicates that at least a half billion Android devices cannot guarantee data erasure after a factory reset. This includes not only user data, but private keys as well. If Android cannot even reliably erase information during a factory reset, it follows that it has no control over critical information in the normal course of operation, either.

To summarize - there is great opportunity for an embeddable information system like Sophia that plays well with Flash without introducing security vulnerabilities. However, there is hardware development that must proceed in parallel, as the current hardware has some system issues that simply can't be overcome purely at the software layer.

At least a year ago we approached the Linux Foundation for a Critical Infrastructure grant to develop some of these ideas, but we just got signed up to their junk mail. This may be partly due to the political climate in the US against building secure hardware. Dmitry, are you aware of any Russia-based initiatives that would be more supportive of this type of technology?

haneefmubarak commented 9 years ago

@mknight-tag the critical infrastructure grant is for portions of infrastructure used by literally everyone. Had you been SQLite or MariaDB, they would have definitely considered it.

However, at this point, Sophia does not appear to have the level of adoption for that. I would hazard a guess that file system level encryption could be used in the interim until Sophia gains wider adoption. This would not work well on shared systems, but it appears to be the best that can be done if funding is required but cannot currently be obtained.

Of course, the above are my presumptions; please do not hesitate in the slightest to correct me.