thobbs / phpcassa

PHP client library for Apache Cassandra
thobbs.github.com/phpcassa
MIT License

UUID type casting poisons dependent software #126

Closed mcd-php closed 11 years ago

mcd-php commented 11 years ago

I use phpcassa as a core dependency, and the need to create, type-cast, and (de)serialize UUID objects (since PHP arrays cannot have objects as keys) poisons my software with a leaky abstraction.

Could you please write a good guide with examples on this matter: how to use UUIDs consistently without phpcassa-specific code spilling all over my project?

I am thinking hard about whether I should wrap phpcassa in some auto-converting layer, do the conversion in my ActiveRecord classes, or write my own UUID-keyed collection that can give strings or objects on demand. But since no one knows phpcassa better than you, you are the best person to decide on best practices here.

thobbs commented 11 years ago

I'm not sure that I see a problem with using the phpcassa\UUID class throughout your application if you need access to those UUID functions, methods, and attributes. One nice option is to simply use the string representation of the UUID elsewhere; you can obtain this by casting UUID objects to strings or by reading their "string" attribute.

As for not being able to use UUIDs as array keys, that was the primary motivation behind supporting alternate data formats in phpcassa. You can find an example that deals specifically with UUIDs here: https://github.com/thobbs/phpcassa/blob/master/examples/alternate_formats.php
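As a rough illustration of the two ideas above (casting to a string, and passing rows as a list of key/value pairs so that a non-scalar can act as the key), here is a minimal sketch. `DemoUuid` is a hypothetical stand-in written for this example and is not phpcassa's UUID class; the list-of-pairs shape is only an assumption about the general idea behind an alternate format, so see the linked example for the real API.

```php
<?php
// Hypothetical stand-in for a UUID value object; NOT phpcassa's class.
final class DemoUuid
{
    private $string;

    public function __construct(string $string)
    {
        $this->string = strtolower($string);
    }

    // Casting the object with (string) calls this method.
    public function __toString(): string
    {
        return $this->string;
    }
}

// Rows as a list of [key, value] pairs, so an object can serve as the key.
$pairs = [
    [new DemoUuid('864fc9ce-da9b-11e2-80c4-e0b9a54a6d93'), 'first'],
    [new DemoUuid('9b2f1c1e-da9b-11e2-80c4-e0b9a54a6d93'), 'second'],
];

// Convert to an ordinary string-keyed PHP array at the application border.
$byString = [];
foreach ($pairs as $pair) {
    list($key, $value) = $pair;
    $byString[(string) $key] = $value;
}
```

After the loop, the rest of the application only sees plain string keys and never has to handle the storage-specific object.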

mcd-php commented 11 years ago

Here is one problem I had to fix: sebgiroux/Cassandra-Cluster-Admin@b60df7698 .

There are more like it in my project at work. Yes, they are my fault; I should be a better architect and think my project through more thoroughly, not blindly copy examples into a core library. But authors of public libraries bear the duty of being "better architects" on behalf of their user-programmers, moving the abstract mental work and the need for competence upstream and reducing its repetition, just like reducing "wheel reinvention".

Could you please take a look at this: http://symfony.com/doc/current/book/from_flat_php_to_symfony2.html (the particular framework is irrelevant). Where are the phpcassa examples and interface on the "flat - framework" scale? Should they be more like the foundation classes of a world-class open source product than like the "flat" counter-example?

The model, not to mention the view, should be ignorant of the storage details, auxiliary data types, etc.

Is the lack of __toString() in the UUID class a well-thought-out decision or an oversight? Should I add it, or would that break something? (I dislike private forks and like to push improvements upstream.)

As the person who knows phpcassa best (its structure, usage base, etc.) and who is presumably more competent than the average "assembly-line" user-programmer, please look at it from the user-programmer's standpoint. What would you advise him to do to weed out storage-specific objects and contain them inside a well-defined border?

P.S. Alternate formats are definitely worth a look; sadly, I didn't notice them up front. Maybe link to them in the tutorial?

P.S.2 Sorry for the long thinking and tangled write-up; I was trying hard to put more cogency into less text while avoiding a personal rant ;)

thobbs commented 11 years ago

Should they be more like the foundation classes of a world-class open source product than like the "flat" counter-example?

Are you just asking for more full-length examples?

Is the lack of __toString() in the UUID class a well-thought-out decision or an oversight? Should I add it, or would that break something? (I dislike private forks and like to push improvements upstream.)

UUID.__toString() already exists. Perhaps you just overlooked it?

What would you advise him to do to weed out storage-specific objects and contain them inside a well-defined border?

That depends a lot on exactly what you're trying to abstract. If you only need a unique ID, the string representation of a UUID suffices. If you only need the time portion, just return a timestamp or date. If you need multiple attributes of the UUID object, return a UUID object or an array.
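For the "only the time portion" case, a sketch along these lines could derive a Unix timestamp directly from the string form of a version 1 (time-based) UUID, following the field layout in RFC 4122. The function name `uuid1_unix_seconds` is hypothetical and not part of phpcassa, and the code assumes 64-bit PHP integers.

```php
<?php
// Hypothetical helper, not part of phpcassa. Assumes 64-bit PHP integers.
function uuid1_unix_seconds(string $uuid): int
{
    $hex = str_replace('-', '', strtolower(trim($uuid)));
    if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
        throw new InvalidArgumentException('Not a UUID string');
    }
    if ($hex[12] !== '1') {
        throw new InvalidArgumentException('Not a version 1 (time-based) UUID');
    }

    $timeLow  = substr($hex, 0, 8);
    $timeMid  = substr($hex, 8, 4);
    $timeHigh = substr($hex, 13, 3); // skip the version nibble at offset 12

    // 60-bit count of 100 ns intervals since 1582-10-15 (the UUID epoch).
    $ticks = hexdec($timeHigh . $timeMid . $timeLow);

    // Distance between the UUID epoch and the Unix epoch, in 100 ns intervals.
    $epochOffset = 122192928000000000;

    return intdiv($ticks - $epochOffset, 10000000);
}

$seconds = uuid1_unix_seconds('864fc9ce-da9b-11e2-80c4-e0b9a54a6d93');
echo gmdate('Y-m-d', $seconds), "\n";
```

Returning a plain integer like this keeps the UUID type out of code that only cares about when a row was written.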

Hack phpcassa\UUID::import() to accept dash-delimited 36-byte string of UUID ?

It should already accept strings like "864fc9ce-da9b-11e2-80c4-e0b9a54a6d93", with or without dashes.
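To make "with or without dashes" concrete, here is a minimal sketch of how both spellings can be reduced to one canonical string. `normalize_uuid_string` is a hypothetical helper written for this illustration; it is not the implementation of phpcassa\UUID::import().

```php
<?php
// Hypothetical helper; NOT phpcassa's UUID::import().
function normalize_uuid_string(string $input): string
{
    $hex = strtolower(str_replace('-', '', trim($input)));
    if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
        throw new InvalidArgumentException("Not a UUID string: $input");
    }

    // Re-insert dashes in the canonical 8-4-4-4-12 layout.
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}
```

Normalizing once at the border means string keys compare equal regardless of how the caller wrote them.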

P.S. Alternate formats are definitely worth looking, sadly i didn't notice them upfront, maybe link in tutorial?

Sure, I'll add a note to the tutorial.

P.S.2 Sorry for long thinking and entangled writeup, i was trying so hard to put more cogency in less text still avoiding a personal rant ;)

No problem :)

mcd-php commented 11 years ago

Are you just asking for more full-length examples?

Not full-length, but full-structure. Could you implement a storage layer for some well-designed framework? It can be incomplete and only capable of running an example application, but it will give some insight along the way.

UUID.__toString() already exists.

Oops, sorry !

If you only need a unique ID, the string representation of a UUID suffices.

Relational algebra insists that all fields must be atomic, and there's great wisdom in that. We are non-relational, but I see no reason for composite IDs. Whoever needs the time or version of a UUID can still create the object explicitly.

So I vote for string keys and no objects in user code.

It ( phpcassa\UUID::import() ) should already accept strings

I will try, but phpcassa as a whole does not: many times I had to explicitly create UUID objects from strings in controllers and models to avoid exceptions.

P.S. Have you ever thought about implementing doctrine-cassandra, following doctrine/mongodb-odm and doctrine/couchdb-odm? I asked for a design manual and was referred to doctrine/KeyValueStore, but it's conceptually insufficient: Cassandra is not a strict key-value store; it has indexed slices, wide rows, and super columns.

thobbs commented 11 years ago

Not full-length, but full-structure. Could you implement a storage layer for some well-designed framework? It can be incomplete and only capable of running an example application, but it will give some insight along the way.

Given that Cassandra is not a general purpose storage system, I feel like that approach can be problematic, especially if the framework presents an object or document oriented API. Cassandra is very much designed to handle the exact type of queries you're interested in efficiently.

Relational algebra insists that all fields must be atomic, and there's great wisdom in that. We are non-relational, but I see no reason for composite IDs. Whoever needs the time or version of a UUID can still create the object explicitly. So I vote for string keys and no objects in user code.

I'm not sure that I understand your point about atomic fields. String representations of UUIDs and UUID objects both contain the exact same information. It's simply a matter of presentation and user friendliness. That leads to my answer for your second point: I feel that UUID objects are more user friendly, especially for the typical use case of TimeUUIDs.

Switching to returning strings would be a backwards-incompatible API break for no gain. phpcassa would still need to support alternate data formats due to PHP's inability to use floats and non-scalars as array keys.

I will try, but phpcassa as a whole does not: many times I had to explicitly create UUID objects from strings in controllers and models to avoid exceptions.

That's a fair point. phpcassa could certainly accept UUID format strings wherever UUIDs are expected. If you want to add support for that to phpcassa\Schema\DataType\UUIDType::pack() and phpcassa\Schema\DataType\TimeUUIDType::pack(), I would accept that pull request.
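The general shape of such a change might look like the sketch below: a packing step that takes either a UUID-format string or any object that can be cast to one, and returns the 16 raw bytes. `pack_uuid_value` is a hypothetical standalone function; it does not reproduce the signature or body of the pack() methods named above, nor the code of any pull request.

```php
<?php
// Hypothetical sketch; NOT the code of phpcassa's pack() methods.
function pack_uuid_value($value): string
{
    $castable = is_string($value)
        || (is_object($value) && method_exists($value, '__toString'));
    if (!$castable) {
        throw new InvalidArgumentException('Expected a UUID string or a string-castable object');
    }

    // Plain strings and string-castable objects take the same path from here.
    $hex = str_replace('-', '', strtolower(trim((string) $value)));
    if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
        throw new InvalidArgumentException('Not a UUID');
    }

    return hex2bin($hex); // 16 raw bytes
}
```

With a single entry point like this, controllers and models can pass strings straight through instead of constructing UUID objects first.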

P.S. Have you ever thought about implementing doctrine-cassandra, following doctrine/mongodb-odm and doctrine/couchdb-odm? I asked for a design manual and was referred to doctrine/KeyValueStore, but it's conceptually insufficient: Cassandra is not a strict key-value store; it has indexed slices, wide rows, and super columns.

I haven't looked into Doctrine integration, but see my first comment about framework support.

thobbs commented 11 years ago

Closing this since #131 will add support for UUID strings.

mcd-php commented 11 years ago

It is not THAT closed even after adding string UUIDs in the write direction, since the problem will still exist in the read direction.

I am now looking into phpcassa\AbstractColumnFamily to add another format or flag. What do you think of this? Where are the right places to optionally replace serialize($key|$value) with (string)$key|$value?

thobbs commented 11 years ago

I would say that type of behavior falls under custom types. pycassa has supported setting custom types per-column as well as for the comparator, key validator, and default column value validator.

To support that, you would need to implement AbstractColumnFamily::set_key_validator($type), set_comparator($type), set_default_validator($type), and set_column_validator($column_name, $type). These would be somewhat similar to AbstractColumnFamily::set_autopack_values().

That's definitely not a trivial change, but that's the right way to do it if you want to support that type of functionality. Once implemented, you would create a custom UUID subclass that unpacks to strings and use that.
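The "unpacks to strings" idea can be sketched as a type with a pack/unpack pair whose read side returns the canonical dashed string instead of an object. `StringUuidType` below is hypothetical: it does not extend any phpcassa class, and its method signatures are assumptions chosen for this illustration.

```php
<?php
// Hypothetical type class; it does NOT extend or reproduce any phpcassa class.
final class StringUuidType
{
    // Write direction: UUID string (dashes optional) to 16 raw bytes.
    public function pack(string $uuid): string
    {
        $hex = str_replace('-', '', strtolower(trim($uuid)));
        if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
            throw new InvalidArgumentException('Not a UUID string');
        }
        return hex2bin($hex);
    }

    // Read direction: 16 raw bytes to a dashed string, never an object.
    public function unpack(string $bytes): string
    {
        if (strlen($bytes) !== 16) {
            throw new InvalidArgumentException('Expected exactly 16 bytes');
        }
        $hex = bin2hex($bytes);
        return implode('-', [
            substr($hex, 0, 8),
            substr($hex, 8, 4),
            substr($hex, 12, 4),
            substr($hex, 16, 4),
            substr($hex, 20, 12),
        ]);
    }
}
```

Because unpack() yields a plain string, the values it returns can be used directly as PHP array keys.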

mcd-php commented 11 years ago

I have read AbstractColumnFamily quite thoroughly. Your approach seems to be per-column, but I have wide rows for many-to-many relations. And it looks too complex and deep for me.

Kludged up just for my project, not touching the library itself: https://gist.github.com/mcd-php/6292011

I didn't test it deeply, but my application doesn't look broken at first glance; I just removed the conversion in my ActiveRecord class and the ->string accesses here and there.

Thank you very much for making it possible to override the class name in phpcassa\Schema\DataType!

Does my solution hide a subtle error? Will you incorporate this behind some flag or option?

thobbs commented 11 years ago

I have read AbstractColumnFamily quite thoroughly. Your approach seems to be per-column, but I have wide rows for many-to-many relations. And it looks too complex and deep for me.

Yeah, there are both per-column and "wide row" conversions going on there. Definitely not trivial.

I didn't test it deeply, but my application doesn't look broken at first glance; I just removed the conversion in my ActiveRecord class and the ->string accesses here and there.

Cool. That's a decent solution.

Thank you very much for making it possible to override the class name in phpcassa\Schema\DataType!

I'm glad it was useful. Sorry, I had forgotten all about that or I would have mentioned it earlier!

Does my solution hide a subtle error? Will you incorporate this behind some flag or option?

If I hear from others that they would like something similar, I'll consider making this easier to do.