Closed GoogleCodeExporter closed 8 years ago
Storing data structures as serialized strings or JSON-encoded strings (as I'm
doing) is fairly common. Note that this happens ONLY when you are storing data
from a "repeatable" field (e.g. a multi-select storing multiple values). In
version 0.9.5 (almost ready for public release), this same approach is used to
store repeatable image, text, or relation fields. It's used for all fields that
store an array of data (relational data or not).
I thought about the approach a long time, and maybe what I did isn't the best
way, but I kept running into problems with WP's architecture here, so I
consciously chose to avoid it. The reason that I specifically avoided WP's
approach is that it makes it nearly impossible to do certain database queries
-- my approach makes some queries more difficult (as you outlined), but it
becomes much easier to select an *entire post* including all of its custom
fields in one fell swoop. It's really hard to write MySQL queries that join
tables on an indeterminate number of rows. So if you have an array of custom
field values stored as multiple rows in the wp_postmeta table, you checkmate
yourself into requiring a separate database query for EACH custom field
involved, whereas if you have a single row with JSON data, then you can grab
all your custom field data via a single query that grabs the entire post data
and all custom field data via a single query. The approach became solidified
as I developed the Summarize Posts plugin, which offers a more thorough
database querying API than I could find in native WP functions.
The JSON format is actually queriable (whereas PHP serialized data is not).
E.g.
SELECT * FROM wp_postmeta WHERE post_id=123 AND meta_key='my_custom_field' AND
meta_value LIKE '%"7"%';
Note the quotes are included in the LIKE query. Since the wp_postmeta table is
not indexed, MySQL will be forced to do a table scan to find the rows (i.e.
it's slower query) -- so in other words, there's no disadvantage to doing a
"LIKE" query because the table needs to get scanned either way. I'm building
more functionality into the Summarize Posts plugin:
http://code.google.com/p/wordpress-summarize-posts/wiki/get_posts so you
should be able to handle most queries you can think of.
So the other thing that made me go for the JSON approach was that WP offers no
way of sorting the values. When you use the get_post_meta() function to
retrieve an array of values from the wp_postmeta table, you don't have any
control over the order that they come back out of the array, whereas with
approach, you can simply shuffle the input fields as they are stored, save the
form, and back in your templates, the order of the values updates (see this in
action in version 0.9.5).
So yeah... I don't know. I think I made the decision that was the lesser of 2
evils. Maybe it would have been better to stick to WP's approach and just suck
up the fact that I'd need to make multiple database queries, but I've had to do
that for some other projects (unrelated to WP), and the searching, pagination,
etc. just got to be completely untenable if you couldn't grab all your data in
a single query. I'm not the only guy who's used this solution: MODX uses this
for its custom fields, and I've spotted the same approach in parts of Drupal
and Expression Engine, so *somebody* thinks it's viable. haha
If you're really concerned about architecture, you probably wouldn't even be
using WordPress at all. One huge limitation (to me) is that WP offers no
built-in way of standardizing the custom fields, so those relationships are all
stored in the CCTM configuration, and that makes it impossible to know which
custom fields are available for any post or page, and this just makes the
frustrating scenario of requiring one additional database query for each custom
field that much worse.
Original comment by fireproofsocks
on 6 Dec 2011 at 5:19
Thanks for the in depth explanation, it makes sense... I'm just thinking a head
a bit where WordPress community is pushing more into advanced queries that can
be build through the standard API, meta_query etc those methods will become
more powerful overtime so it's nice to conform to the way they see the standard
going.
I agree you lose the sorting which is a bummer, I'm not sure how if the sorting
could be done just by insert ID... but if you look this long term, where people
use a bigger query such as
$args = array(
'post_type' => 'product',
'meta_query' => array(
'relation' => 'OR',
array(
'key' => 'color',
'value' => 'blue',
'compare' => 'NOT LIKE'
),
array(
'key' => 'price',
'value' => array( 20, 100 ),
'type' => 'numeric',
'compare' => 'BETWEEN'
)
)
);
to non mysql people and even more advanced users it's just easier to use the
API, where as if you have to write a new query every time you want to retrive
the above data along with some additional like filtering for your plugin that's
where it becomes more tedious to write those queries...
Original comment by b...@webcultured.com
on 6 Dec 2011 at 5:32
BTW is there a function in the plugin that will pull related posts based on
post id and meta key? That would be great, so you eliminate the need to build a
query... rather we just say
get_related_posts($post_id,'key_name')
and we get back a standard wp_query object with those related posts....
Original comment by b...@webcultured.com
on 6 Dec 2011 at 5:37
I understand what you're saying: it's usually far better to follow the norms of
any application, but in the case of WP, I feel it's fair to deviate from that.
There are so many poorly implemented features in there, and the architecture in
my opinion is on its last legs. It's just a mess under the hood and if there
isn't complete rewrite of WP in its entirety in the next few years, I would
guess that its market share will slowly drain to systems that are easier to
develop and maintain (just my predictions... time will tell).
Your sample query is a fair point: you'd have to do some of your filtering in
PHP if you were filtering on the JSON encoded fields.
No, I don't have a function to get all related posts directly, but you could
pass values to an instance of GetPostsQuery. E.g. imagine a "repeatable" field
that contains an array of values:
$post_ids = get_custom_field('my_relations:to_array');
$Q = new GetPostQuery();
$args['include'] = $post_ids;
$results = $Q->get_posts($args);
foreach ($results as $r) {
// $r will contain the entire post including all custom fields in an associative array.
}
Original comment by fireproofsocks
on 6 Dec 2011 at 6:09
Actually, I should write a "get_post" output filter that retrieves the entire
post object so you could do something like this (not yet implemented):
$posts = get_custom_field('my_relations:to_array', 'get_post');
Or maybe
$posts = get_custom_field('my_relations:to_array:get_post');
Still thinking through the api here, but that's a good feature request.
Original comment by fireproofsocks
on 6 Dec 2011 at 6:14
Sounds good thanks, and yes lets hope they clean things up :)
Original comment by b...@webcultured.com
on 6 Dec 2011 at 6:14
Perfect, yeah having something like that in place would be helpful.
Original comment by b...@webcultured.com
on 6 Dec 2011 at 6:16
Feature request: issue 251
Yeah, this is good conversation to have -- honestly, I wish I had had more
input for other devs as I worked on this. That's one thing I miss from working
on a dev team.
Thinking more broadly, I think it's ok to abstract the database a bit. Some
MVC purists argue that the developers shouldn't care how the data stored in the
model, so all database access should come ONLY through the API. E.g. MODX is a
"database agnostic" platform -- that's pretty amazing that for most dev
scenarios, you never write a single SQL query *ever*. I don't see any
advantage to taking WP to that extreme, but I think the Summarize Posts plugin
and the GetPostsQuery class will allow more flexible searching over time and
hopefully nobody will need to see the exact internals of how those custom field
arrays are stored. Anyway, it's an interesting conversation, and I suppose
there's never a perfect solution.
Original comment by fireproofsocks
on 6 Dec 2011 at 6:27
And while I'm thinking about this (sorry I'm so OCD this morning), WP does this
sort of thing with its image meta data: WP serializes a bunch of stuff (height,
width, etc.) in a hidden custom field named _wp_attachment_metadata. It's
serialized data, so you can't search it at all, but that's much more convenient
to have all your meta data in one spot rather than in umpteen different rows.
I guess it all comes down to searching: if you need to do complex searching,
then you really need a database structure that supports it. I think that's the
approach that PodCMS takes: http://podscms.org/ -- it avoids this problem by
adding a custom table for each post-type, but that in turn introduces other
searching difficulties (see Drupal here): it's hard to search across multiple
post-types.
Original comment by fireproofsocks
on 6 Dec 2011 at 6:32
Lol no worries, yeah it's all about sacrifices and compromises... I know that
some of that image data is stored that way... but you typically wouldn't search
images that often in that way... however with relational data it's usually
there to be displayed or used in some way during rendering... :)
I'm sure we could go back and forth on this for ages but in the end not a huge
deal just have to deal with it the way you intended :)
Original comment by b...@webcultured.com
on 6 Dec 2011 at 6:40
I've pasted this into a wiki page:
http://code.google.com/p/wordpress-custom-content-type-manager/wiki/CustomFields
DataStructure
It needs some cleanup, but it's worth documenting in the wiki for future
reference.
Original comment by ever...@fireproofsocks.com
on 20 Dec 2011 at 5:42
Original comment by ever...@fireproofsocks.com
on 20 Dec 2011 at 5:42
Original issue reported on code.google.com by
b...@webcultured.com
on 6 Dec 2011 at 4:55