wp-graphql / wp-graphql

GraphQL API for WordPress
https://www.wpgraphql.com
GNU General Public License v3.0

Add "JSON" as format option for post content #2161

Closed mboynes closed 1 year ago

mboynes commented 2 years ago

As has been written about extensively, working with Gutenberg content in decoupled sites can be difficult. While a "great" solution may remain on the horizon, I think there's room for a "good" solution now in wp-graphql's core plugin, one which would continue to add value even after something better comes along.

post.content et al. have the format options of RAW and RENDERED. When working with Gutenberg-driven content, it can also be helpful to get that content as JSON (the output of parse_blocks()). The wp-graphql-gutenberg plugin adds a blocksJSON field for this, which goes a long way!

I think there is sufficient value in porting this into the core plugin by providing an additional format option for Gutenberg-driven fields like post_content. This format would produce a JSON-encoded string from parse_blocks(). It could be named JSON, or perhaps something more specific like PARSED or PARSED_JSON.

jasonbahl commented 2 years ago

@mboynes thanks for bringing up this subject!

I do see the value in adding something like this, even if it's not the "perfect" solution I ultimately want to get to.

Even if we get to the point where Blocks can all be represented as GraphQL Types, I do think there's still utility in this, at least for debugging, etc, as there will be cases where some blocks might not be represented in the Schema as expected, and it would be helpful to quickly access the "raw" JSON to compare with the Typed GraphQL responses.

Response Type

You proposed adding another format option to the content field to go along with RAW and RENDERED.

I like the idea, but I think it probably makes more sense to add an additional field.

My primary reason for this is that I would prefer this to be a custom Scalar instead of a String (which the current content field is, but might change to an HTML Scalar in the future), and GraphQL doesn't support Scalar unions, so we can't have a field that returns a string (or HTML in the future) or CustomScalar.

So I think we would introduce a new field, similar to how the WPGraphQL for Gutenberg extension does it, but I think we could introduce a Gutenberg-specific Scalar.

JSON Scalar?

I'm pretty opposed to adding a generic JSON Scalar to the Type system. I believe it would get over-used and abused by developers extending the WPGraphQL Schema, and developers interacting with WPGraphQL would ultimately have a bad experience. That would reflect on WPGraphQL more than on the fields that were lazily implemented with wildcard JSON scalars.

I think we could potentially add a more explicit Scalar such as EditorBlocksJSON or something like that, which would ultimately return the JSON produced by parsing Gutenberg blocks, but with some validation to ensure that all objects in the JSON response are valid. For example, we can ensure all objects have a block name, etc.

This would allow us to provide what you're looking for without introducing a generic JSON Scalar that I believe would ultimately lead to poor experiences for developers using GraphQL.

That said, if we introduce a field like this, I think we could name it something like editorBlocksJSON? 🤔

We could query like so:

{
  post(id: "...") {
    id
    title
    content
    editorBlocksJSON
  }
}

This could return JSON, but as mentioned above, could do some validation during resolution to ensure each block has expected properties such as block type, etc.
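To make the validation idea concrete, here is a minimal sketch of the logic, written in Python rather than WordPress PHP so it can run on its own. It assumes the block shape that parse_blocks() produces (objects with blockName, attrs and innerBlocks keys). The function names validate_blocks and serialize_editor_blocks are hypothetical and not part of WPGraphQL. Note that the real parser also emits freeform chunks whose blockName is null; this sketch simply reports them, and an actual implementation would need to decide how to treat them.

```python
import json


def validate_blocks(blocks, path="blocks"):
    """Return a list of problems found in a parse_blocks()-style list.

    Hypothetical helper: checks that every block, including nested
    innerBlocks, is an object with a non-empty string blockName.
    """
    if not isinstance(blocks, list):
        return [f"{path}: expected a list of blocks"]

    problems = []
    for index, block in enumerate(blocks):
        here = f"{path}[{index}]"
        if not isinstance(block, dict):
            problems.append(f"{here}: expected an object")
            continue
        name = block.get("blockName")
        if not isinstance(name, str) or not name:
            problems.append(f"{here}: missing block name")
        # Recurse so nested blocks are held to the same rule.
        inner = block.get("innerBlocks", [])
        problems.extend(validate_blocks(inner, f"{here}.innerBlocks"))
    return problems


def serialize_editor_blocks(blocks):
    """Serialize blocks to a JSON string, refusing invalid structures."""
    problems = validate_blocks(blocks)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(blocks)
```

A scalar built this way would fail loudly on malformed block data instead of passing arbitrary JSON through, which is the distinction from a generic JSON Scalar.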

Possible Security Concerns

WordPress doesn't publicly expose the raw content of posts to public users. Same goes with "raw" parsed blocks.

You must be authenticated to see raw content, including raw block input.

WordPress core, WP REST API and WPGraphQL support this access control privilege.

If you execute the following GraphQL query as a public user:

{
  post(id: "...") {
    id
    content(format: RAW)
  }
}

The content field would return null. If you executed it as a user with proper capabilities, you would see it. Same goes with the content.raw field returned in REST responses.

This is because the raw data hasn't passed through filters to prepare for rendering to public users.

Raw data can contain potentially sensitive/non-public information, such as IDs of other entities, usernames, emails, API keys, etc. Yes, I've seen shortcodes where you enter things like API keys or credentials for other services. While that's probably silly shortcode design, they exist, and the raw data is certainly not meant for public eyes, even though the result of the shortcode is.

The raw data input into blocks (just like raw shortcodes) isn't always intended for public consumption.

I know at the newspaper, when Gutenberg was first becoming a thing, we had a lot of discussions on how we could do some highly contextual things, like editor feedback, etc. Editors could provide authors feedback (Google Docs style) right there in the blocks. This data could possibly exist as some sort of block meta, but would only be intended to be viewed by logged-in users, not by public consumers of the WPGraphQL API.

So that said, before supporting something like outputting raw blocks as JSON, I think we at minimum need to ensure there are proper constraints in place to ensure we're only returning block data that's intended to be read publicly.

I'm not sure if Gutenberg has any conventions around this type of thing.

I know post meta kind of went through a phase where any field prefixed with an underscore was generally respected as "private" but I'm not sure if there are conventions like that for Gutenberg. My guess is no.

Without being able to ensure that we're not leaking private data, we could limit the parsed blocks to authenticated users (like RAW content is in WPGraphQL and REST), but I think that defeats the utility for (I think the majority of) users that are using Gutenberg to publish publicly-readable content. 🤷🏻‍♂️

Another option is to have the field restricted to authenticated users by default, but allow that to be changed via filters or a setting.

That way, by default we're protecting potentially sensitive information, but allowing a site admin/developer to consciously opt-in to exposing something that's otherwise protected.

Like, today you could write some filters to expose the RAW content to public users of WPGraphQL / REST, but the default is to keep that data protected.
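The "restricted by default, opt-in to expose" behaviour can be sketched as follows. This is a language-neutral illustration in Python, not WPGraphQL code: Viewer, can_edit_post, resolve_editor_blocks and allow_public are all hypothetical names, with allow_public standing in for whatever filter or setting a site owner would use to opt in. Returning None mirrors how content(format: RAW) resolves to null for public users.

```python
from dataclasses import dataclass


@dataclass
class Viewer:
    """Hypothetical stand-in for the current request's user."""
    can_edit_post: bool = False


def resolve_editor_blocks(blocks, viewer, allow_public=False):
    """Return parsed blocks only when the viewer may see raw data.

    allow_public models a deliberate opt-in by the site owner;
    it defaults to False so raw block data stays protected.
    """
    if viewer.can_edit_post or allow_public:
        return blocks
    # Public request and no opt-in: resolve to null, like RAW content.
    return None
```

With the defaults, resolve_editor_blocks(blocks, Viewer()) yields None, while passing allow_public=True or a viewer with can_edit_post=True yields the blocks.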

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had recent activity. If you believe this issue is still valid, please open a new issue and mark this as a related issue.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically closed because it has not had recent activity. If you believe this issue is still valid, please open a new issue and mark this as a related issue.

justlevine commented 1 year ago

Thanks to server-side block registration, we can generate GraphQL types for blocks instead of needing to return JSON from the schema, as wp-graphql-content-blocks does, for example.

Reopening this because the use case for debugging might still be justified, so this should only be closed if a decision is made to wontfix.

jasonbahl commented 1 year ago

@mboynes I'm going to close this issue as I believe there are some good solutions for querying Gutenberg Blocks using WPGraphQL.

I'm currently using WPGraphQL Content Blocks with good success. There are some limitations, but in general I think that project is moving in a great direction (something we might consider merging into core WPGraphQL in the future).

If, for some reason, you feel like you NEED a JSON Scalar, you could use the register_graphql_scalar() API and add a JSON Scalar.