jbolda / gatsby-source-airtable

MIT License
216 stars 43 forks source link

Don't import hidden fields into queryable data store #272

Open patcon opened 3 years ago

patcon commented 3 years ago

Sometimes fields are hidden because they shouldn't be available to queries. Right now, my impression is that any defined fields are pulled completely into the datastore during build. In my understanding, though my queries may not access them, someone aware of graphql could probe the datastore and find hidden information.

Is my understanding correct?

The expected behaviour would be that when a tableView is specified (and maybe even when not specified), then fields hidden in that view should not be accessible via graphql queries. I would expect the simplest way to ensure this is that they're never imported.

Thanks! I'm open to advice on where in the codebase I should look to figure out how to add this :)

jbolda commented 3 years ago

👋

Looking at #273, it looks like you went the route of "clobbering" anything there. Is that to take care of this issue as well? So this is almost like redaction of sorts, hmm. I wonder if there is another way to approach this. My gut says that this would be better taken care of through the view filters.

Any data available within the view is queried initially and dropped into your local memory >> gatsby's graphql, but the end of the day bundles should only contain the data that you query for within your pages. The issue I see with setting an override is that that would break if someone changes the name of a column, etc., so it isn't fail proof either.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity for 30 days. It will be closed if no further activity occurs within 7 days. Remove stale label, comment, and/or apply "ongoing issue" label to prevent this from being closed. Thank you for your contributions.

patcon commented 3 years ago

Ah, i missed this. Thanks @jbolda!

I'm new to graphql, but am willing to try to find the better way. Any pointers to the place I'd programmatically intervene in the generated view filters, as you suggested? (Sorry if I'm getting lingo wrong)

patcon commented 3 years ago

sorry, wasn't this suppose to remove "stale" when i commented? maybe the bot has a bug?

jbolda commented 3 years ago

Sorry about that. I've been catching up on things here, and it seems it might have an issue.

Did you end up solving this at all?

patcon commented 3 years ago

Thanks for asking @jbolda! I appreciate that maintaining is a lot of thankless work. I'm actually moved beyond this project, and since this request seems quite niche, I would understand if you wished to close (for the sake of small feelings of success!) :)

michaelsnook commented 3 years ago

Hi 👋 I came here to report the same issue. It seems that the effect of the tableView is just to filter rows, which is useful but perhaps a bit of a mis-nomer.

I'm working on a project right now where we have an airtable edited by ~10 volunteers and a gatsby site. The whole thing is free to run and gave us a full custom CMS and vetting workflow to help drive COVID relief efforts and fundraising all run by a volunteer team. And we could not have made such a great product both for the public site and the back-end volunteer workflows without this plugin so thank you very much for you work.

I don't know enough about how this plugin works to figure out where to start poking around to suggest a change. But I do notice that in the Airtable API documentation for my base they say:

view, string, optional The name or ID of a view in the Campaigns table. If set, only the records in that view will be returned. The records will be sorted according to the order of the view unless the sort parameter is included, which overrides that order. Fields hidden in this view will be returned in the results. To only return a subset of fields, use the fields parameter.

So it seems the issue with expectation vs. reality is really a part of the Airtable API, not on your end. But perhaps you could offer a tableOptions.fields which would allow me to enumerate the fields I want included in my results, and then pass that through to the Airtable query.

I think this is important in part because Airtable CMSs and workflows don't allow the same kind of validation and save-time formatting and calculations that you would expect to ensure quality data from another API or data source. It is working okay so far, but my table is huge with a bunch of functions and calculated fields that are used for different views and workflows, but only about 12 of them are used to build the website and it makes me nervous to see warnings about inconsistently-formed data in my build logs for fields I never intended to pull into the gatsby site at all!

jbolda commented 3 years ago

Oh, that really is an interesting API choice on their part. It doesn't seem like we can query for / fetch information about hidden fields, can we? It would be lovely if we could have this happen automatically based on the view (which clearly was my expectation and likely the same of others). If not, it seems like your proposition is the next best option.

michaelsnook commented 3 years ago

Yes "interesting"! I found there is an airtable metadata API which lists tables, fields, and views. But it only says the view's name and what type (grid, kanban, etc); doesn't seem to tell us which fields are shown/hidden on which views. So, seems like a dead end.

jbolda commented 3 years ago

Huh, unfortunate. Seems this is way.

For a bit of clarity, my understanding behind @patcon's original proposal was being able to adjust single "cells" on the fly. In a single record, it would let you adjust the data in a field. I think the specialization there expects some level of schema adjustment and understanding the graphql layer in gatsby to ensure that. It starts to drift outside of the scope of this package.

Your proposal, @michaelsnook, is patching it to meet the general expectation. If it is hidden in a view, we might expect to not receive the data at all. It feels generally applicable enough (and with the least amount of footguns we can provide) to fit within the scope of this package.

That being said, I think we will want to provide some docs explaining the situation, and add an example that would otherwise fail to build without your change. We build the examples as "tests" to at least provide some guardrails that we don't break existing sites (and that changes in core gatsby don't change and break our expectations). We never really got a better way to test plugins unfortunately 😢 .