Open CamilleLegeron opened 1 year ago
Hi @CamilleLegeron, Grist does have a space in the left panel for "examples and templates" that can be set up - here it is in our hosted service:
Superficially, that sounds related to what you are asking for, but I suspect you want the reference data to be available within documents, for use in Reference or Reference List columns, is that correct?
Yes, what we would like is "reference data" maintained centrally and made available across a whole Grist site (so that, for example, when french list of cities will change (which it does every year)) we only need to change in one location and all documents referring to this data can benefit from the update.
(PS: this is linked to the email i sent you and Dmitry earlier today)
I understand now, thanks.
One feature we have considered is table replication, where a table in one document is copied to many other documents, and then kept up to date. The nice feature about replication is that the documents remain complete databases with clear referential integrity. The not-so-nice feature is the duplication of data. This is a very general purpose feature that could find many uses, including the site-level master data use case but not limited to it.
There are other possibilities. Let me ask you, if a document from a site with site level reference data is downloaded, and opened in another installation of Grist, what should happen? What should actually be stored in the document?
I could imagine doing the following as a thought experiment.
What do you think @dsagal ?
One question about requirements: does the shared data sometimes need to be accessed by formulas? E.g. if Cities
is a shared table, and another table in the current doc has a reference to a city, would it be important to be able to access e.g. $city. population
(as it is for regular tables)? If not, i.e. if only showing data in UI is important, it opens up more options, I think.
@dsagal
One question about requirements: does the shared data sometimes need to be accessed by formulas?
I can see myself wanting to access shared data via a formula. $city.population
very much looks like one of our use cases for shared data.
@paulfitz
There are other possibilities. Let me ask you, if a document from a site with site level reference data is downloaded, and opened in another installation of Grist, what should happen? What should actually be stored in the document?
I think I would be OK with shared site data being lost when a document is re-uploaded outside of the site. Maybe an option when downloading a .grist
that references shared data would be to write this data not as a reference but as plain data ?
We haven't really came across the use case of migrating a doc from one site to another yet, so I can't really wrap my head around whether this behavior would break stuff.
Meshing data and metadata between sites sounds like a very tricky (albeit interesting!) engineering problem that we might want to tackle as a follow-up step to a site-level-only implementation, wdyt ?
It seems tricky to avoid duplication while keeping the other features (notably reference, formula etc). In any case today it's data that is duplicated in each document. Maybe forbid the "attachment" type to limit the size of these documents? If the tables of the document are duplicated there is still the problem of synchronization that arises, no ? We should forbid the edition in the duplicated table and trigger a synchronization at each modification of the meta document When exporting the .grist if the data has been duplicated the problem of loss of information does not arise.
It seems tricky to avoid duplication while keeping the other features (notably reference, formula etc). In any case today it's data that is duplicated in each document. Maybe forbid the "attachment" type to limit the size of these documents? If the tables of the document are duplicated there is still the problem of synchronization that arises, no ? We should forbid the edition in the duplicated table and trigger a synchronization at each modification of the meta document When exporting the .grist if the data has been duplicated the problem of loss of information does not arise.
@dsagal and @paulfitz any idea or opinion to advance on this subject?
I agree that duplicating data, treating it as read-only within a "subscribing" document, and establishing a method for keeping it up to date is overall simpler than many of the alternatives - and meets the need you have.
It relates somewhat to incremental imports. In an ideal world, Grist could keep part of a document up to date with information imported (and re-imported, perhaps automatically) from some outside source (such as a URL). Viewed that way, your feature is "just" a special case of that.
What if the existing import mechanism were improved in the following ways:
It isn't quite the same idea, but it could push Grist in a nice direction, rather than being more of an unintegrated feature. The downside is it wouldn't be as smooth to begin with as something very special-purpose could be. Imports are currently somewhat clunky.
hi @paulfitz !
Bettering the import experience is definitely very high on our list of nice-to-haves, and the steps you outline make a lot of sense to me !
One feature that we would like to add to the plan you outline is the option to import several linked tables from another Grist doc on the site (in the main use case we have in mind for this feature, we would like to keep all docs up-to-date with cities
, metropolitan_areas
, and regions
tables with many reference columns linking them together)
Again, I'm wondering whether this feature is too "core" for us to work on, and whether, if that's the case, Grist would be interested in us financing these developments by some devs from your team ?
We can start dipping our toes in the import code and let you know how it goes, would a good first step be to tackle the import from another Grist doc on the site
? How would permissions play into this ? My gut feeling is that only an admin of both docs (or maybe only a site admin to start with) should be able to perform this operation.
Hi @vviers! I agree with the concern that this feature may be too "core", in the sense that it would be more efficient if we worked on it directly. In particular, the import code is complicated and may require refactoring as part of this project, and the related aspects would be connecting several different areas of the product. Let me connect with you directly to discuss options.
Hi there!
I am planing on implementing that feature.
In order to ease the development and also in order to leave control over the data being fetched, we would go to the solution proposed by @paulfitz with an action from the user required to refresh the data.
NB: this is my proposal after discussing about this feature. @vviers and @CamilleLegeron may also give feedback about this.
Regarding the user journey to create a data source synchronized from an external document (aka synchronized data source), I would see this:
The user clicks « Add new » → « Add page » and is offered this view:
(at least with more cosmetic :sweat_smile:)
They are offered to select a document from their own instance.
About this idea of entering a table URL, that should lead to the same instance or to a public document, I am not sure how difficult it would be to implement that, I would suggest to implement it later if we want to.
Once the user submits, the data are fetched from the external document and put in a new table and a new data source.
Then the user can see the table with synchronized data source:
And the synchronized data source itself:
Both data source list and table list would offer another context menu to synchronize the data source:
And I would suggest that also a click to the 🔁 icon would trigger that action.
When they trigger this action, the user is offered a popup to review the changes and validate if they want to:
To be done
To be done
Looks promising to me at first glance ! :)
The user clicks « Add new » → « Add page » and is offered this view:
I am unsure whether External Table
should be offered as a new type of widget alongside Table
. I think conceptually the option of picking an external data source should pop up in the "Select Data" menu once you've selected a widget type maybe ?
On the other hand, making External Table
a special type of Widget would allow for implementing a more dedicated interface in the options panel to the right, which might be what we want 🤔
Could possibly have an option to break/forget the link to an external document, in which case the feature could help with https://community.getgrist.com/t/move-table-to-another-document/2888/8
Could possibly have an option to break/forget the link to an external document, in which case the feature could help with https://community.getgrist.com/t/move-table-to-another-document/2888/8
Sure break / forget the link is an interesting feature!
I am unsure whether External Table should be offered as a new type of widget alongside Table. I think conceptually the option of picking an external data source should pop up in the "Select Data" menu once you've selected a widget type maybe ?
That's true! ~~However, regarding the UI, I feel hesitant with this option, as it would suppose adding a third panel to the right which would appear when we select "External table". That might be somewhat OK, IDK, I would love to have the feedback from a UX/UI expert about this.~~
Edit: actually there is a third panel for filtering / sorting data. I will scratch my head a bit more to see if I can make a good alternative proposal based on your remark.
On the other hand, making External Table a special type of Widget would allow for implementing a more dedicated interface in the options panel to the right, which might be what we want thinking
Not really a widget, IMHO. Also a widget is a special view rendering data from a data source. In our case, it's rather a special data source. Which leads to make consider the below interrogation :sweat_smile:
Then the user can see the table with synchronized data source:
Reconsidering this option I proposed, I am less sure. This is a list whose items are composed of 1 or several views, whose source can be various. Or it may hint that this item contains at least a view linked to a synchronized data source? I feel like it could possibly make sense.
There's another place where the external table could be brought in - with the other import options:
Those import options lead to an import process that can either create a new table or update an existing table. @fflorent what if we supported
Import from another Grist document
along with the other import options? Even without the icon for later one-click sync, that would be immediately useful. And, when the one-click-sync feature exists, it would be great if it could also apply to imports like Import from URL
.
A downside is that Grist could potentially do a much higher fidelity import from another document, with e.g. conditional formatting and the like, but that might be hard to organize within the generic import mechanism (@georgegevoian might know more about the possibilities here).
Those import options lead to an import process that can either create a new table or update an existing table. @fflorent what if we supported Import from another Grist document along with the other import options? Even without the icon for later one-click sync, that would be immediately useful. And, when the one-click-sync feature exists, it would be great if it could also apply to imports like Import from URL.
I like the idea!
but that might be hard to organize within the generic import mechanism
When you mean that, do you mean technically or regarding the UI?
but that might be hard to organize within the generic import mechanism
When you mean that, do you mean technically or regarding the UI?
I mean technically - but I could be wrong, I haven't looked closely. For the UI, I'd guess that having things "just work" would be the expectation (all formatting and options that are practical to copy/update would be copied/updated), so no particular controls needed.
I created a pull-request in my own repository with some changes to the initial proposal: https://github.com/fflorent/grist-issues-proposals/pull/1
With the hope it will be more convenient to review / comment it.
Feedback highly welcome! :pray:
Personally I was agreeing with the discussion here (which I'm only reading now, sorry) until it pivoted to using imports. Imports are complicated, both in the backend and the frontend. I imagine most users who want this kind of feature would much prefer to have the data stay up to date automatically (even instantly) rather than the flexibility of imports.
Here's what I think is required for syncing, and it doesn't sound too bad:
2. Doc actions on the table/column metadata tables which affect the source table are broadcasted in some special way where the refs are translated to string IDs.
In principle, as a first pass, you could skip steps 1 and 2, right? And have non-live syncing, where a sync could be triggered by reopening the document (or perhaps instead clicking a button).
Looking at https://github.com/fflorent/grist-issues-proposals/blob/bcc569b8def1063c19c7bd4c466ed38490df9572/issue-416/design.md I see the mapping parts of the import UI skipped over, which seems reasonable.
The part for showing diffs prior to import does line up with a part of the existing import mechanism that has an ugly implementation currently. It could perhaps also be skipped on a first pass?
Not sure if the options are very different on an unambitious first pass, but yes by nudging towards imports I am hoping for some fairly general purpose improvements to incremental imports in a second pass. It could be wrong headed, since Grist documents are indeed a special case.
There's a complete other direction to push this, where the inserted table is seen as the result of some workflow, like any workflow that generates a table (e.g. a bit of pandas code), and we double down on workflows. Pretend I never mentioned that, to cut down on the options :-).
Related wishes Let me share something related from another customer who has built up sophisticated processes in Grist. They want to clone parts of documents, for several purposes (they already do it with external scripts, but it's a lot of work):
Another thing they want is something like "materialized views" like in Postgres, which become a Grist table, again with an option to sync (like REFRESH in Postgres). This is very much like @paulfitz's note about syncing the result of a workflow that produces a table.
In regard to frequency of updates, they want a way to sync manually, or on a schedule. It is less common for them to want to sync automatically because the use case is primarily in having one frequently-updated document with raw data, and smaller documents that are derived from it for reporting, which should not change unpredictably.
Common Layer To tie this to the discussion above: Grist's imports are certainly related but I agree with @alexmojaki that Grist's import mechanism is too complicated, partly to support interactions not necessary here, partly because of its messy history. Rather than building syncing on top of existing imports, I think we need a simpler common layer that would be useful for both syncing AND imports.
What I mean by common layer is something close to the add-or-update REST API. Imports already use somewhat similar patterns for _mergeAndFinishImport and _getTableComparison, but they mix in column mappings and transforms, making it hard to reuse that code.
What I'd hope for is a common layer that takes a source table, a compatible destination table (contains at least all columns of source table), and a config (specifying which columns identify records, and a few flags like for add-or-update), and can produce a diff and an update to the destination. The hard parts to work out will be things like handling column types and options, and rendering diffs/previews. I think it's important to come up with a simple layer that can be stable, and efficient, and easy to build on top of.
Then other functionality can use this layer: syncing Grist tables fits it naturally; imports build on top of it with column mappings, transforms, and live revisions (i.e. we could refactor imports some day to use it); and tables derived from workflows (e.g. pandas output or SQL queries) could use the same layer to get materialized as Grist tables.
Something (potentially important, or maybe not?) that isn't handled by imports or add-or-update is records being deleted from the source table.
I made a POC (using the API and what I developed in PR #605). Here is what I currently get in this video:
https://github.com/gristlabs/grist-core/assets/371705/cf9bfb9b-082b-43bc-9042-64a2e4f45b83
Here is the code (dirty as it's in a POC state): https://github.com/fflorent/poc-grist-codegen/blob/ff01f3d6e4ae0b27e2a215e149c9529cb454f38a/synchronize.ts
In terms of AC (if you are not comfortable with some wording, take a look at my glossary and/or don't hesitate to suggest changes):
Here is what the POC does not do yet and will unlikely be hard to implement:
Here is what the POC does not do yet and needs more work and thoughts:
uuid
for each record (either with or without user intervention)?Regarding the deletion of records (AC2.2) + AC 3.2, what is currently missing is the IDs of the records added or updated. I will open an issue to suggest some changes to useractions.py
and the API for that.
Also I don't intent to implement all the diff-related features (AC 2.5, 2.6, 3.4, 3.5) yet as per Paul suggested
[@fflorent those are intriguing commits referencing this issue up there that github is snitching on you about :-)]
@paulfitz Yes, don't worry, we will probably talk orally a bit more about this feature next week, so we can frame this feature :).
BTW, have you taken a look at the video from my last comment (9th of august)?
Following our discussions last week, here are some notes:
Due to the misunderstanding, I propose to differentiate 2 steps related to the synchronization:
A
=> B
=> C
=> A
=> ... CRASH!)
It appeared to us that manual synchronization would be simpler:
Also (IMHO) I would say that it would probably offer a better user experience:
FWIW I think only users with Edit rights and above should be able to trigger a synchronisation on a table.
Actually, I think I'd feel even better if only users with structure edit rights could trigger synchronisations.
The synchronization should work across different Grist instances
This no longer fits the title of the issue :laughing:
I think it's pretty clear that we should go with 'manual' synchronization for now. In addition to your reasons, I find it harder to imagine implementing instant syncing across instances. This makes me agree with the earlier view to frame this as several smaller improvements to imports, starting with this comment from @paulfitz:
What if the existing import mechanism were improved in the following ways:
- An option was added to import from another Grist doc ~on the site~
- An option was added to save import settings in a way whereby they could be repeated incrementally with one click [or eventually automatically on a regular schedule]
- A setting was added whereby, if a doc has a one-click import from another doc, and the data in the other doc was detected as freshly changed, the import would rerun either automatically or by prompting an owner.
Other features to add:
Hi Folks! I would like to add my case:
I have different projects (and students) that have to use a huge master table, that is a list of plant species in Brazil. With synonyms, it has more than 150K lines. Every specie (line) has a primary key (taxonID).
Therefore, will be great if I don’t have to replicate this table to every document, and better, to every student. In my wonderful and perfect world, I would like to reference columns between different documents as a global “document.table.column”, and better, to have “public” tables, and reference it as @eduardo dalcin.document.table.column in my students account.
Hi!
In many cases we need reference data that are the same in all
Workspaces
of a sameSite
. For the moment we duplicate each time that reference data in each Workspace, so we have to maintain this master data everywhere. An exemple : the list of French cities. We need it in a lot of Workspaces and it's everytime the same list.It would be great to have a space in the left panel with these master data directly in the root Site.
We imagine that it must not be an easy feature, but it would be very useful for us
(Maybe there is already an issue that we can't find)
@vviers
Camille