molomby closed this issue 3 years ago
Current preference is to focus on "A" but architect it as a background process (#307) that batch reads/formats docs into a file. The user who requested the file could then be notified by email.
The approaches outlined in "B" could do well as custom mutations (#309) sitting behind custom elements in the admin UI (#308).
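To make that concrete, the mutation side could be as small as the sketch below. This is only indicative: extendGraphQLSchema is the Keystone 5 API for custom mutations, but exportQueue, the mutation name, and the payload shape are all made up for illustration.

```js
// Sketch only: assumes a List called `User` and a hypothetical job queue (`exportQueue`).
keystone.extendGraphQLSchema({
  mutations: [
    {
      // Returns a job id the Admin UI could show or poll; the real result arrives later by email.
      schema: 'requestUserExport(where: UserWhereInput): String',
      resolver: async (root, { where }, context) => {
        return exportQueue.add({
          list: 'User',
          where,
          requestedBy: context.authedItem && context.authedItem.id,
        });
      },
    },
  ],
});
```

A custom Admin UI element (#308) would then just call requestUserExport with the current filters.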
Should we think of excluding some sensitive (privacy-related) fields during export? The exclusions could be global or based on role.
Yeah, it would run through all the ACL stuff, so users couldn't export data they couldn't read.
The "export value" should probably be something specifically supplied by field type though (and so should form part of the field type interface). There's really no reason anyone should be exporting Password hashes, for example.
Any update on this?
I feel like this one could be a good candidate for a 3rd-party package
For this we would need extension points for adding UI elements in the Admin UI.
I feel like this one could be a good candidate for a 3rd-party package
@MadeByMike, Yeah possibly, it really needs DB adapter support though. Keystone's job here is probably to define the API and let DB adapters implement it (or not); any third-party export tools can pick up from there.
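Concretely, that adapter-facing API might be nothing more than an optional method each adapter can choose to implement. Entirely hypothetical, not something Keystone exposes today:

```js
// Hypothetical adapter-level export hook (invented for illustration).
class ListAdapterWithExport {
  async exportItems({ where, format = 'csv', writeStream }) {
    // A Postgres-backed adapter might translate `where` to SQL and stream
    // COPY ... TO STDOUT into `writeStream`; a Mongo adapter might decline.
    throw new Error('Export not implemented by this adapter');
  }
}
```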
@gautamsi is also right.
Adding hooks to the Admin UI opens a lot of doors and is becoming a higher priority
It looks like you haven't had a response in over 3 months. Sorry about that! We've flagged this issue for special attention. It will be manually reviewed by maintainers, not automatically closed. If you have any additional information please leave us a comment. It really helps! Thank you for your contributions. :)
@molomby are we clear enough now to say that bulk data export is something that should be handled at the database layer, by a separate tool? https://github.com/keystonejs/keystone/discussions/299
With that, the Admin UI extension points, and custom queries, it should be possible for people to build a basic bulk export feature in the Admin UI.
I would suggest that is enough to close this issue, as it's unlikely this is going to become a core feature for Keystone.
@MadeByMike writes..
are we clear enough now to say that bulk data exports is something that should be handled at the database layer and handled by a separate tool?
This is a bit of a quagmire because there are a few different dimensions we can solve for:
High-level -vs- low-level -- Is this something that should operate regardless of the DB platform used, e.g. via GraphQL, getting all the benefits of access control, read hooks, etc., or should it be implemented directly against the DB for efficiency's sake?
Scalable -vs- naive -- The KS4 implementation let you apply some filters in the admin UI, then click a button to download all filtered items (as a CSV, I think). It was quite handy but also very naively built -- it ran as a long-running web request and hurled a single (potentially huge) query at the DB. The lack of any hard limits made it quite easy to DoS your own system with a single click. Fundamentally, bulk actions like this should be async and performed by a background thread or worker process; e.g. you'd queue an export and the system would email you when it was complete (there's a rough sketch of this after the list). This is obviously a lot more complicated.
Generic -vs- ad hoc -- For a scalable solution (as above), it needs to work closely with the production infrastructure for things like worker processes and mail gateways. But we don't have any documented patterns or guidance on how this stuff should run, so building a decent generic solution (either in core or as a 3rd-party extension) isn't really possible yet. The only generic solution practically possible right now would be something simple that doesn't scale.
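To make the async option from the second point concrete, the worker side could look roughly like this. Every name here (exportQueue, fetchPage, uploadFile, sendEmail, toCsv) is a placeholder, not a real Keystone API; the important bits are that it pages through the data instead of issuing one huge query, and that fetchPage would go via GraphQL so access control and read hooks still apply.

```js
// Sketch of a background export worker (all helpers are placeholders).
const BATCH_SIZE = 500;

exportQueue.process(async ({ list, where, requestedBy }) => {
  const rows = [];
  // Page through the list rather than hurling a single huge query at the DB.
  for (let skip = 0; ; skip += BATCH_SIZE) {
    const batch = await fetchPage({ list, where, first: BATCH_SIZE, skip, as: requestedBy });
    rows.push(...batch);
    if (batch.length < BATCH_SIZE) break;
  }
  // Write the file somewhere durable, then tell the requesting user where to find it.
  const url = await uploadFile(`${list}-export.csv`, toCsv(rows));
  await sendEmail(requestedBy, `Your ${list} export is ready: ${url}`);
});
```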
So what do we do?
I think the only viable solution in the short term is a naive, high-level version, like the KS4 one (but maybe with some hard limits enforced). It wouldn't need any of the complex infrastructure support, though it seems like a very partial solution to me (it's liable to lead devs in the wrong direction). But it would be undeniably useful on small projects.
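For what it's worth, the capped version could be as simple as the sketch below. It assumes Keystone 5's generated CRUD API and executeGraphQL; toCsv is a placeholder and the 10k limit is arbitrary.

```js
// Naive, single-request export with a hard limit (sketch only).
const EXPORT_LIMIT = 10000;

async function exportUsersCsv(context, where) {
  const { data, errors } = await keystone.executeGraphQL({
    context, // the requesting user's context, so access control still applies
    query: `query ($where: UserWhereInput, $first: Int) {
      allUsers(where: $where, first: $first) { id name email }
      _allUsersMeta(where: $where) { count }
    }`,
    variables: { where, first: EXPORT_LIMIT },
  });
  if (errors) throw errors[0];
  if (data._allUsersMeta.count > EXPORT_LIMIT) {
    throw new Error(`Export too large (max ${EXPORT_LIMIT} items); narrow your filters`);
  }
  return toCsv(data.allUsers);
}
```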
So to reframe your question @MadeByMike, would we want this ^^ to be part of core? I'm not sure myself. Maybe not?
The ability to dump data from a Keystone app, via the Admin UI (probably the current view). At a technical level there are two broad approaches:
A) Pull the data into Node and produce a file for download
This gives us full control of the format (Excel file, CSV, JSON, etc.) so it's nicest for the user. We can pick columns and ordering, and reference hydrated (app-generated/virtual) values.
However, without some significant work (generating from a worker, batching items) this approach does not scale well. The prior art (KS4) uses this approach and it's known to have caused outages on large sites.
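For illustration, the formatting side of approach A is trivial once the items are in Node; the column names below are made up and the CSV escaping is the standard quote-doubling rule.

```js
// Turning fetched items into JSON or CSV with chosen columns (sketch).
const columns = ['name', 'email', 'createdAt'];

const toJson = items => JSON.stringify(items, null, 2);

const csvEscape = value => {
  const s = value == null ? '' : String(value);
  return `"${s.replace(/"/g, '""')}"`; // quote every field, double embedded quotes
};

const toCsv = items =>
  [
    columns.join(','),
    ...items.map(item => columns.map(col => csvEscape(item[col])).join(',')),
  ].join('\n');
```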
B) Leverage DB platform functionality
Alternatively, we may want to leverage functionality of the underlying DB directly. Specifically...
mongodump can be given conditions to filter the documents dumped from a collection. Issues: single collection only, not hydrated, format is difficult to consume, dumps all fields.
The copy command (Postgres) can write queries out directly to a CSV. It may also be possible to dump JSON this way. Fewer issues than mongodump; copy can query across multiple tables/views, control columns, order, format values, etc. Also it's super fast. Locally I can dump 80k+ records (20Mb) to a CSV in under 200 ms.
These tools have a much higher capacity to scale, but at the cost of some flexibility (esp. Mongo). This is also something we'd need to work into the adapter framework.