jthomasmock opened 11 months ago
I agree that exploring S3 data is exceptionally useful, but in my view the Connections pane is the wrong place to tackle this; it is already teetering on the edge of being too generic. We can build a better Connections experience if we scope it to database-like connections, which S3 isn't.
For S3 (and many other hierarchical data stores), it would be possible to build an extension that integrates nicely with Positron using the VS Code Tree View API (https://code.visualstudio.com/api/extension-guides/tree-view). This would make the S3 explorer a peer of the Explorer view.
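To make the Tree View idea concrete, here is a minimal Python sketch of the logic behind the two methods a VS Code `TreeDataProvider` implements, `getChildren` and `getTreeItem` (written as `get_children` and `get_tree_item` here). It is only an illustration under stated assumptions: a real extension would be TypeScript against the `vscode` API and would list keys from S3 itself, whereas this sketch assumes an in-memory mapping of bucket names to object keys, and `S3TreeProvider` and `Node` are hypothetical names. The main point is that object storage has no real folders. "Folders" are common prefixes derived by splitting keys on a delimiter, and each level is computed only when the user expands a node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One row in the tree: a bucket, a pseudo-folder, or an object."""
    bucket: str
    key: str          # "" for a bucket root; pseudo-folders end with the delimiter
    is_folder: bool


class S3TreeProvider:
    """Hypothetical provider shaped like VS Code's TreeDataProvider.

    Object storage has a flat namespace, so "folders" are derived by splitting
    keys on a delimiter (S3's ListObjectsV2 does the same when given one).
    """

    def __init__(self, buckets: dict[str, list[str]], delimiter: str = "/"):
        self.buckets = buckets      # bucket name -> list of object keys
        self.delimiter = delimiter

    def get_children(self, element: Optional[Node] = None) -> list[Node]:
        """Return the direct children of a node, computed only when it is expanded."""
        if element is None:         # top level: one node per bucket
            return [Node(b, "", True) for b in sorted(self.buckets)]
        if not element.is_folder:   # objects are leaves
            return []
        prefix = element.key
        folders, objects = set(), []
        for key in self.buckets[element.bucket]:
            if not key.startswith(prefix) or key == prefix:
                continue            # outside this prefix, or the folder marker itself
            head, sep, _ = key[len(prefix):].partition(self.delimiter)
            if sep:
                folders.add(prefix + head + sep)   # deeper key: collapse to a folder
            else:
                objects.append(key)                # key sits directly at this level
        return ([Node(element.bucket, f, True) for f in sorted(folders)]
                + [Node(element.bucket, o, False) for o in sorted(objects)])

    def get_tree_item(self, element: Node) -> dict:
        """Display info for a node: last path segment as label, folders collapsible."""
        if element.key == "":
            label = element.bucket
        else:
            label = element.key.rstrip(self.delimiter).rsplit(self.delimiter, 1)[-1]
        return {"label": label, "collapsible": element.is_folder}
```

For example, with the keys `data/2024/a.csv`, `data/b.parquet`, and `README.md` in one bucket, expanding the bucket yields a `data/` folder and the `README.md` object, and expanding `data/` then yields `data/2024/` and `data/b.parquet`.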
This is the approach taken by the existing S3 extensions:
(from https://marketplace.visualstudio.com/items?itemName=NecatiARSLAN.aws-s3-vscode-extension)
Thanks Jonathan, it makes sense to keep the Connections pane scoped to database connections.
Given that extensibility, does it make more sense to let the external developers (AWS/Azure/GCP) manage their own "VS Code blob storage" interfaces?
I see extensions from at least AWS and Google, and it looks like Microsoft also has one, though it is still in preview. I think all of our planned auth passthrough will "just work" with these extensions, but we'll want to validate that on our end in Workbench as we build.
Moving this out to Future: as Jonathan mentioned, there are existing VS Code tools for viewing object storage such as S3, and the Connections pane is scoped to database-like connections.
Tareef had some good feedback on pins in Slack that I'll capture here, though it likely belongs in a separate VS Code extension rather than in core Positron.
I have started playing with Positron and pins. A couple of suggestions that came up that I thought I would share:
It would be good if pin_list returned the pin description along with the pin name.
It would be so nice to get a short description of what to expect from a pin when you are inside the Data Viewer.
It would be great if the "type" of each pin were also included in that list.
This became an issue for me in Python because we unfortunately have a bunch of pins on Connect stored as rds files, which means I can't load them in Python. It would have been nice to know I was out of luck beforehand.
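A sketch of the listing being requested, assuming each pin's metadata carries a name, a storage type, and a free-text description. `PinInfo` and `describe_pins` are hypothetical helpers for illustration, not part of the pins API. The set of types the current client can load is passed in rather than hard-coded, because that depends on the pins version and the packages installed.

```python
from dataclasses import dataclass


@dataclass
class PinInfo:
    """Hypothetical subset of the metadata a pin carries."""
    name: str
    type: str               # storage format, e.g. "rds", "csv", "parquet"
    description: str = ""


def describe_pins(pins: list[PinInfo], readable_types: set[str]) -> list[str]:
    """One summary line per pin (sorted by name): name, type, description,
    plus a flag on pins whose format this client cannot load."""
    lines = []
    for pin in sorted(pins, key=lambda p: p.name):
        desc = pin.description or "(no description)"
        flag = "" if pin.type in readable_types else "  [cannot load here]"
        lines.append(f"{pin.name} [{pin.type}]: {desc}{flag}")
    return lines
```

With a pin `sales` of type `rds` and a client whose readable set excludes `rds`, the `sales` line ends with `[cannot load here]`, so the user learns this before trying to read it.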
Copying over a feature request from RStudio: https://github.com/rstudio/rstudio/issues/13044
It's largely tied to the RStudio connections pane concepts, but I think it's valid for Positron/Workbench, especially as we have spent a lot of time on auth passthrough for AWS/Databricks/Azure that have remote hierarchical data stores.
In 2017, the Connections Pane became significantly more flexible, allowing packages to surface arbitrary objects and icons via the "Connections Contract". However, so far there are few users of this powerful interface (outside of the odbc and sparklyr packages), and so most users think of it as a gateway to databases for novice users.
At the same time, the most common source of data after files and databases is almost certainly object storage (e.g. S3). There are existing R packages that already expose S3 and comparable services, but all of them focus on programmatic access, and none offer much in the way of a UI that appeals to more novice users.
Although object storage is emphatically not a filesystem (and so it does not belong in the Files Pane), it does have a hierarchical structure similar to the catalog/schema/table tree used in the Connections Pane. This makes me think it is a valuable investment to make sure the IDE and the Connections Contract work well with object storage idioms (buckets, blobs, objects).
Moreover, I've heard numerous accounts from customers that onboarding new users to use cloud-based tools like S3 is difficult because they are simply more comfortable navigating a mounted filesystem in the File Viewer (or in, say, Windows Explorer). This is a huge sticking point for otherwise painful technologies like Kerberos.
My attempts to use the Connections Pane for S3 ran into three minor limitations, all of which could be improved without too much effort:
There are built-in icons for catalogs, schemas, tables, and views, but none for object storage idioms like "bucket".
The "preview" action opens the Data Viewer to show up to 1,000 rows, which doesn't really make sense for object storage. We should allow implementations to return NULL rather than a data frame, in which case the Data Viewer won't open. Implementations could then trigger other actions instead (for example, downloading an object to the local filesystem).
Some UI elements are still too tied to databases. The tooltip for the preview "button" (which is the icon) says "View table (up to 1,000 rows)", and the pane shows "(No tables)" when the implementation returns no entries. Ideally, we could expand the Connections Contract to allow implementations to provide their own text for both of these cases.
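The three changes above can be modeled in a few lines. This is a minimal Python sketch of the IDE-side behavior, not the real Connections Contract (which is an R interface): `Connection`, `BUILTIN_ICONS`, `icon_for`, `labels`, and `on_preview_click` are hypothetical names, and the icon file names are placeholders. It shows a built-in "bucket" icon, a preview that may return `None` so the Data Viewer stays closed, and optional tooltip and empty-pane text that fall back to today's database wording.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical built-in icon set, extended with an object-storage "bucket" entry.
BUILTIN_ICONS = {
    "catalog": "catalog.png", "schema": "schema.png",
    "table": "table.png", "view": "view.png",
    "bucket": "bucket.png",   # the proposed addition
}

# Current database-centric wording, used when an implementation supplies none.
DEFAULT_PREVIEW_TOOLTIP = "View table (up to 1,000 rows)"
DEFAULT_EMPTY_TEXT = "(No tables)"


@dataclass
class Connection:
    """Hypothetical connection implementation; not the real contract's API."""
    # Returns rows to show, or None to mean "don't open the Data Viewer".
    preview: Callable[[str, int], Optional[list]]
    preview_tooltip: Optional[str] = None
    empty_text: Optional[str] = None


def icon_for(object_type: str) -> Optional[str]:
    """Look up a built-in icon for an object type (None if there isn't one)."""
    return BUILTIN_ICONS.get(object_type)


def labels(conn: Connection) -> tuple[str, str]:
    """Tooltip and empty-pane text, falling back to the database defaults."""
    return (conn.preview_tooltip or DEFAULT_PREVIEW_TOOLTIP,
            conn.empty_text or DEFAULT_EMPTY_TEXT)


def on_preview_click(conn: Connection, name: str,
                     open_viewer: Callable[[list], None],
                     row_limit: int = 1000) -> bool:
    """Run the implementation's preview; open the viewer only for non-None results."""
    rows = conn.preview(name, row_limit)
    if rows is None:
        return False              # implementation handled it some other way
    open_viewer(rows[:row_limit])
    return True
```

An S3 implementation might set `preview_tooltip="Download object"` and `empty_text="(No objects)"`, and have its `preview` callable copy the object locally and return `None`. Returning `None` keeps the change backward compatible, because existing database implementations that return rows behave exactly as before.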
Example screenshot, to show what's possible: