add json(s), Excel and arrow data formats support

microsoft / SandDance

Visually explore, understand, and present your data.

https://microsoft.github.io/SandDance

MIT License


add json(s), Excel and arrow data formats support #154

Open RandomFractals opened 4 years ago

RandomFractals commented 4 years ago

See the Data Preview 🈸 VS Code extension for an example of how to integrate those data formats: https://dev.to/tarasnovak/vscode-data-preview-for-devs-around-the-39mn

You can use or peruse my custom Data Manager API and the src/data.providers folder for data loading and saving implementation details, to give SandDance more data source types to choose from.
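Data Manager is Data Preview's own API. As a purely hypothetical illustration, not its actual code, a loader that accepts several formats first has to tell them apart. The minimal sketch below assumes detection by leading "magic" bytes: the Arrow IPC file format begins with `ARROW1`, `.xlsx` workbooks are ZIP archives, legacy `.xls` files are OLE2 compound documents, and JSON text begins with `{` or `[`. The names `sniffFormat` and `DataFormat` are invented for this example.

```typescript
// Hypothetical format sniffer: guesses a data file's format from its first bytes
// so a loader can pick a parser. Only the Arrow IPC *file* format starts with
// "ARROW1"; the Arrow streaming format does not and is reported as "unknown".
type DataFormat = "arrow" | "xlsx" | "xls" | "json" | "unknown";

function startsWith(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  return offset + signature.length <= bytes.length &&
    signature.every((b, i) => bytes[offset + i] === b);
}

function sniffFormat(bytes: Uint8Array): DataFormat {
  const ARROW = [0x41, 0x52, 0x52, 0x4f, 0x57, 0x31]; // "ARROW1"
  const OLE2 = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // legacy .xls container
  const ZIP = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04", the .xlsx container
  if (startsWith(bytes, ARROW)) return "arrow";
  if (startsWith(bytes, OLE2)) return "xls";
  // Any ZIP archive matches here; a real loader would also look for xl/workbook.xml.
  if (startsWith(bytes, ZIP)) return "xlsx";
  // JSON or JSON Lines text: skip an optional UTF-8 BOM and leading whitespace,
  // then expect '{' or '['.
  let i = startsWith(bytes, [0xef, 0xbb, 0xbf]) ? 3 : 0;
  while (i < bytes.length && [0x20, 0x09, 0x0a, 0x0d].indexOf(bytes[i]) !== -1) i++;
  if (i < bytes.length && (bytes[i] === 0x7b || bytes[i] === 0x5b)) return "json";
  return "unknown";
}
```

Sniffing bytes rather than trusting the file extension also works for data that arrives without a file name, such as piped input.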

gramster commented 4 years ago

I am surprised this only had one vote and it was a downvote, at least for the Arrow support, as that would clearly be a good way to share data from Jupyter implementations.

danmarshall commented 4 years ago

@gramster - is your Arrow data stored as a file? Wondering if there was some other mechanism from Jupyter you had in mind.

RandomFractals commented 4 years ago

ha! I forgot about logging this.

@danmarshall it would be nice to have both, loading from a file and from a pipe, as described in #213
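#213 isn't quoted here, so this is only a guess at what "from file & pipe" could look like on the Node side: if the loader consumes a generic `Readable`, a file on disk and data piped to stdin go through the same code path. A minimal sketch, assuming Node's `fs` and `stream` modules; `readAll` and `openSource` are hypothetical helper names.

```typescript
import { createReadStream } from "fs";
import { Readable } from "stream";

// Collect an entire Readable (a file stream or process.stdin) into one UTF-8 string.
function readAll(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk: Buffer | string) =>
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk));
    stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    stream.on("error", reject);
  });
}

// Pick the source: a file path if one is given, otherwise whatever is piped in.
function openSource(filePath?: string): Readable {
  return filePath ? createReadStream(filePath) : process.stdin;
}
```

A caller would then do something like `readAll(openSource(process.argv[2])).then(text => JSON.parse(text))`, so `node load.js data.json` and `cat data.json | node load.js` behave the same.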

danmarshall commented 4 years ago

@RandomFractals - does your extension support piping?

RandomFractals commented 4 years ago

No, but you can call it with a data file URI to open Data Preview, similar to how I suggested you integrate Vega Viewer with SandDance in #153.

so, you'd just call it with:

```typescript
commands.executeCommand('data.preview', dataFileUri);
```

RandomFractals commented 4 years ago

And you can check whether Data Preview is installed via `commands.getCommands()`:

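```typescript
import { commands, Uri } from 'vscode';

// Open a data file in Data Preview if that extension is installed,
// otherwise fall back to VS Code's built-in 'vscode.open' command.
async function viewData(dataUri: Uri, dataPreviewCommand: string = 'data.preview'): Promise<void> {
  let viewDataCommand: string = 'vscode.open'; // default
  // getCommands() lists every registered command id, including other extensions' commands.
  const availableCommands = await commands.getCommands();
  if (availableCommands.includes(dataPreviewCommand)) {
    viewDataCommand = dataPreviewCommand;
  }
  // execute the chosen data preview command
  await commands.executeCommand(viewDataCommand, dataUri);
}
```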

See how I do it in Vega Viewer: https://github.com/RandomFractals/vscode-vega-viewer/blob/master/src/vega.preview.ts#L279

gramster commented 4 years ago

I was thinking of data in the Plasma object store. We had an intern prototype that viewed dataframes from a Jupyter notebook in SandDance inside VS Code, but that involved (IIRC) serializing the data as CSV and passing it in a URL, which clearly won't scale well. I'm wondering what we could do for large datasets (obviously writing to a file on disk is an option too, and maybe that's all we really need).
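To make the "won't scale" point concrete, here is a hypothetical sketch of CSV carried in a URL. It assumes a `data:text/csv,` URL with `encodeURIComponent`; the comment doesn't say how the prototype actually built its URL, and the function names are made up.

```typescript
// Hypothetical: pass a small numeric table as CSV inside a data: URL.
function csvFromRows(rows: number[][]): string {
  return rows.map(row => row.join(",")).join("\n");
}

function csvDataUrl(rows: number[][]): string {
  // encodeURIComponent turns each ',' into "%2C" and each '\n' into "%0A".
  return "data:text/csv," + encodeURIComponent(csvFromRows(rows));
}

// Ratio of URL length to raw CSV length: every separator costs 3 characters instead of 1.
function urlOverhead(rows: number[][]): number {
  return csvDataUrl(rows).length / csvFromRows(rows).length;
}
```

For `[[1, 2], [3, 4]]` the URL is `data:text/csv,1%2C2%0A3%2C4`, 27 characters for 7 characters of CSV. For a table of short values the URL stays roughly twice the size of the CSV, and the whole thing is still one in-memory string that the receiver must decode and parse as text.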

RandomFractals commented 4 years ago

Yeah, I think for it to scale, writing to disk in the raw Arrow data format rather than CSV might be a better option, and then having SandDance or some other extension load it into a user-friendly data frame/grid view.
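To illustrate why a binary columnar layout helps here without depending on the `apache-arrow` package, this hypothetical sketch writes one numeric column both as raw float64 bytes and as CSV text. The raw file is Arrow-like in spirit (a contiguous buffer with 8 bytes per value) but it is not the Arrow IPC file format; a real implementation would use an Arrow library. The function names are invented.

```typescript
import { mkdtempSync, readFileSync, statSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Write a numeric column as raw platform-endian float64 bytes (little-endian on
// x86/ARM) and, for comparison, as one CSV value per line.
function writeColumn(dir: string, values: Float64Array): { binPath: string; csvPath: string } {
  const binPath = join(dir, "column.bin");
  const csvPath = join(dir, "column.csv");
  writeFileSync(binPath, Buffer.from(values.buffer, values.byteOffset, values.byteLength));
  writeFileSync(csvPath, Array.from(values).join("\n"));
  return { binPath, csvPath };
}

// Reading the binary column back is a byte copy plus a typed-array view: no text parsing.
function readBinaryColumn(path: string): Float64Array {
  // Copy into a fresh ArrayBuffer starting at offset 0 so the float64 view is aligned.
  const aligned = new Uint8Array(readFileSync(path)).buffer;
  return new Float64Array(aligned);
}

// Reading the CSV column back means splitting and parsing every value as text.
function readCsvColumn(path: string): Float64Array {
  return Float64Array.from(readFileSync(path, "utf8").split("\n").map(Number));
}

// Write one column both ways into a fresh temp directory, then read both back.
function compareColumnFiles(values: Float64Array) {
  const dir = mkdtempSync(join(tmpdir(), "column-"));
  const { binPath, csvPath } = writeColumn(dir, values);
  return {
    binBytes: statSync(binPath).size,
    csvBytes: statSync(csvPath).size,
    fromBin: readBinaryColumn(binPath),
    fromCsv: readCsvColumn(csvPath),
  };
}
```

CSV can even be smaller for short numbers; the win of the binary layout is that its size is predictable (8 bytes per value) and loading it skips text parsing entirely.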

It would be nice if VS Code had some IPC API for extension integrations and for sharing data in memory, and Arrow is perfect for that. I just don't think there's a VS Code API for it yet.

RandomFractals commented 4 years ago

@danmarshall have you looked into this yet?

danmarshall commented 4 years ago

@RandomFractals no I haven't.

gramster commented 4 years ago

Sorry, I missed this. We aren’t doing anything with Arrow yet, but have been talking about using it in the future for sharing data between kernels in polyglot notebooks.


RandomFractals commented 4 years ago

Yeah, @gramster, that's the one scenario where I think we're close to getting it to work once you go GA ;)

Still, that's only in the context of .NET Interactive notebooks, or .dibs as you call them :)

I brought it up with the VS Code team in our last monthly extension authors feedback call, and their stance is that extensions can devise their own ways of sharing data, i.e. there are no plans to provide a built-in VS Code API for that anytime soon. It did come up a few times in conversations with other extension authors in the VS Code dev community Slack.

I think if they added some pub/sub channels, we could see a lot of clever integrations for extensions sharing data beyond notebooks.
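No such API exists, so this is purely hypothetical: a minimal TypeScript sketch of what named pub/sub channels could look like, with one party publishing a payload (say, an Arrow buffer) and others subscribing. `ChannelHub`, `subscribe`, and `publish` are invented for illustration, and this version only works within a single JavaScript process.

```typescript
// Hypothetical in-process pub/sub channel registry.
type Listener<T> = (message: T) => void;

class ChannelHub {
  private channels = new Map<string, Set<Listener<unknown>>>();

  // Register a listener on a named channel; returns a function that unsubscribes it.
  subscribe<T>(channel: string, listener: Listener<T>): () => void {
    const listeners = this.channels.get(channel) ?? new Set<Listener<unknown>>();
    this.channels.set(channel, listeners);
    const l = listener as Listener<unknown>;
    listeners.add(l);
    return () => { listeners.delete(l); };
  }

  // Deliver a message to every listener subscribed at publish time;
  // returns how many listeners received it.
  publish<T>(channel: string, message: T): number {
    const listeners = this.channels.get(channel);
    if (!listeners) return 0;
    const snapshot = Array.from(listeners);
    for (const listener of snapshot) listener(message);
    return snapshot.length;
  }
}
```

Delivering to a snapshot of the listener set means a listener that unsubscribes during delivery doesn't disturb the loop.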