vega / vega-lite-api

A JavaScript API for Vega-Lite.
https://observablehq.com/@vega/vega-lite-api
BSD 3-Clause "New" or "Revised" License

Data transform before loading and getting dynamic data to display #191

Closed systemcrash closed 4 years ago

systemcrash commented 4 years ago

Trying to get some dynamic server-side data to display. Currently 3,000-4,000 records, available as both JSON and CSV. I started with:

https://observablehq.com/@vega/vega-lite-api#standalone_use

And tried getting things going.

The docs for data (https://vega.github.io/vega-lite-api/api/data) say:

If the argument is a string, sets the url property.

I set datafile.json e.g. vl.data( 'datafile.json' ) - also tried with CSV. Same result.

Network shows that file is loaded. But graph still empty. Clues?

Also, CSV is ready for display - all columns named, but JSON data needs slight massage before processing. The key needs (re)naming. Format is:

{
"162.243.136.184": {"lsf": 2733869240, "firstSeen": "2020-06-05T17:14:13.884582", "lastSeen": "2020-06-05T17:18:06.189240", "timesSeen": 6}, 
"66.229.102.180": {"lsf": 1122330292, "firstSeen": "2020-06-05T17:14:53.053523", "lastSeen": "2020-06-22T23:45:56.450999", "timesSeen": 8635},
...
}

CSV is like:

addr,lsf,firstSeen,lastSeen,timesSeen
193.254.245.162,3254711714,2020-08-27T22:10:56.918220,2020-08-27T22:20:44.575690,16
35.195.163.239,600024047,2020-08-27T23:01:52.380184,2020-08-27T23:33:04.532846,72
...

So I need to do something like:

return Object.entries(data).map((x) =>
  Object.assign({ ip: x[0] }, x[1]) )

Where would I most effectively place the above transform at load time to perform this? Is it currently possible? I tried other variants where I get a promise and do the transform in the vega.loader part of the options, but I could never get the data to load before render completes. Only after. 😦

Some place to do deferred manipulation would be great, otherwise somewhere to name the columns in the data spec.
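
The reshaping described above can be sketched as a small standalone function. This is only an illustration: `toRecords` and its `keyName` parameter are hypothetical names, not part of vega-lite-api, and the sample values are copied from the JSON excerpt in this post.

```javascript
// Turn an object keyed by IP address into an array of flat records.
// `toRecords` and `keyName` are hypothetical names used for illustration.
function toRecords(keyed, keyName = "ip") {
  return Object.entries(keyed).map(([key, fields]) => ({
    [keyName]: key, // the former object key becomes a named column
    ...fields,      // the remaining fields are copied unchanged
  }));
}

const sample = {
  "162.243.136.184": {
    lsf: 2733869240,
    firstSeen: "2020-06-05T17:14:13.884582",
    lastSeen: "2020-06-05T17:18:06.189240",
    timesSeen: 6,
  },
};

const records = toRecords(sample);
console.log(records[0].ip, records[0].timesSeen);
```

Passing `"addr"` as the second argument would make the JSON records line up with the CSV column names.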

Here's what I have now using CSV, but graph shows no values. Just axes.

      vl.register(vega, vegaLite, options);
      vl.json( {type: "csv", url: "datafile.text"} )
      // now you can use the API!
      vl.markText({ tooltip: true, filled: true })
        .data( "datafile.text" )
        .encode(
          vl.y().fieldN('addr'),
          vl.x().fieldQ('timesSeen')
        )
        .render()
        .then(viewElement => {
          // render returns a promise to a DOM element containing the chart
          // viewElement.value contains the Vega View object instance
          document.getElementById('view').innerHTML = viewElement.innerHTML;
          console.log('finished drawing');
        });
      console.log('loaded script body');

I realize now that I can supply a datasource name, e.g. https://vega.github.io/vega-lite-api/api/csv and later redraw, e.g. https://vega.github.io/vega-lite/tutorials/streaming.html but ... wanted to see what I can do with the API :)

domoritz commented 4 years ago

Would it be easier to load the data with fetch, parse and massage it, and then pass it to the vis as an array of objects?
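
A minimal sketch of that suggestion, assuming the JSON is keyed by IP address as in the original post. `loadAndRender`, `buildChart` and `fetchImpl` are hypothetical names; the fetch function and the chart builder are passed in so the sketch assumes nothing about how vl was registered.

```javascript
// Fetch keyed JSON, reshape it, and only then hand the records to the chart.
// `loadAndRender`, `buildChart` and `fetchImpl` are hypothetical names.
async function loadAndRender(url, buildChart, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  const keyed = await response.json();
  // Massage: move each object key into an "ip" column.
  const records = Object.entries(keyed).map(([ip, fields]) => ({ ip, ...fields }));
  // The chart is built only after the data exists.
  return buildChart(records);
}

// Possible browser wiring, untested here, using the vl calls seen in this thread:
// loadAndRender('datafile.json', records =>
//   vl.markBar()
//     .data(records)
//     .encode(vl.y().fieldN('ip'), vl.x().fieldQ('timesSeen'))
//     .render()
// ).then(el => document.getElementById('view').appendChild(el));
```

Because the records are complete before `buildChart` runs, no later refresh of the view should be needed.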

systemcrash commented 4 years ago

The way I'm doing it now seems pretty easy, even though nothing displays yet 😄 But .data() having a .then() for some transforms seems intuitive.

I tried loading an array of objects, via a fetch/promise before, though I could never get the graph to display on page load. Certainly I was doing something wrong:


      var opts = {
        baseURL: "http://resource:9000/logs/",
      };
      var dataz = null;

      var myloader = vega.loader(opts);

      myloader.load('datalist.text').then(function(inData) {
        dataz = vega.read(inData, { type: 'csv', parse: 'auto' });
        console.log('finished vega.read of the CSV');
      });

The logs always printed "finished vega.read of the CSV" last... I passed myloader in through the options in vl.register(vega, vegaLite, options);

So I gather I somehow need to refresh data after this load... by calling some .runAsync() on the view? Lost.
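
The ordering seen in those logs is ordinary promise behaviour, which this small sketch tries to show without Vega. `fakeLoad` is a hypothetical stand-in for `myloader.load(...)`; the point is only that the `.then` callback runs after the rest of the script body, so anything that reads `dataz` has to run inside the callback (or after awaiting it).

```javascript
// `fakeLoad` is a hypothetical stand-in for myloader.load('datalist.text').
function fakeLoad() {
  return Promise.resolve("addr,timesSeen\n1.2.3.4,6");
}

const events = [];
let dataz = null;

const done = fakeLoad().then((text) => {
  dataz = text; // data only becomes available here
  events.push("finished load");
  // Chart construction that depends on dataz belongs here.
});

// This line runs before the callback above, while dataz is still null.
events.push(`script body finished, dataz is ${dataz === null ? "null" : "set"}`);

done.then(() => console.log(events));
```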

Having a place for running a transform would be beneficial for those 'really raw data' cases.

domoritz commented 4 years ago

I'm a bit lost here to be honest. Let's look at the code in more detail.

vl.register(vega, vegaLite, options);

Why do you need to register Vega/Vega-Lite here? Can you use the defaults?

vl.json( {type: "csv", url: "datafile.text"} )

What is this line supposed to do? I think the results need to be passed to a data call, no?

Can you make a small example with Vega-Lite in the editor at https://vega.github.io/editor/? Even if the transform isn't correct yet, it will help me understand what you are trying to do.

systemcrash commented 4 years ago

Is there a simple way to get a vega-lite config out of the vl object I run in the HTML?

domoritz commented 4 years ago

Yes, omit .render to get the spec. See the bottom of https://observablehq.com/@uwdata/introduction-to-vega-lite/.

systemcrash commented 4 years ago

Tried using .toObject() and got: Uncaught TypeError: plot.toObject is not a function

But if I run with plot.toString() I get:

{"$schema":"https://vega.github.io/schema/vega-lite/v4.json","mark":{"type":"bar","tooltip":true,"filled":true},"data":{"url":"http://local-resource:9000/logs/datafile.text"},"encoding":{"y":{"field":"addr","type":"nominal"},"x":{"field":"timesSeen","type":"quantitative"}}}

Which in the editor gives either: Blocked loading mixed active content “http://local-resource.... or Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://local-resource.... CORS is activated on my http resource...

domoritz commented 4 years ago

You can inline a small example dataset (using a values data source).

systemcrash commented 4 years ago

I guess there is no easy way to in-line CSV....

domoritz commented 4 years ago

There is. You can inline a string. But I suggest you create a simple array of objects so it's easier to work with.

systemcrash commented 4 years ago

Got it... found an example. So waaaaaait - CSV.... there must be some kind of special newline formatting expected.

I export using the Python csv module, into a text file. And I've tried setting the Content-Type header to text/csv as well as the automatic Content-Type: application/octet-stream, no difference. So perhaps the CSV interpreter barfs on the newline stuff.

Inlined it displays fine:

"data":
    {
    "values": "addr,lsf,firstSeen,lastSeen,timesSeen\r162.243.136.184,2733869240,2020-06-05T17:14:13.884582,2020-06-05T17:18:06.189240,6\r51.81.137.147,860981651,2020-06-05T17:15:10.583462,2020-06-23T12:19:08.322380,320\r5.39.19.236,86447084,2020-06-05T20:17:16.952096,2020-06-22T13:49:56.084543,99\r",
    "format": {
      "type": "csv"
    }
    }
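
Since the inlined string works, one possible workaround is to fetch the file as text yourself and inline it into the spec. The sketch below only builds the spec object; `withInlineCsv` is a hypothetical helper name, and the sample rows are taken from the data block above.

```javascript
// Return a copy of a Vega-Lite spec whose data block inlines raw CSV text,
// mirroring the "values" + "format" shape shown above.
// `withInlineCsv` is a hypothetical helper, not part of any library.
function withInlineCsv(spec, csvText) {
  return {
    ...spec,
    data: { values: csvText, format: { type: "csv" } },
  };
}

const csvText =
  "addr,lsf,firstSeen,lastSeen,timesSeen\r" +
  "162.243.136.184,2733869240,2020-06-05T17:14:13.884582,2020-06-05T17:18:06.189240,6\r";

const baseSpec = {
  mark: "bar",
  encoding: {
    y: { field: "addr", type: "nominal" },
    x: { field: "timesSeen", type: "quantitative" },
  },
};

const inlined = withInlineCsv(baseSpec, csvText);
console.log(JSON.stringify(inlined.data.format));
```

The file extension plays no role here, because the spec no longer contains a URL.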

domoritz commented 4 years ago

I have some CSV files that work at https://github.com/vega/vega-datasets/tree/master/data.

systemcrash commented 4 years ago

So, the fix?

I MUST RENAME THE FILE TO .CSV at a URL. For it to be a CSV file, it cannot have any other extension? I could not coerce vl to accept any other of the listed params:

https://vega.github.io/vega-lite-api/api/csv
https://vega.github.io/vega-lite-api/api/csvFormat

I even tried: .data( 'datalist.text', {type: "csv"} )

No fly.

(also confirmed this behaviour in the editor)

domoritz commented 4 years ago

Hmm, so Vega-Lite is supposed to automatically use the csv reader when your file extension is .csv but you should be able to override the default. If that's not the case, please file an issue with a small example at https://github.com/vega/vega-lite.
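
For reference, the override mentioned here is normally written in a Vega-Lite spec as a format object next to the url. The sketch below only constructs that spec shape; `csvUrlData` is a hypothetical helper, and whether this resolves the behaviour reported in this thread was not verified here.

```javascript
// Build a data block that names the CSV parser explicitly instead of
// relying on the file extension. `csvUrlData` is a hypothetical helper.
function csvUrlData(url) {
  return { url, format: { type: "csv" } };
}

const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v4.json",
  mark: { type: "bar", tooltip: true },
  data: csvUrlData("datafile.text"),
  encoding: {
    y: { field: "addr", type: "nominal" },
    x: { field: "timesSeen", type: "quantitative" },
  },
};

console.log(JSON.stringify(spec.data));
```

A spec like this can be pasted into the editor to check whether the explicit format is honoured for a non-.csv extension.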

domoritz commented 4 years ago

So it sounds like this issue is somewhere else. I'm going to close the issue here but feel free to follow up and we can file the corresponding issues in the other repos.

snunez1 commented 1 year ago

@systemcrash, did this ever get fixed for you? I am seeing the same behaviour now, three years later. It seems that VL honours neither the Content-Type header nor the format directive. I'm hoping this is a case of user error, but I don't think so.

systemcrash commented 1 year ago

I rewrote my back end to output JSON and went with that... could not get CSV to do my bidding. Have not tried since this.