saulpw / visidata

A terminal spreadsheet multitool for discovering and arranging data
http://visidata.org
GNU General Public License v3.0
7.86k stars 277 forks

Read from stdin if data is piped in. #69

Closed nfultz closed 7 years ago

nfultz commented 7 years ago

In order to use visidata as a pager with psql or mysql, you need to be able to pipe in data in addition to specifying a file. I had a similar app, here is the snippet that did this:

https://github.com/nfultz/ffss/blob/master/ffss/__init__.py#L60

saulpw commented 7 years ago

Is there a StringIO-like class that wraps a file handle and caches the result, so that if the contents are read twice it only goes to the file handle the first time, without reading the whole file in first? The way the .txt/.tsv parser works, it reads the first N characters to see if there are any tabs, and if so, sends it to the .tsv parser, which re-reads it.
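One way to get the behaviour asked about here is a small wrapper that remembers what it has pulled from the underlying handle, so a second pass is served from memory. The class name `ReplayableReader` and its `rewind` method are hypothetical, not part of visidata or the standard library; this is only a minimal sketch:

```python
import io


class ReplayableReader:
    """Wrap a non-seekable text stream and cache whatever has been read.

    Hypothetical helper for illustration; not part of visidata.
    """

    def __init__(self, fp):
        self._fp = fp      # underlying handle, e.g. sys.stdin
        self._buf = ""     # everything pulled from the handle so far
        self._pos = 0      # current read position within the cache

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain the handle; the cache now holds the whole input.
            self._buf += self._fp.read()
            end = len(self._buf)
        else:
            end = self._pos + size
            missing = end - len(self._buf)
            if missing > 0:
                # Only touch the handle for characters not cached yet.
                self._buf += self._fp.read(missing)
            end = min(end, len(self._buf))
        out = self._buf[self._pos:end]
        self._pos = end
        return out

    def rewind(self):
        """Go back to the start; later reads are served from the cache first."""
        self._pos = 0


# Sniff the first few characters, then hand the same object to a parser.
reader = ReplayableReader(io.StringIO("a\tb\nc\td\n"))
looks_like_tsv = "\t" in reader.read(3)
reader.rewind()
```

Note that this sketch keeps everything it has read in memory, so for very large inputs a real implementation would probably cache only the sniffed prefix.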

It seems like it would be a lot easier to just write the psql and mysql plugins, like the existing sqlite plugin. Then writes and 'offline' browsing would be possible as well.

ingydotnet commented 7 years ago

I thought of this right away too, but I'd do it slightly differently.

You should use a filename of - for stdin (a very common CLI convention):

cat birdsdiet.tsv | vd --from=tsv -

The --from= is something I often use in conjunction with stdin. You could make it optional and still use heuristics. Maybe if using - (stdin) you always read the whole file and parse the string instead of the file handle. If the parser only takes a file handle, use io.StringIO: https://stackoverflow.com/questions/11914472/stringio-in-python3
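The suggestion above could look roughly like the following. The function names `open_source` and `guess_filetype` are hypothetical, and the tab check is only a stand-in for whatever heuristic the real parser uses:

```python
import io
import sys


def open_source(path, stdin=None):
    """Return a seekable text file object for `path`; '-' means stdin."""
    if path == "-":
        stream = sys.stdin if stdin is None else stdin
        # Read everything up front so the parser can re-read it freely.
        return io.StringIO(stream.read())
    return open(path, encoding="utf-8")


def guess_filetype(fp, sample_size=1024):
    """Peek at the first characters, then rewind for the real parser."""
    sample = fp.read(sample_size)
    fp.seek(0)
    return "tsv" if "\t" in sample else "txt"
```

Reading all of stdin into a `StringIO` keeps the parser code unchanged, at the cost of holding the whole input in memory before anything is displayed.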

This would be good for using curl on open data sources with vd:

curl -s 'https://docs.google.com/spreadsheets/d/1gO-zUzEnPOnYMYnC9OwrBWvl2TLxpz5Y3YNIvJYRB7c/export?format=tsv' | vd -

I think that by default, stdin should be a list of commands. This is also a common CLI idiom.

It would be pretty cool if you could save the current list of commands in a session (the list from 'D' log) to a file like birdsdiet.log maybe, and then replay the session with:

vd < birdsdiet.log

I might try to make a PR for this...

saulpw commented 7 years ago

You can save the current list of commands from the comman'D'log to a .vd file, and then replay it with bin/vdplay (not installed by default yet, I don't think). Or load into vd and replay with ga. Note that vdplay does variable substitutions from command-line args too.

There is already a -f/--filetype option to vd, which I think is what your --from is intended to mean.

I am often using vd for GB datasets, and I worked hard to make it responsive from the start, as it continues loading in the background. So I don't want to read in the entire file before parsing. I will look into using stdin like another file descriptor. The disadvantage is as stated above, that stdin can't be rewound. But I'd rather make a StringIO replacement than suffer the performance consequences of doing it a substandard way.
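The trade-off described here, showing rows while the rest of the input is still arriving, can be illustrated with a worker thread that consumes the stream exactly once. This is a sketch under assumptions, not visidata's actual loader: `load_in_background` is a hypothetical name, and rows are naively split on tabs.

```python
import io
import threading


def load_in_background(fp, rows):
    """Append parsed rows to `rows` from a worker thread; return the thread."""

    def worker():
        # Iterate line by line so nothing forces a full read up front.
        for line in fp:
            rows.append(line.rstrip("\n").split("\t"))

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t


rows = []
loader = load_in_background(io.StringIO("a\tb\nc\td\n"), rows)
# The caller could display whatever is already in `rows` while loading goes on.
loader.join()
```

Because the stream is read only once and never rewound, this approach works for stdin as well as for regular files, provided the file type is known or sniffed from a cached prefix.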

ingydotnet commented 7 years ago

That all makes sense. Sorry for not learning all the options up front.

I might make vdplay be an option (vd --replay) instead of adding new bins.

saulpw commented 7 years ago

vd now autodetects piped or redirected stdin via isatty.
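For readers unfamiliar with the check, a minimal sketch of isatty-based detection follows. It is not the code that was committed to visidata; the function name `stdin_is_piped` is hypothetical.

```python
import io
import sys


def stdin_is_piped(stream=None):
    """True when input is piped or redirected rather than typed at a terminal."""
    stream = sys.stdin if stream is None else stream
    # isatty() is True only when the stream is attached to a terminal device.
    return not stream.isatty()
```

With a check like this, `some_command | vd` and `vd < file` can be told apart from a plain interactive `vd` invocation without requiring an explicit `-` argument.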