saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

More pipe-friendly output #6

Status: closed (raivivek closed this 4 years ago)

raivivek commented 5 years ago

Hi Saket,

It appears that the current output of the pysradb command (for example, the one below from the README) is not very friendly to parsing with tools such as awk or cut. For instance, I'd like to keep only a few columns of the output, but the usual attempts such as awk -F "\t" .. or cut -f1-5 fail on columns that contain free-text descriptions. This is only a problem when I use the command's direct string output, not when I go through the --saveto option.

pysradb metadata --db ./SRAmetadb.sqlite SRP075720 --detailed --expand

Cheers, Vivek

saketkc commented 5 years ago

Hi Vivek,

Part of this happens because of the way pandas produces its to_string output: columns are aligned by padding them with spaces rather than separated by tabs. I have forced it to use left-justified output, which was a lazy way to get pretty printing to work. I can add an extra flag that forces the output to use only one type of delimiter.
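A small sketch of the difference described above, using a toy DataFrame whose column names and values are made up for illustration (this is not pysradb's actual code): to_string aligns columns with space padding, so its output contains no tab characters for cut or awk -F "\t" to split on, whereas to_csv(sep="\t") writes exactly one tab between fields even when a field itself contains spaces.

```python
import io

import pandas as pd

# Toy metadata table; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "study_accession": ["SRP000001", "SRP000001"],
        "run_accession": ["SRR000001", "SRR000002"],
        "experiment_title": ["RNA-seq of sample A", "RNA-seq of sample B rep 2"],
    }
)

# Pretty-printed, space-padded table (similar in spirit to the console output).
pretty = df.to_string(index=False, justify="left")

# Pipe-friendly output: one tab between fields, one line per row.
buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False)
tsv = buf.getvalue()

print(pretty)
print(tsv)

# Splitting the TSV on tabs recovers exactly three fields per row,
# even though experiment_title contains spaces.
rows = [line.split("\t") for line in tsv.splitlines()]
```

Splitting the pretty-printed string on whitespace instead would break "RNA-seq of sample B rep 2" into several fields, which is the failure reported in this issue.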

For now, a hacky workaround: pysradb metadata SRP075720 --detailed --expand --saveto /tmp/SRP075720.txt && cut -f 1-5 /tmp/SRP075720.txt
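If you would rather stay in Python, the cut -f 1-5 step above can be sketched with the standard csv module. This assumes the file written by --saveto is tab-separated with a header row (as the cut -f usage in this thread implies); cut_columns is a hypothetical helper, not part of pysradb. Unlike cut, csv.reader also copes with fields that pandas wrapped in quotes because they contained a tab or newline.

```python
import csv
import sys


def cut_columns(path, n=5, out=None):
    """Hypothetical helper: write the first n tab-separated columns, like `cut -f 1-n`."""
    if out is None:
        out = sys.stdout
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Keep only the leading n fields of each row (header included).
            writer.writerow(row[:n])
```

For example, cut_columns("/tmp/metadata.txt") would print the first five columns of a saved metadata file; the path here is only an example.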

As an aside, you do not need to specify --db if SRAmetadb.sqlite is in your current working directory.

saketkc commented 4 years ago

I am closing this issue since I don't have a better way to handle it. I would recommend saving the output with the --saveto <location> argument.

ronin-gw commented 3 years ago

Hi. I noticed the same issue, but I realized that this is expected behaviour.

Another way to pipe the --saveto output is to use process substitution (supported by shells such as bash and zsh):

pysradb metadata SRP075720 --detailed --expand --saveto >(cut -f 1-5)

While it may be tricky, one possible implementation is to detect whether stdout is attached to a terminal (for example with sys.stdout.isatty()) and, when it is not, to separate the columns with tab characters.
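A minimal sketch of that idea, assuming the metadata is already held in a pandas DataFrame; write_metadata is a hypothetical function name and not part of pysradb's API. It keeps the aligned table for interactive use and switches to tab-separated output when stdout is redirected or piped.

```python
import sys

import pandas as pd


def write_metadata(df: pd.DataFrame, stream=None) -> None:
    """Hypothetical helper: pretty-print on a terminal, emit TSV otherwise."""
    if stream is None:
        stream = sys.stdout
    # isatty() is True for an interactive terminal and False when output
    # is redirected to a file or piped into another command such as cut.
    if stream.isatty():
        stream.write(df.to_string(index=False, justify="left") + "\n")
    else:
        # One tab between columns and one line per row, so that
        # cut -f and awk -F '\t' see exactly one field per column.
        df.to_csv(stream, sep="\t", index=False)
```

With this approach, piping the command into cut -f 1-5 would receive plain tab-separated text, while a user reading the output in a terminal would still see the aligned table.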