shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License
992 stars 84 forks

bug: spurious points in plots #279

Closed janxkoci closed 2 months ago

janxkoci commented 2 months ago

First, thank you for this handy tool! I especially love the plotting feature, but I've run into a problem: my plots often contain spurious extra points.

reproducible example

The issue is already visible using this toy example:

seq 1 10 | awk '{print $1, $1}' OFS="," | csvtk -H plot line -x 1 -y 2 > csvtk_plot_bugreport.png

The code above produces the following plot:

[image: csvtk_plot_bugreport]

Note that the plot has a point in the lower-right corner that should not be there.

Another issue is that the line gets all zig-zaggy if the input is not sorted on the column being plotted (I can report it as a separate issue later, as I don't have a good reproducer now).
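
A possible workaround for the zig-zag lines, sketched under the assumption that sorting the input numerically on the x column before plotting is enough. The file names `unsorted.csv`, `sorted.csv` and `sorted.png` are hypothetical, and the plot step only runs if csvtk is installed:

```shell
# Unsorted x values: as described above, plotting this input directly
# is reported to give a zig-zag line.
printf '%s\n' 3,3 1,1 2,2 5,5 4,4 > unsorted.csv

# Sort numerically on column 1 (the x column) before plotting.
sort -t, -k1,1n unsorted.csv > sorted.csv

# Plot the sorted data, using the same flags as the example above.
if command -v csvtk >/dev/null 2>&1; then
  csvtk -H plot line -x 1 -y 2 sorted.csv -o sorted.png
fi
```

Plain `sort` is used here only to keep the sketch free of extra assumptions about csvtk itself.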

It would also be cool to support grouping in line plots, so that different series of data could be plotted together using different colours and symbols (like the first example from the gonum/plot wiki). Also, box could support colours for groups of boxes (based on some factor), but the UX may need some thought, as -g is already used for grouping the values for the individual boxes. Again, I can submit these as separate feature requests later, if you prefer.

version

I just updated from 0.29 to 0.30, using bioconda, and the issue is in both versions (my OS is Linux).

janxkoci commented 2 months ago

Ok I'm dumb - I just realized the extra point is actually a legend 🤦 However, it never has labels, so it never occurred to me before. How do I add a category label to it?

shenwei356 commented 2 months ago

csvtk plot has a global option:

  -g, --group-field string     column index or column name of group

(seq 1 10 | awk '{print "a", $1, $1  }' OFS=","; \
 seq 1 10 | awk '{print "b", $1, $1+1}' OFS=",";) \
| csvtk -H plot line -x 2 -y 3 -g 1

[image]
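
To see what the grouped input looks like before it reaches the plot, the same pipeline can be split into two steps. This is only a sketch: `grouped.csv` and `grouped.png` are hypothetical file names, and the plot step is skipped when csvtk is not on the PATH.

```shell
# Build the same headerless three-column input: group, x, y.
{
  seq 1 10 | awk '{print "a", $1, $1  }' OFS=","
  seq 1 10 | awk '{print "b", $1, $1+1}' OFS=","
} > grouped.csv

# Show the first and last rows to check the layout.
head -n 1 grouped.csv
tail -n 1 grouped.csv

# Column 1 is the group (-g), column 2 is x, column 3 is y.
if command -v csvtk >/dev/null 2>&1; then
  csvtk -H plot line -x 2 -y 3 -g 1 grouped.csv -o grouped.png
fi
```

Each distinct value in column 1 (`a` and `b` here) becomes one series, and that value is what labels the legend entry.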

janxkoci commented 2 months ago

Ok, I really misunderstood the feature, the issue can be closed.

One last question: is there a way to disable the legend, for cases where there is just one series? It could also be useful when we want to combine multiple plots with one shared legend (e.g. with imagemagick montage).

shenwei356 commented 2 months ago

Here they are:

The same example:

seq 10 | csvtk mutate -H | csvtk -H plot line -x 1 -y 2 -o t.png

[image: t]

janxkoci commented 2 months ago

Is this a new release? I install csvtk with conda to have consistent versions across my computers.

shenwei356 commented 2 months ago

No, just a pre-release. It should be 0.30.1; I didn't change it.

janxkoci commented 1 month ago

No problem, and sorry if I came across as rude and demanding - you are free to release whenever you want 😉

Especially if we can pack a few more fixes into the next release, like the simple fix for #280 😁

And I tested the above version on one of my laptops - works like a charm, thanks!