pgf-tikz / pgfplots

pgfplots - A TeX package to draw normal and/or logarithmic plots directly in TeX in two and three dimensions with a user-friendly interface and pgfplotstable - a TeX package to round and format numerical tables. Examples in manuals and/or on web site.
http://pgfplots.sourceforge.net/

boxplot `data filter` drops samples #467

Open joel-coffman opened 8 months ago

joel-coffman commented 8 months ago

The boxplot `data filter` key drops outliers, as shown in the following example (adapted from the manual):

\documentclass[tikz]{standalone}

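% filecontents* with [overwrite] is provided by the LaTeX kernel (2019-10-01 or newer); the obsolete filecontents package is not needed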
\usepackage{pgfplots}

\pgfplotsset{
  compat=1.18,
  only if/.style args={entry of #1 is #2}{
    /pgfplots/boxplot/data filter/.code={
      \edef\tempa{\thisrow{#1}}
      \edef\tempb{#2}
      \ifx\tempa\tempb
      \else
        \def\pgfmathresult{}
      \fi
    },
  },
}

\usepgfplotslibrary{statistics}

\begin{filecontents*}[overwrite]{combined.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
0.8,b
0.9,b
1.0,b
\end{filecontents*}
\begin{filecontents*}[overwrite]{one.csv}
v,set
0.1,a
0.2,a
0.3,a
1.0,a
0.4,a
0.2,a
\end{filecontents*}
\begin{filecontents*}[overwrite]{other.csv}
v,set
0.8,b
0.9,b
1.0,b
\end{filecontents*}

\begin{document}
\begin{tikzpicture}
  % FIXME: Missing outlier for boxplot of a!
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is b},y=v] {combined.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Plotting the data from two different data files works!
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[y=v] {one.csv};
    \addplot table[y=v] {other.csv};
  \end{axis}
\end{tikzpicture}
\begin{tikzpicture}
  % Unclear if the issue is the filter or reading from the file...
  \begin{axis}[
      boxplot,
      boxplot/draw direction=y,
      table/col sep=comma,
  ]
    \addplot table[only if={entry of set is a},y=v] {combined.csv};
    \addplot table[only if={entry of set is a},y=v] {one.csv};
  \end{axis}
\end{tikzpicture}
\end{document}

Contrast the first two graphs produced by the prior code:

[Image, left: Example where an outlier has been dropped]
[Image, right: Expected output, not using `data filter`]

The plot on the left (both sets filtered from combined.csv) is missing the outlier for set a, whereas the plot on the right (one file per set) shows the outlier as expected.
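
For reference, here is a rough hand check of why the value 1.0 in set a should be drawn as an outlier. It is only a sketch: it assumes simple median-of-halves quartiles, which may not match the estimator pgfplots uses internally, and the default `boxplot/whisker range=1.5`. The exact fence value can differ with another estimator.

```latex
% Illustrative hand computation only (needs amsmath for align* and \text).
% Assumption: quartiles are the medians of the lower and upper halves;
% the quartile estimator used by pgfplots may give slightly different values.
\begin{align*}
  \text{sorted } a &= (0.1,\ 0.2,\ 0.2,\ 0.3,\ 0.4,\ 1.0) \\
  Q_1 &= 0.2, \qquad Q_3 = 0.4, \qquad \mathrm{IQR} = Q_3 - Q_1 = 0.2 \\
  \text{upper fence} &= Q_3 + 1.5\,\mathrm{IQR} = 0.4 + 0.3 = 0.7 \\
  1.0 &> 0.7 \quad\Rightarrow\quad \text{1.0 lies beyond the upper whisker}
\end{align*}
```

Under these assumptions the six samples of set a contain exactly one point beyond the whisker, which matches the outlier visible in the plot that reads one.csv directly.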