whitews / FlowKit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces
https://flowkit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Scatter plots error with few points #197

Open paulmaschhoff opened 2 months ago

paulmaschhoff commented 2 months ago

Describe the bug

Plotting a scatter plot with a single point using fk._utils.plot_utils.plot_scatter raises a ValueError on the line z_colors = ..., caused by a NaN from a divide-by-zero in the first z_norm = ... line.

File /opt/conda/lib/python3.11/site-packages/flowkit/_utils/plot_utils.py:460, in <listcomp>(.0)
    457 else:
    458     z_norm = np.zeros(len(x))
--> 460 z_colors = np.array([custom_heat_palette[int(z * 255)] for z in z_norm])
    462 if highlight_mask is not None:
    463     z_colors[~highlight_mask] = "#d3d3d3"

ValueError: cannot convert float NaN to integer
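A minimal sketch of how a single point can lead to this NaN, assuming the z_norm line is a min-max rescale (the actual FlowKit line is not shown in this report). The names naive_min_max_norm and safe_min_max_norm are hypothetical and are not part of FlowKit.

```python
import numpy as np


def naive_min_max_norm(z):
    """Assumed form of the normalization; not FlowKit's actual code."""
    z = np.asarray(z, dtype=float)
    # With one point (or all-equal values) max == min, so this is 0 / 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (z - z.min()) / (z.max() - z.min())


def safe_min_max_norm(z):
    """Rescale z to [0, 1]; fall back to zeros when there is no spread."""
    z = np.asarray(z, dtype=float)
    if z.size == 0:
        return z.copy()  # nothing to normalize
    z_range = z.max() - z.min()
    if not np.isfinite(z_range) or z_range == 0:
        return np.zeros_like(z)  # single point or constant density
    return (z - z.min()) / z_range


single = np.array([1.0])
print(naive_min_max_norm(single))  # [nan]
print(safe_min_max_norm(single))   # [0.]
```

Passing the NaN from the naive version to int() reproduces the "cannot convert float NaN to integer" message, while the guarded version always yields values that index safely into a 256-entry palette.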

Similarly, if you try to make a plot with zero points, you get a TypeError in the if y_max > x_max: line:

File /opt/conda/lib/python3.11/site-packages/flowkit/_utils/plot_utils.py:392, in plot_scatter(x, y, x_label, y_label, event_mask, highlight_mask, x_min, x_max, y_min, y_max, color_density, bin_width)
    389 if len(y) > 0:
    390     y_min, y_max = _calculate_extent(y, d_min=y_min, d_max=y_max, pad=0.02)
--> 392 if y_max > x_max:
    393     radius_dimension = 'y'
    394     radius = 0.003 * y_max

TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
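The traceback suggests that the extents are only computed when len(y) > 0, so with no events they stay None and the comparison fails. A minimal sketch of one possible guard follows; resolve_extent and its default range are hypothetical, and this is not FlowKit's _calculate_extent.

```python
import numpy as np


def resolve_extent(data, d_min=None, d_max=None, default=(0.0, 1.0)):
    """Return (d_min, d_max), never None, even for empty data.

    Hypothetical helper: explicit limits win, then the data range,
    then a placeholder default range when there are no events.
    """
    data = np.asarray(data, dtype=float)
    if d_min is None:
        d_min = float(data.min()) if data.size > 0 else default[0]
    if d_max is None:
        d_max = float(data.max()) if data.size > 0 else default[1]
    return d_min, d_max


empty = np.array([], float)
x_min, x_max = resolve_extent(empty)
y_min, y_max = resolve_extent(empty)
print(y_max > x_max)  # comparison is now between floats, not None
```

The default range is an arbitrary choice for illustration; an empty plot could equally use whatever axis limits the caller supplies.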

Code To Reproduce

Code to reproduce the behavior:

import flowkit as fk
import numpy as np

# For the first error
arr = np.array([1., ], float)
fk._utils.plot_utils.plot_scatter(arr, arr)

# For the second error
arr = np.array([], float)
fk._utils.plot_utils.plot_scatter(arr, arr)

# Bonus - this errors with two points, but only if you set the extents
# not sure why - didn't dig into this one
fk._utils.plot_utils.plot_scatter(
    np.array([0.44592386, 0.52033713]),
    np.array([0.6131338, 0.60149982]),
    x_min=0, x_max=.997, y_min=0, y_max=.991
)

Expected behavior

The plot should be generated without error when given one or zero points. Normalization should be skipped or adjusted for these edge cases.

Alternatively, declare fewer than 2 points invalid and raise an explicit, easy-to-catch error at the start of the function.
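The second option could look roughly like the sketch below. The function name require_min_events and the error wording are hypothetical; nothing here is taken from FlowKit.

```python
def require_min_events(x, y, min_events=2):
    """Raise a clear ValueError when there are too few events to plot."""
    n_events = min(len(x), len(y))
    if n_events < min_events:
        raise ValueError(
            f"Scatter plot requires at least {min_events} events, "
            f"got {n_events}"
        )
    return n_events


# A caller plotting many samples can then skip the sparse ones explicitly.
try:
    require_min_events([1.0], [1.0])
except ValueError as err:
    print(f"Skipping plot: {err}")
```

A single ValueError raised up front is easier to catch than the mix of ValueError and TypeError that currently surfaces from deeper in the function.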

Additional context

I realize these are corner cases. In my context, they come up when using Session.plot_gate and plotting a gate for all Session samples or all gates for a single sample.

whitews commented 2 months ago

Hi Paul,

Thanks for finding these and reporting them. I'll definitely include a fix for this in the next release. I agree with you that ideally the plot should work with zero or more points.

-Scott