Manual low-dimensional representations
compute_visualization(..., points=points) can now be used to provide your own manually computed low-dimensional representation for use with interactive embeddings plots. For example, users may compute their own embeddings and then perform their own UMAP reduction with customized parameters that we do not currently expose.
import numpy as np
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob
dataset = foz.load_zoo_dataset("quickstart").clone()
# Manually provide a (num_samples, 2) array of low-dimensional points
points = np.random.randn(len(dataset), 2)
results = fob.compute_visualization(dataset, points=points, brain_key="manual")
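For example, a minimal sketch of the workflow described above, computing your own embeddings and reducing them with custom UMAP parameters (the zoo model name and UMAP settings below are illustrative):
import umap
import fiftyone.zoo as foz
import fiftyone.brain as fob
# Continuing with the `dataset` from the snippet above
model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
embeddings = dataset.compute_embeddings(model)
# Custom UMAP reduction with parameters of your choosing
points = umap.UMAP(n_neighbors=5, min_dist=0.2).fit_transform(embeddings)
results = fob.compute_visualization(dataset, points=points, brain_key="custom_umap")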
Ensuring requirements
Introduces a BrainMethod.ensure_requirements() method that is called prior to any expensive computations and ensures that the necessary packages are installed.
This is currently only relevant for compute_visualization() when using the UMAP backend (the default). Previously, all embeddings would be computed only for an error to be raised afterward if UMAP was not installed. Now, the error is raised immediately.
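A minimal sketch of how a brain method might implement this hook (the subclass below is hypothetical, the import path is assumed, and fiftyone.core.utils.ensure_package() is shown as one possible way to perform the check):
import fiftyone.core.utils as fou
from fiftyone.core.brain import BrainMethod
class CustomVisualization(BrainMethod):
    def ensure_requirements(self):
        # Called before any embeddings are computed; raises immediately
        # if umap-learn is not installed
        fou.ensure_package("umap-learn")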
Graceful handling of missing embeddings
Updates the default behavior of compute_similarity() and compute_visualization() to replace any uncomputable embeddings with zero vectors. Previously, an error would be raised, but only after all embeddings had been attempted.
Now, informative warnings are printed if any embeddings could not be computed, but the user still gets results that they can work with. The idea is that typical failures are very sparse, and it is better to give the user a result with 99% good data and 1% dummy data so they can get their work done than to require that everything be computable before they get anything at all. Also, when viewing embedding visualization plots, the lasso is a convenient way to isolate the broken samples, since they will likely be separated from any "real data" clusters.
The user can pass skip_failures=False to insist that all embeddings must be computable.
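For example (the brain_key below is illustrative):
# Raise an error as soon as any embedding cannot be computed
results = fob.compute_visualization(dataset, skip_failures=False, brain_key="strict_viz")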
Example of gracefully handling bad data:
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob
dataset = foz.load_zoo_dataset("quickstart").clone()
dataset.set_values("validity", ["good"] * len(dataset))
# Give 50 samples non-existent filepaths so that embedding computation will fail
bad_view = dataset.limit(50)
bad_view.set_values("filepath", ["/non/existent.png"] * len(bad_view))
bad_view.set_values("validity", ["bad"] * len(bad_view))
# Warnings are printed but results are still returned
# Bad data is clearly visible
results = fob.compute_visualization(dataset, brain_key="img_viz")
plot = results.visualize(labels="validity")
plot.show()
# Warnings are printed but results are still returned
# Bad data is clearly visible
results = fob.compute_visualization(dataset, patches_field="ground_truth", brain_key="gt_viz")
plot = results.visualize()
plot.show()
# Warnings are printed but results are still returned
fob.compute_similarity(dataset, brain_key="img_sim")
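To isolate the broken samples via the lasso, one option is to attach the plot to an App session; a sketch (the load_brain_results() reload just makes the snippet self-contained):
# Lasso the "bad" cluster in the plot to isolate the failed samples in the App
results = dataset.load_brain_results("img_viz")
plot = results.visualize(labels="validity")
plot.show()
session = fo.launch_app(dataset)
session.plots.attach(plot)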
Best tested with https://github.com/voxel51/fiftyone/pull/1444.