Open jacobsela opened 2 weeks ago
@jacobsela when finalized could you provide a code snippet for how you imagine this feature working?
@mwoodson1 Basic snippet I used in the demo:
```py
import fiftyone as fo
import fiftyone.brain.internal.core.leaky_splits as ls

# skl backend
config = ls.LeakySplitsSKLConfig(
    split_tags=["train", "test"],
    model="resnet18-imagenet-torch",
)
index = ls.LeakySplitsSKL(config).initialize(dataset, "foo")
index.set_threshold(0.1)
leaks = index.leaks
session = fo.launch_app(leaks, auto=False)

# hash backend
config = ls.LeakySplitsHashConfig(
    split_tags=["train", "test"],
    method="image",
    hash_field="hash",
)
index = ls.LeakySplitsHash(config).initialize(dataset, "foo")
session = fo.launch_app(index.leaks, auto=False)
```
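For context on what the hash backend boils down to: a leak is a sample whose file hash collides with a sample in the other split. A minimal, framework-free sketch of that idea, with made-up sample IDs and bytes standing in for image files:

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Hash raw bytes; byte-identical files collide exactly."""
    return hashlib.md5(data).hexdigest()


def find_hash_leaks(train: dict, test: dict) -> set:
    """Return test-sample IDs whose hash also appears in the train split.

    `train`/`test` map sample IDs to raw bytes (stand-ins for image files).
    """
    train_hashes = {file_hash(data) for data in train.values()}
    return {sid for sid, data in test.items() if file_hash(data) in train_hashes}


# Toy data: "img2" in test is byte-identical to "img0" in train
train = {"img0": b"cat-picture", "img1": b"dog-picture"}
test = {"img2": b"cat-picture", "img3": b"bird-picture"}

print(find_hash_leaks(train, test))  # → {'img2'}
```

The embedding backend generalizes this from exact collisions to soft similarity, which is why it needs a threshold and the hash backend doesn't.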
The interface seems a bit messy to me. I was hoping for something like
```py
dataset = foz.load_zoo_dataset(...)

leaks = fob.compute_data_leaks(
    dataset,
    method,     # use hash or embedding soft similarity
    brain_key,  # which similarity index / embeddings to use
    model,      # which model to use to compute embeddings
    ...
)
```
This would follow patterns similar to `fob.compute_visualization` and `fob.compute_uniqueness`. For example, see the work happening in #201.
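For what it's worth, that single-entry-point shape is essentially a thin dispatcher over the two backends. A hypothetical sketch of the pattern (the function and backend names below are placeholders for illustration, not the final API):

```python
def _hash_backend(dataset, **kwargs):
    # Placeholder: would build a LeakySplitsHash-style index here
    return f"hash leaks for {dataset}"


def _similarity_backend(dataset, **kwargs):
    # Placeholder: would build a LeakySplitsSKL-style index here
    return f"similarity leaks for {dataset}"


_BACKENDS = {
    "hash": _hash_backend,
    "similarity": _similarity_backend,
}


def compute_data_leaks(dataset, method="similarity", **kwargs):
    """Single entry point that routes to a backend chosen by `method`,
    analogous to how other brain methods select a backend."""
    try:
        backend = _BACKENDS[method]
    except KeyError:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(_BACKENDS)}"
        )
    return backend(dataset, **kwargs)


print(compute_data_leaks("quickstart", method="hash"))
```

The advantage of this shape is that backend-specific config classes stay internal, and the user-facing surface is one function plus keyword arguments.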
@mwoodson1 Thanks for the feedback, I agree that this isn't ideal. I'm holding off on creating the final `compute_leaks` (or `compute_leaky_splits`, as it currently is in the code) until we finalize what we want the behavior to look like (e.g., in terms of thresholds). Putting together a final, easy-to-use function at the end should be quick, so I'd rather do it once.
Very WIP. Putting this PR up to get feedback.
General idea, decided with @jacobmarks today:
- Input: user-provided tags, field, or views corresponding to splits
- Output:
Plan forward:
- `LeakySplitsSKL`: integrate class with interface.