voxel51 / fiftyone


[BUG] Embeddings not showing from brain key #4285

Open AlanBlanchet opened 7 months ago

AlanBlanchet commented 7 months ago

Describe the problem

  1. Install fiftyone and the brain plugin (along with all the other plugins)
  2. Launch the App with the coco-2017 dataset (validation split), loaded with persistent=True
  3. Run the "Compute visualization" operation from the brain plugin (with the defaults)
  4. Alternatively, run "Apply model", compute the embeddings from there, then compute the visualization

The Embeddings panel is just empty after selecting a brain key.
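
For reference, I believe the programmatic equivalent of steps 3-4 is roughly the following (just a sketch; the zoo model name is an assumption on my part, and the embeddings field / brain key match the brain info below):

import fiftyone.zoo as foz
import fiftyone.brain as fob

# Assumed default model for the plugin's "Apply model" step
model = foz.load_zoo_model("clip-vit-base32-torch")

# Store per-sample embeddings, then compute a UMAP visualization from them
dataset.compute_embeddings(model, embeddings_field="embedding")
fob.compute_visualization(
    dataset,
    embeddings="embedding",
    method="umap",
    brain_key="my_key",
)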

Brain info:

Brain key: my_key
Run type: visualization
Creation time: 2024-13-18 15:13:07
FiftyOne version: 0.23.8
cls: fiftyone.brain.visualization.UMAPVisualizationConfig
type: visualization
method: umap
embeddings_field: embedding
num_dims: 2
num_neighbors: 15
metric: euclidean
min_dist: 0.1
verbose: True

Screenshots: (screenshot attached showing the empty Embeddings panel)

Python command output:

(fiftyone-test-py3.10) alan@alan:~/dev/fiftyone_test$ python ./main.py 
['activitynet-100', 'activitynet-200', 'bdd100k', 'caltech101', 'caltech256', 'cifar10', 'cifar100', 'cityscapes', 'coco-2014', 'coco-2017', 'fashion-mnist', 'fiw', 'hmdb51', 'imagenet-2012', 'imagenet-sample', 'kinetics-400', 'kinetics-600', 'kinetics-700', 'kinetics-700-2020', 'kitti', 'kitti-multiview', 'lfw', 'mnist', 'open-images-v6', 'open-images-v7', 'places', 'quickstart', 'quickstart-geo', 'quickstart-groups', 'quickstart-video', 'sama-coco', 'ucf101', 'voc-2007', 'voc-2012']
Downloading split 'validation' to '/home/alan/fiftyone/coco-2017/validation' if necessary
Found annotations at '/home/alan/fiftyone/coco-2017/raw/instances_val2017.json'
Images already downloaded
Existing download of split 'validation' is sufficient
Loading existing dataset 'coco-2017'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
App launched. Point your web browser to http://localhost:5151

Code to reproduce issue

import fiftyone as fo
import fiftyone.zoo as foz

# List available zoo datasets
print(foz.list_zoo_datasets())

#
# Load the COCO-2017 validation split into a FiftyOne dataset
#
# This will download the dataset from the web, if necessary
#
# Give the dataset a new name, and make it persistent so that you can
# work with it in future sessions
dataset = foz.load_zoo_dataset("coco-2017", split="validation", dataset_name="coco-2017", persistent=True)

# Visualize the dataset in the App
session = fo.launch_app(dataset)
session.wait(-1)

System information

I don't know if these details could affect the issue.

Other info/logs

I don't really know how to capture the Python logs; could you guide me? I tried using the init_logger function from the package and viewing the outputs from the web app, but couldn't find anything useful myself...

Don't hesitate to ask for more info if needed.

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

benjaminpkane commented 7 months ago

Hi @AlanBlanchet. I'm not sure we will be able to reproduce the issue you are seeing without a full script.

Part of the issue may be that you are reusing an existing dataset (rather than creating a new one), noting this output you shared:

Loading existing dataset 'coco-2017'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use

To create a new dataset, you can use dataset.clone("cloned-dataset"), or give the zoo dataset a unique dataset_name when using foz.load_zoo_dataset().
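
For example (a quick sketch; the dataset_name below is just a placeholder):

import fiftyone.zoo as foz

# Option 1: load the zoo dataset under a fresh, unused name
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_name="coco-2017-fresh",  # any name that doesn't already exist
    persistent=True,
)

# Option 2: clone your existing dataset into a new one
cloned = dataset.clone("cloned-dataset")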

AlanBlanchet commented 7 months ago

Hello, and thanks for the reply @benjaminpkane.

Yes, I'm using a dataset from the zoo, but that should be supported in the viewer, right? I tried to investigate further and managed to get an error from the backend.

The error is simply that a numpy value can't be dumped to JSON when accessing the API through http://localhost:5151/embeddings/plot.

It returns this error:

Traceback (most recent call last):
  File \"/home/alan/.cache/pypoetry/virtualenvs/fiftyone-test-KI_-E3R8-py3.10/lib/python3.10/site-packages/fiftyone/server/decorators.py\", line 34, in wrapper
    await run_sync_task(lambda: json_util.dumps(response))
  File \"/home/alan/.cache/pypoetry/virtualenvs/fiftyone-test-KI_-E3R8-py3.10/lib/python3.10/site-packages/fiftyone/core/utils.py\", line 2317, in run_sync_task
    return await loop.run_in_executor(_get_sync_task_executor(), func, *args)
  File \"/home/alan/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/thread.py\", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File \"/home/alan/.cache/pypoetry/virtualenvs/fiftyone-test-KI_-E3R8-py3.10/lib/python3.10/site-packages/fiftyone/server/decorators.py\", line 34, in <lambda>
    await run_sync_task(lambda: json_util.dumps(response))
  File \"/home/alan/.cache/pypoetry/virtualenvs/fiftyone-test-KI_-E3R8-py3.10/lib/python3.10/site-packages/bson/json_util.py\", line 472, in dumps
    return json.dumps(_json_convert(obj, json_options), *args, **kwargs)
  File \"/home/alan/.pyenv/versions/3.10.13/lib/python3.10/json/__init__.py\", line 231, in dumps
    return _default_encoder.encode(obj)
  File \"/home/alan/.pyenv/versions/3.10.13/lib/python3.10/json/encoder.py\", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File \"/home/alan/.pyenv/versions/3.10.13/lib/python3.10/json/encoder.py\", line 257, in iterencode
    return _iterencode(o, 0)
  File \"/home/alan/.pyenv/versions/3.10.13/lib/python3.10/json/encoder.py\", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type float32 is not JSON serializable

I haven't had time to investigate why, but:

I changed the main.py file to drop the dataset and re-download it if it already exists.

I don't really know how the data flows back to the frontend, but I guess a fix would be to add a .tolist() (or a similar conversion) if the data is a numpy array, as sketched below. I just don't know where...
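
Something along these lines is what I have in mind (just a sketch of the conversion itself, not of where it would go in the server code):

import numpy as np

def to_jsonable(value):
    # Convert numpy containers/scalars to plain Python types before JSON dumping
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):  # covers np.float32, np.int64, etc.
        return value.item()
    return value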

MLRadfys commented 7 months ago

Hi,

I am encountering the same issue. For UMAP and t-SNE, no visualizations are shown when using the brain plugin. I noticed, though, that PCA works.

Regards,

M

benjaminpkane commented 7 months ago

This may be a caching issue. Running the reload_dataset operator may resolve the issue when it is encountered:

https://github.com/voxel51/fiftyone/assets/19821840/0aa3a63d-f046-4eea-b038-ec3719fb6f21
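
If you prefer Python, a roughly equivalent refresh (assuming you still have the dataset and session objects from your script) is:

dataset.reload()   # reload the dataset from the database
session.refresh()  # refresh the App state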

MLRadfys commented 7 months ago

Hi Benjamin and thanks for the reply!

Unfortunately not. Reloading the dataset does not help.

Kind regards,

M

AlanBlanchet commented 7 months ago

I confirm that reloading the dataset doesn't help. For PCA, which does work, I'm also not able to color by the ground_truth label. I don't know if it's a distinct bug or if it is linked.

benjaminpkane commented 7 months ago

I confirm that reloading the dataset doesn't help. For PCA, which does work, I'm also not able to color by the ground_truth label. I don't know if it's a distinct bug or if it is linked.

That is likely a separate issue. I have created #4324, which should resolve the original issue report.

MLRadfys commented 7 months ago

Awesome, thanks Benjamin!