opendatahub-io-contrib / data-mesh-pattern

Data Mesh Pattern
https://opendatahub-io-contrib.github.io/data-mesh-pattern
Apache License 2.0

Foundation Models Integration - Atlas visualization for datasets #72

Open caldeirav opened 1 year ago

caldeirav commented 1 year ago

Google has been leveraging the Atlas visualizer for embeddings: https://atlas.nomic.ai/

Look into using embedding visualization over distributed data sets as a way to explore and search for data. This could complement Elasticsearch-style discovery.
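To make the idea concrete, here is a minimal sketch of similarity-based dataset discovery. It is only an illustration under stated assumptions: the catalog entries are made up, and `toy_embed` is a simple bag-of-words stand-in for a real embedding model, not anything Atlas or this repository provides.

```python
import math

# Hypothetical catalog: dataset name -> free-text description.
catalog = {
    "sales_orders": "daily retail sales orders by store",
    "web_clicks": "website clickstream events per session",
    "store_inventory": "retail store inventory levels",
}

# Vocabulary built from the descriptions; one vector dimension per word.
vocab = sorted({w for desc in catalog.values() for w in desc.lower().split()})
position = {word: i for i, word in enumerate(vocab)}


def toy_embed(text):
    """Bag-of-words count vector, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in position:  # words outside the vocabulary are ignored
            vec[position[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


# Embed every dataset description once, up front.
index = {name: toy_embed(desc) for name, desc in catalog.items()}


def search(query, top_k=2):
    """Return the names of the top_k datasets most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:top_k]


print(search("retail store sales"))
```

The same vectors that drive this nearest-neighbour lookup are what a visualizer would project into a 2D map, so search and visual exploration can share one embedding index.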

neoxu999 commented 12 months ago

I tried the code below and it generated the visualization for me. The published map is here: https://atlas.nomic.ai/map/18914772-6404-418f-9f0a-697b224ba453/7cd05207-68ec-45ab-b05b-a1d600f0f963

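import nomic
from nomic import atlas
import numpy as np

# Authenticate first; "YOUR_API_KEY" is a placeholder for your own key.
nomic.login("YOUR_API_KEY")

# Random vectors stand in for real embeddings: 1000 points, 256 dimensions.
num_embeddings = 1000
embeddings = np.random.rand(num_embeddings, 256)

# Upload the embeddings and build the map.
project = atlas.map_embeddings(
    embeddings=embeddings
)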

The process is easy: you just need to get an API key and log in to Atlas.
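For a shared notebook it may be safer not to paste the key into the code. A minimal sketch, assuming the key is exported in an environment variable (the name `NOMIC_API_KEY` is my own choice for this example, not something the library requires):

```python
import os

ENV_VAR = "NOMIC_API_KEY"  # hypothetical variable name chosen for this sketch


def read_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} before creating a map")
    return key


def login_from_env():
    """Log in to Atlas using the key found in the environment."""
    import nomic  # imported lazily so read_api_key works without the package

    nomic.login(read_api_key())
```

Calling `login_from_env()` once at the top of the notebook would then replace a hard-coded `nomic.login(...)` line.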

I can write a proper Jupyter notebook to demonstrate it.