tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Unable to Retrieve Embedding Arrays From TensorBoard Logs #6879

Open Louagyd opened 2 months ago

Louagyd commented 2 months ago

I am having trouble retrieving embedding arrays that were logged with add_embedding: I cannot find the actual arrays anywhere in the TensorBoard logs. Below is a description of the issue and the steps I have taken so far.

Steps to Reproduce

Logging Embeddings:

I used add_embedding to log embeddings in TensorBoard. Example code for logging embeddings:

from torch.utils.tensorboard import SummaryWriter
import numpy as np

# Create a SummaryWriter
log_dir = 'logs/embedding_example'
writer = SummaryWriter(log_dir)

# Generate some dummy embeddings
embedding_data = np.random.randn(100, 64)  # 100 items with 64-dim embeddings
metadata = [f'Label {i}' for i in range(100)]

# Write the embeddings
writer.add_embedding(mat=embedding_data, metadata=metadata, global_step=1)

# Flush pending data and close the writer
writer.close()

Attempting to Retrieve Embeddings:

I tried using EventAccumulator to load and parse the event files but was unable to locate the embedding arrays. Example code for extracting embeddings:

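import os
import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
from tensorboard.util import tensor_util

def extract_embeddings_from_log(log_dir):
    event_acc = EventAccumulator(log_dir, size_guidance={'tensors': 0})
    event_acc.Reload()  # Load the event files; no tags are available before this

    embeddings = {}

    # Get tags for tensors (embeddings should be listed here)
    tensor_tags = event_acc.Tags()['tensors']

    # Convert each logged tensor to a NumPy array, keyed by tag and step
    for tag in tensor_tags:
        for event in event_acc.Tensors(tag):
            embeddings[(tag, event.step)] = tensor_util.make_ndarray(event.tensor_proto)

    return embeddings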

I would appreciate any guidance or suggestions on how to properly retrieve the embedding arrays logged using add_embedding. Specifically, I am looking for:

Environment Details

Framework: PyTorch
Logging Library: TensorBoard
TensorBoard Version: 2.16.2
Python Version: 3.10
Operating System: Ubuntu 22.04

Thank you for your assistance.

rileyajones commented 1 month ago

Embeddings are treated differently from other logs because they are really part of the projector plugin. As a result, they are written to a separate file, projector_config.pbtxt, which is read only by the projector plugin.
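One way to check this on disk is to list everything under the log directory. The following is a minimal sketch using only the standard library; the helper name list_log_files is hypothetical, and the file names mentioned in its comments assume the layout that PyTorch's add_embedding is expected to produce.

```python
from pathlib import Path


def list_log_files(log_dir):
    """Return the paths of all files under log_dir, relative to it and sorted.

    For a directory written by add_embedding, the result is assumed to
    include projector_config.pbtxt plus per-step tensors.tsv and
    metadata.tsv files alongside the usual event file.
    """
    root = Path(log_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

Calling list_log_files('logs/embedding_example') on the directory from the issue should show which files sit next to the event file.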

I'm not sure exactly what you're trying to read out, but you may find success using something like this.

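import os
import tensorflow as tf
from google.protobuf import text_format
from tensorboard.plugins import projector

# Directory that was passed to SummaryWriter when the embeddings were logged
logdir = "logs/embedding_example"

# Parse the projector config, which lists each logged embedding
with tf.io.gfile.GFile(
    os.path.join(logdir, "projector_config.pbtxt")
) as f:
    config2 = projector.ProjectorConfig()
    text_format.Parse(f.read(), config2)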
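Once the config is parsed, each entry in config2.embeddings carries a tensor_path. For logs written by PyTorch's add_embedding, this is assumed to point, relative to the log directory, at a tab-separated tensors.tsv file with one embedding vector per row. Below is a minimal sketch for reading such a file with the standard library; the helper name load_tsv_matrix is hypothetical and not part of TensorBoard.

```python
import csv
import os


def load_tsv_matrix(log_dir, tensor_path):
    """Read a tab-separated tensor file into a list of float rows.

    Assumes one embedding vector per line with tab-separated values,
    and tensor_path given relative to log_dir.
    """
    rows = []
    with open(os.path.join(log_dir, tensor_path), newline="") as f:
        for record in csv.reader(f, delimiter="\t"):
            if record:  # skip blank lines
                rows.append([float(value) for value in record])
    return rows
```

With the parsed config, this could be called as load_tsv_matrix(logdir, info.tensor_path) for each info in config2.embeddings. Passing the result to numpy.asarray would give an array with one row per logged item.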