lightly-ai / lightly

A python library for self-supervised learning on images.
https://docs.lightly.ai/self-supervised-learning/
MIT License

[2024-03-07 22:34:06] An unexpected exception occurred: 1 validation error for GetEmbeddingsCsvReadUrlById embedding_id none is not an allowed value (type=type_error.none.not_allowed) #1513

Closed: gauravkuppa closed this issue 7 months ago

gauravkuppa commented 7 months ago
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="")

# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)
my_dataset_id = client.dataset_id
print(my_dataset_id)

client.set_local_config(
    relative_path="./",  # Relative path in the input mount folder   
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="./",  # Relative path in the lightly mount folder
    purpose=DatasourcePurpose.LIGHTLY,
)

permissions = client.list_datasource_permissions()

# Show permission related errors.
print(f"Datasource permission errors: {permissions.get('errors')}")

# Make sure Lightly can access the datasource.
assert permissions["can_list"]
assert permissions["can_read"]
assert permissions["can_write"]
assert permissions["can_overwrite"]

# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 3000,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "dataset_id": my_dataset_id,
                    "tag_name": "curious-samples",
                },
                "strategy": {
                    "type": "SIMILARITY",
                },
            },
        ],
    },
)
print(scheduled_run_id)

for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

I want to select samples by embedding similarity, but I get the following error.

[2024-03-07 22:32:16] The license 'Lightly On-Premise License' is valid.
[2024-03-07 22:32:16] Creating a datasource manager...
[2024-03-07 22:32:17] Listing files in input dir /input_mount/pocolocos_video...
[2024-03-07 22:32:20] Found datapool 'dataset' with 0 samples.
[2024-03-07 22:32:20] Found 3598 images in remote datasource.
[2024-03-07 22:32:20] Checking for corrupt images and computing metadata...
Inspecting images (0/3598 corrupt): 100%|███████████| 3598/3598 [01:22<00:00, 43.57image/s]
[2024-03-07 22:33:43] Found 0 corrupt images.
[2024-03-07 22:33:45] Initialising dataset...
[2024-03-07 22:33:45] Embedding images...
Compute efficiency: 0.92: 100%|█████████████████████| 3598/3598 [00:18<00:00, 192.77imgs/s]
[2024-03-07 22:34:04] Saving embeddings to 'output_dir/2024-03-07/22:32:16/data/embeddings.csv'...
[2024-03-07 22:34:04] Removing exact duplicates...
[2024-03-07 22:34:04] Found 21 exact duplicates.
[2024-03-07 22:34:06] Appending 0 samples from the datapool...
[2024-03-07 22:34:06] The datapool file is empty. Datapool appending skipped.
[2024-03-07 22:34:06] Found 0 samples in the datapool which are not already in the dataset and appended them.
[2024-03-07 22:34:06] Normalizing embeddings to unit length (disable with normalize_embeddings=False)...
[2024-03-07 22:34:06] Saving embeddings to 'output_dir/2024-03-07/22:32:16/data/embeddings.csv'...
[2024-03-07 22:34:06] Sampling dataset with stopping condition: n_samples=3000 ...
[2024-03-07 22:34:06] Prediction datasets: 
[]
[2024-03-07 22:34:06] Initializing selection...
[2024-03-07 22:34:06] An unexpected exception occurred: 
1 validation error for GetEmbeddingsCsvReadUrlById
embedding_id
  none is not an allowed value (type=type_error.none.not_allowed)
[2024-03-07 22:34:14] Waiting for jobs...

guarin commented 7 months ago

Hi, thanks for reporting this! We'll look into it ASAP.

japrescott commented 7 months ago

Hey @gauravkuppa,

The dataset ID you are specifying does not have any embeddings, which is the reason for the error.

You first need to create a dataset and do a run on it to establish the images you are interested in finding more of. Then you can do a second run (as you are trying to do) in which you specify the dataset ID and tag of the first dataset, to search for similar samples in an (optionally different) datasource. You can find more information in our tutorial: https://docs.lightly.ai/docs/use-similarity-search-to-find-similar-samples

If you have further questions, please reach out.
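
The second run described above could be configured roughly as follows. This is a minimal sketch, not an official Lightly helper: the function name build_similarity_selection_config and the placeholder ID "FIRST_RUN_DATASET_ID" are made up for illustration, and the dictionary layout is copied from the selection_config in the original post. The key point is that dataset_id must refer to the earlier dataset that already has embeddings and the tag, not to the dataset that was just created.

```python
from typing import Any, Dict


def build_similarity_selection_config(
    query_dataset_id: str,
    query_tag_name: str,
    n_samples: int,
) -> Dict[str, Any]:
    """Build a selection_config for a similarity search run.

    Hypothetical helper. `query_dataset_id` and `query_tag_name` must point
    at the FIRST dataset (the one that already has embeddings and the tag),
    not at the newly created dataset the run is scheduled on.
    """
    if not query_dataset_id:
        raise ValueError("query_dataset_id must be the ID of an existing dataset")
    if not query_tag_name:
        raise ValueError("query_tag_name must name a tag in the query dataset")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    # Same structure as the selection_config in the original post.
    return {
        "n_samples": n_samples,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "dataset_id": query_dataset_id,
                    "tag_name": query_tag_name,
                },
                "strategy": {"type": "SIMILARITY"},
            },
        ],
    }


# "FIRST_RUN_DATASET_ID" is a placeholder for the ID of the dataset from the
# first run; "curious-samples" is the tag name used in the original post.
selection_config = build_similarity_selection_config(
    query_dataset_id="FIRST_RUN_DATASET_ID",
    query_tag_name="curious-samples",
    n_samples=3000,
)

# The config would then be passed to the call from the original post, on a
# client that points at the second (new) dataset:
# client.schedule_compute_worker_run(
#     worker_config={},
#     selection_config=selection_config,
# )
```

Validating the inputs before scheduling only catches an empty or missing ID early. It cannot confirm that the referenced dataset actually has embeddings, so the first run still has to have finished successfully.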