visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset images & labels and reduce your data operations costs at an unparalleled scale.

Broken example - Analyzing Image Classification Dataset. #164

Closed dnth closed 1 year ago

dnth commented 1 year ago

When I tried to run the Analyzing Image Classification Dataset example, I encountered an error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/fastdup/sentry.py", line 121, in inner_function
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/fastdup/fastdup_controller.py", line 338, in run
    assert first_filename.startswith(str(input_dir)), f"annotation dataframe should contain full path filenames, starting with {input_dir}"
AssertionError: annotation dataframe should contain full path filenames, starting with imagenette2-160
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-8-0c199c4abbf1> in <cell line: 5>()
      3 
      4 fd = fastdup.create(work_dir=work_dir, input_dir=data_dir)
----> 5 fd.run(annotations=df_annot, ccthreshold=0.9, threshold=0.8)

3 frames
/usr/local/lib/python3.9/dist-packages/fastdup/fastdup_controller.py in run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, **fastdup_kwargs)
    336             first_filename = annotations['filename'].values[0]
    337             if (str(input_dir)) != ".":
--> 338                 assert first_filename.startswith(str(input_dir)), f"annotation dataframe should contain full path filenames, starting with {input_dir}"
    339         self._init_run(input_dir, annotations, subset, embeddings, data_type, overwrite, fastdup_kwargs)
    340 

AssertionError: annotation dataframe should contain full path filenames, starting with imagenette2-160
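
The failing check can be reproduced outside fastdup. The following is a minimal sketch that mirrors only the condition visible in the traceback (lines 337-338 of fastdup_controller.py). The helper name `passes_prefix_check` and the sample filenames are hypothetical and are not taken from the notebook.

```python
# Standalone sketch of the condition shown in the traceback.
# It does not import fastdup; it only mirrors the startswith() check.

def passes_prefix_check(first_filename: str, input_dir: str) -> bool:
    """Return True when the assertion from the traceback would pass."""
    # The traceback shows the check is skipped when input_dir is ".".
    if str(input_dir) == ".":
        return True
    return first_filename.startswith(str(input_dir))


input_dir = "imagenette2-160"

# Hypothetical relative filename, as it might appear in an annotations file.
relative_name = "train/class_a/img_0001.JPEG"
# The same hypothetical filename with the input directory in front.
prefixed_name = "imagenette2-160/train/class_a/img_0001.JPEG"

print(passes_prefix_check(relative_name, input_dir))  # False
print(passes_prefix_check(prefixed_name, input_dir))  # True
```

Only the first row of the filename column is inspected in the code shown above, so a relative first filename is enough to trigger the error.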

Here's the colab notebook I ran

https://colab.research.google.com/drive/1fsQpUbaO9ndzA010xTOB9MpbcgQ-VpXC?usp=sharing#scrollTo=92a6e2f9-e60c-44c0-b48a-f7413f7594ae

dnth commented 1 year ago

In an attempt to fix the issue, I prepended the imagenette2-160 path to the filename column and ended up with an annotation DataFrame like the following:

[image: screenshot of the annotation DataFrame]

With this I'm able to run fastdup.
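
The workaround described above can be sketched as follows. This is an illustration under assumptions, not the exact code from the notebook: the helper `with_input_dir` and the sample filenames are hypothetical, and the commented pandas line assumes the DataFrame is named `df_annot` with a `filename` column, as in the traceback.

```python
import os


def with_input_dir(filename: str, input_dir: str) -> str:
    """Prepend input_dir to filename unless it is already there."""
    # Leave the name alone if no prefix is needed or it was already added,
    # so running the cell twice does not double the prefix.
    if input_dir == "." or filename.startswith(input_dir):
        return filename
    return os.path.join(input_dir, filename)


data_dir = "imagenette2-160"

# Hypothetical filenames: one relative, one already prefixed.
filenames = [
    "train/class_a/img_0001.JPEG",
    "imagenette2-160/val/class_b/img_0002.JPEG",
]

fixed = [with_input_dir(name, data_dir) for name in filenames]
print(fixed)

# With a pandas DataFrame, the same helper could be applied to the column:
# df_annot["filename"] = df_annot["filename"].map(
#     lambda name: with_input_dir(name, data_dir)
# )
```

The prefix guard keeps the helper safe to re-run, which matters in a notebook where cells are often executed more than once.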

fd.annotations() shows:

[image: screenshot of the fd.annotations() output]

However, when I tried to run fd.vis.similarity_gallery(), I got a warning that the annotations were not loaded correctly:

[image: screenshot of the warning]