visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets, helping you improve the quality of your dataset images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: Connected Components DataFrame does not match number of instances #237

Closed: dnth closed this issue 1 year ago

dnth commented 1 year ago

What happened?

[Screenshot: the connected components DataFrame's row count does not match the number of instances]

What did you expect to see?

The number of rows in the connected components DataFrame should match the number of instances.

What version of fastdup were you running on?

1.27

What version of Python were you running on?

Python 3.10

Operating System

Google Colab

Reproduction steps

Run the Colab notebook: https://colab.research.google.com/drive/10p5VGaVo-Lolwu3EG0c1vokhyYF_RouN#scrollTo=7MF3cowvjVF1

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

No response

dbickson commented 1 year ago

Hi @dnth, connected_components() now returns a DataFrame containing only images that have at least one other similar image in their cluster. Images that do not belong to any cluster have no rows. Is there a specific reason you need the full DataFrame? If so, you can simply load it from the work directory: comp = pd.read_csv(os.path.join(work_dir, 'connected_components.csv'))
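To see why the row counts differ, here is a minimal sketch that reads a full connected_components.csv and splits it into images that share a cluster and images that stand alone. It assumes a hypothetical CSV with one row per image and columns named `__id` and `component_id`; these column names, the sample data, and the temporary `work_dir` are illustrative assumptions, not fastdup's documented output format. The size-greater-than-one filter only mimics the behaviour described above; it is not fastdup's own implementation.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for fastdup's work_dir (the directory fastdup writes
# its outputs to). A temporary directory keeps this sketch self-contained.
work_dir = tempfile.mkdtemp()

# Hypothetical connected_components.csv: one row per image.
# The column names "__id" and "component_id" are assumptions for this sketch.
sample = pd.DataFrame(
    {
        "__id": [0, 1, 2, 3, 4, 5],
        "component_id": [0, 0, 2, 3, 3, 5],
    }
)
sample.to_csv(os.path.join(work_dir, "connected_components.csv"), index=False)

# Full table, loaded as suggested in the reply above: one row per image.
comp = pd.read_csv(os.path.join(work_dir, "connected_components.csv"))

# Number of images in each component, mapped back onto every row.
comp["component_size"] = comp["component_id"].map(comp["component_id"].value_counts())

# Images that share a component with at least one other image
# (mirrors the filtered output described for connected_components()).
clustered = comp[comp["component_size"] > 1]

# Images that are alone in their component (no similar images found).
singletons = comp[comp["component_size"] == 1]

print(len(comp), len(clustered), len(singletons))
```

With this sample data the full table has 6 rows (one per image), while only 4 images share a component with another image; the 2 singleton images account for the gap, which is the kind of mismatch reported in this issue.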