Closed Lifeguard-alex closed 1 year ago
Hi @Lifeguard-alex, apologies for the error; it is a documentation mistake. Please change the column name img_filename to filename in your annotation dataframe and let us know if this works.
After changing img_filename to filename, I get a new error: AssertionError: annotation dataframe should contain full path filenames, starting with coco_minitrain_25k/images/train2017
Remember, I am running your Colab project.
Hi @Lifeguard-alex, yes, you need to use full file names when creating the annotations dataframe. Namely, please also add the folder coco_minitrain_25k/images/train2017 and not just the image name, for example coco_minitrain_25k/images/train2017/image1.jpg
Sorry, I don't get it. This is from your example: https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb
This example has bugs and is not working. How can I fix the path, given that you are using the COCO dataset and downloading it online? This is not my code.
Hi @Lifeguard-alex, sorry about that. Let me fix the example and get back to you shortly.
I fixed the filename error and the full path for the images in the CSV, and now I have a new error: AssertionError: df_annot must contain unique filenames, found repeating filenames
Guys, I am going to give up :) The examples just do not work and are full of bugs, both in the ipynb and in the code example.
Please make your code work with COCO JSON, with multiple filenames and multiple image directories.
Do you have any working example for COCO JSON that I can test?
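For readers with the same question, here is a minimal sketch of flattening a COCO-style JSON structure into a one-row-per-box pandas DataFrame. The helper name `coco_to_dataframe` is hypothetical and not part of fastdup. The output column names (filename, col_x, row_y, width, height, label, split) are the ones used in the fixed example later in this thread, and other fastdup versions may expect different names. The only thing relied on is the standard COCO layout: `images`, `annotations`, and `categories` lists, with boxes stored as `[x, y, width, height]`.

```python
import pandas as pd


def coco_to_dataframe(coco: dict, image_root: str, split: str = "train") -> pd.DataFrame:
    """Flatten a COCO-style dict into one row per bounding box (hypothetical helper)."""
    # Lookup tables keyed by the ids that annotation records refer to.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}

    rows = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        rows.append({
            # Prefix the image folder so that filenames are full paths.
            "filename": f"{image_root.rstrip('/')}/{file_names[ann['image_id']]}",
            "col_x": x, "row_y": y, "width": w, "height": h,
            "label": labels[ann["category_id"]],
            "split": split,
        })
    columns = ["filename", "col_x", "row_y", "width", "height", "label", "split"]
    # Drop exact duplicate rows, as the fixed example in this thread does.
    return pd.DataFrame(rows, columns=columns).drop_duplicates().reset_index(drop=True)


# Tiny in-memory stand-in for a dict loaded with json.load(); ids are made up.
example = {
    "images": [{"id": 131075, "file_name": "000000131075.jpg"}],
    "categories": [{"id": 1, "name": "tv"}, {"id": 2, "name": "laptop"}],
    "annotations": [
        {"image_id": 131075, "category_id": 1, "bbox": [20.23, 55.98, 313.49, 326.50]},
        {"image_id": 131075, "category_id": 2, "bbox": [176.90, 381.12, 286.20, 136.63]},
    ],
}
df = coco_to_dataframe(example, "coco_minitrain_25k/images/train2017")
print(df)
```

For images spread over several directories, one option is to call the helper once per JSON file with its own `image_root` and concatenate the results with `pd.concat`.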
Hi @Lifeguard-alex, I have shared a fixed example. The main fix was:
coco_csv = 'coco_minitrain_25k/annotations/coco_minitrain2017.csv'
coco_annotations = pd.read_csv(coco_csv, header=None, names=['filename', 'col_x', 'row_y',
'width', 'height', 'label', 'ext'])
coco_annotations['split'] = 'train' # Only train files were loaded
coco_annotations['filename'] = coco_annotations['filename'].apply(lambda x: 'coco_minitrain_25k/images/train2017/'+x)
coco_annotations = coco_annotations.drop_duplicates()
What you see is the result of an API change. We will fix the notebook end to end by tomorrow and share it.
Hi @Lifeguard-alex, a new fix has been released in version 1.3. The fixed notebook is here: https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb Please try it out and let us know if you have any issues.
@dbickson What is the proposed solution for AssertionError: df_annot must contain unique filenames, found repeating filenames? I see that even in the COCO example provided in your notebook, two rows have the same filename.
| | filename | col_x | row_y | width | height | label | ext | split |
|---|---|---|---|---|---|---|---|---|
| 0 | images/train2017/000000131075.jpg | 20.23 | 55.98 | 313.49 | 326.50 | tv | 0 | train |
| 1 | images/train2017/000000131075.jpg | 176.90 | 381.12 | 286.20 | 136.63 | laptop | 0 | train |
@Ramayancv i ran the analyzing-object-detection-dataset.ipynb notebook (on Colab) and could not reproduce the error you get. Which version of fastdup are you running on? And which Python and OS?
@dnth I am using fastdup version 1.65, Python 3.7.6, and Linux x86_64. I noticed that the notebook runs smoothly on that dataset, but if you run it on any other dataset you will see the error.
@Ramayancv can you point to me a dataset so I can reproduce the error?
@Ramayancv the issue is in a column name in your annots.csv. If you rename the column to row_y, it should work.
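A quick pre-flight check can surface this kind of problem before calling fd.run. The sketch below is a hypothetical helper, not part of fastdup. The expected column names are taken from the fixed example earlier in this thread and are an assumption for other versions. The misnamed `col_y` column in the usage example is made up for illustration, since the thread does not say what the original column was called.

```python
import pandas as pd

# Column names from the fixed example in this thread (an assumption for other versions).
EXPECTED_COLUMNS = ["filename", "col_x", "row_y", "width", "height", "label", "split"]


def check_annotations(df: pd.DataFrame, input_dir: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if "filename" in df.columns:
        # Filenames are expected to be full paths that start with the image folder.
        bad = ~df["filename"].astype(str).str.startswith(input_dir)
        if bad.any():
            problems.append(f"{int(bad.sum())} filenames do not start with {input_dir}")
    # Rows identical in every column; several boxes per image are not counted here.
    dup = int(df.duplicated().sum())
    if dup:
        problems.append(f"{dup} fully duplicated rows")
    return problems


# Hypothetical annotations with one misnamed column (col_y instead of row_y).
annots = pd.DataFrame({
    "filename": ["images/train2017/000000131075.jpg"] * 2,
    "col_x": [20.23, 176.90],
    "col_y": [55.98, 381.12],
    "width": [313.49, 286.20],
    "height": [326.50, 136.63],
    "label": ["tv", "laptop"],
    "split": ["train", "train"],
})
before = check_annotations(annots, "images/train2017")
after = check_annotations(annots.rename(columns={"col_y": "row_y"}), "images/train2017")
print(before)
print(after)
```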
But I found something else that is amiss: the width and height of the bounding boxes look suspiciously small. Are those values correct, or are they normalized?
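One rough way to answer that question yourself is to test whether every box value lies in [0, 1] and, if so, scale by the image size. This is only a sketch under assumptions: the column names follow the fixed example in this thread, the helper names are hypothetical, the normalized boxes are assumed to store the top-left corner (formats that store the box center need a different conversion), and a single image size is used for all rows.

```python
import pandas as pd

BOX_COLUMNS = ["col_x", "row_y", "width", "height"]


def looks_normalized(df: pd.DataFrame) -> bool:
    """Heuristic: all box values within [0, 1] suggests normalized coordinates."""
    values = df[BOX_COLUMNS].to_numpy()
    return bool(((values >= 0.0) & (values <= 1.0)).all())


def to_pixels(df: pd.DataFrame, image_width: int, image_height: int) -> pd.DataFrame:
    """Scale normalized top-left boxes to pixels, assuming one size for every image."""
    out = df.copy()
    out["col_x"] = df["col_x"] * image_width
    out["width"] = df["width"] * image_width
    out["row_y"] = df["row_y"] * image_height
    out["height"] = df["height"] * image_height
    return out


# Made-up normalized box on a 640x480 image.
boxes = pd.DataFrame({"col_x": [0.25], "row_y": [0.5], "width": [0.5], "height": [0.25]})
pixels = to_pixels(boxes, image_width=640, image_height=480)
print(looks_normalized(boxes), looks_normalized(pixels))
print(pixels)
```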
@dnth Thank you very much. It worked.
Another question about the part of your notebook on identifying possible mismatches: 1) Where can I find the dataframe shown at the end of the notebook, the one that shows mismatched values using the similarity gallery?
fd.vis.similarity_gallery(slice='diff')
2) In the following text mentioned in your notebook _The fastdup similarity search and similarity gallery are strong tools for finding objects that are possibly mislabeled. By finding each object's nearest neighbors and their classes, we can find objects with classes contradicting their neighbors' (a strong sign of mislabels).
Running the similarity gallery shows if an image has high similarity with two of its closest neighbors, yet has different labels. This helps surface potential mislabeling in the dataset._
Does it create image embeddings of all the areas inside the bounding boxes and then compare those embeddings?
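The idea in the quoted text can be illustrated independently of fastdup. The sketch below is not fastdup's implementation. It assumes each object already has an embedding vector (here made-up 2-D points), finds each object's two nearest neighbors by cosine similarity, and flags objects whose label differs from both neighbors. The function name is hypothetical.

```python
import numpy as np


def flag_label_disagreements(embeddings, labels, k=2):
    """Return indices whose label differs from all of their k nearest neighbors."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-length rows
    sim = emb @ emb.T                                        # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)                           # never pick self as a neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]              # k most similar per row
    labels = np.asarray(labels)
    return [i for i, nbrs in enumerate(neighbors)
            if np.all(labels[nbrs] != labels[i])]


# Made-up embeddings: two clusters, plus one object (index 6) that sits in the
# "tv" cluster but carries the label "laptop".
embeddings = [
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],   # tv
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],   # laptop
    [0.97, 0.03],                               # labeled laptop, looks like tv
]
labels = ["tv", "tv", "tv", "laptop", "laptop", "laptop", "laptop"]
flagged = flag_label_disagreements(embeddings, labels, k=2)
print(flagged)
```

In this toy data only index 6 is flagged, because both of its nearest neighbors are labeled "tv".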
The DataFrame is returned together with the gallery for now. So you can do
df = fd.vis.similarity_gallery(slice='diff')
to get the DataFrame.
What happened?
analyzing-object-detection-dataset.ipynb
fastdup.create error :
AssertionError: Got wrong annotation parameter, should be pd.DataFrame with the mandatory columns: filename img_filename bbox_x bbox_y bbox_w bbox_h label ext split
Version: latest (1.2)
What did you expect to see?
No response
What version of fastdup were you running on?
2.1
What version of Python were you running on?
Other
Operating System
colab
Reproduction steps
1
Relevant log output
Attach a screenshot [Optional]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/fastdup/sentry.py", line 130, in inner_function
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fastdup/fastdup_controller.py", line 347, in run
    assert isinstance(annotations, pd.DataFrame) and not annotations.empty and "filename" in annotations.columns, f"Got wrong annotation parameter, should be pd.DataFrame with the mandatory columns: filename {annotations}"
AssertionError: Got wrong annotation parameter, should be pd.DataFrame with the mandatory columns: filename
        img_filename      bbox_x  bbox_y  bbox_w  bbox_h  label         ext  split
0       000000131075.jpg  20.23   55.98   313.49  326.50  tv            0    train
1       000000131075.jpg  176.90  381.12  286.20  136.63  laptop        0    train
2       000000131075.jpg  369.96  361.35  72.76   73.91   laptop        0    train
3       000000131075.jpg  411.68  417.87  66.32   129.44  chair         0    train
4       000000131075.jpg  367.31  363.25  72.27   67.01   tv            0    train
...     ...               ...     ...     ...     ...     ...           ...  ...
183541  000000262103.jpg  2.45    0.91    94.03   181.51  car           0    train
183542  000000393195.jpg  6.10    214.53  331.31  262.83  boat          0    train
183543  000000393195.jpg  46.37   3.34    593.63  478.66  person        0    train
183544  000000393195.jpg  419.40  0.88    217.84  309.23  person        0    train
183545  000000131067.jpg  4.21    1.17    628.93  421.75  fire hydrant  0    train
[183544 rows x 8 columns]
AssertionError Traceback (most recent call last)
in <cell line: 6>()
      4
      5 fd = fastdup.create(work_dir=work_dir, input_dir=image_dir)
----> 6 fd.run(annotations=coco_annotations)
3 frames
/usr/local/lib/python3.10/dist-packages/fastdup/fastdup_controller.py in run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, **fastdup_kwargs)
    345 annotations = pd.DataFrame({'filename':annotations})
    346
--> 347 assert isinstance(annotations, pd.DataFrame) and not annotations.empty and "filename" in annotations.columns, f"Got wrong annotation parameter, should be pd.DataFrame with the mandatory columns: filename {annotations}"
    348 first_filename = annotations['filename'].values[0]
    349 if (str(input_dir)) != ".":
AssertionError: Got wrong annotation parameter, should be pd.DataFrame with the mandatory columns: filename
        img_filename      bbox_x  bbox_y  bbox_w  bbox_h  label   ext  split
0       000000131075.jpg  20.23   55.98   313.49  326.50  tv      0    train
1       000000131075.jpg  176.90  381.12  286.20  136.63  laptop  0    train
2       000000131075.jpg  369.96  361.35  72.76   73.91   laptop  0    train
3       000000131075.jpg  411.68  417.87  66.32   129.44  chair   0    train
4       000000131075.jpg  367.31  363.25  72.27   67.01   tv      0    train
...     ...               ...     ...     ...     ...     ...     ...  ...
183541  000000262103.jpg  2.45    0.91    94.03   181.51  car     0    train
183542  000000393195.jpg  6.10    214.53  331.31  262.83  boat    0    train
183543  000000393195.jpg  46.37   3.34    593.63  478.66  person  0    train
Contact Details [Optional]
alex@lifeguard-ai.com