System information
OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Ubuntu 20.04 (container)
Python version (python --version): Python 3.8.16
FiftyOne version (fiftyone --version): FiftyOne v0.21.6, Voxel51, Inc.
FiftyOne installed from (pip or source): pip
Describe the problem
When I merge grouped samples into a dataset using the key_fcn argument of merge_samples, the merged dataset contains redundant (duplicate) data.
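The behavior expected from a key-based merge can be sketched in plain Python. This is only an illustration under stated assumptions, not FiftyOne's implementation: `merge_by_key` and the dict records are hypothetical stand-ins for `merge_samples` and `fo.Sample`. Records whose key already exists are updated in place, and only records with unseen keys are inserted, so merging the same two records back should leave two records.

```python
def merge_by_key(existing, incoming, key_fcn):
    """Merge `incoming` records into `existing`, matching on key_fcn(record).

    Hypothetical helper: plain dicts stand in for samples.
    """
    merged = {key_fcn(rec): dict(rec) for rec in existing}
    for rec in incoming:
        key = key_fcn(rec)
        if key in merged:
            merged[key].update(rec)  # same key: update the existing record
        else:
            merged[key] = dict(rec)  # unseen key: insert a new record
    return list(merged.values())


# Two records that mirror the jpeg and pcd slices of one group
records = [
    {"filepath": "s0.jpeg", "sid": "s0", "slice": "jpeg"},
    {"filepath": "s0.pcd", "sid": "s0", "slice": "pcd"},
]

# Merging the same records back in, keyed on filepath, adds nothing new
result = merge_by_key(records, records, key_fcn=lambda rec: rec["filepath"])
print(len(result))  # 2
```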
Code to reproduce issue
import fiftyone as fo
# Create the dataset and declare its fields
dataset_name = "test"
dataset = fo.Dataset(name=dataset_name, persistent=True)
dataset.add_group_field('group')
dataset.add_sample_field('sid', fo.StringField)
dataset.add_sample_field('key', fo.StringField)
# Generate one group containing a jpeg slice and a pcd slice
samples = []
for x in range(1):
    group = fo.Group()
    sample01 = fo.Sample(
        filepath=f's{x}.jpeg',
        sid=f's{x}',
        key=f's{x}.jpeg',
        group=group.element('jpeg')
    )
    sample02 = fo.Sample(
        filepath=f's{x}.pcd',
        sid=f's{x}',
        key=f's{x}.pcd',
        group=group.element('pcd')
    )
    samples.extend([sample01, sample02])
# Add the samples to the dataset and print the contents of every slice
dataset.add_samples(samples)
for element in dataset.group_slices:
    dataset.group_slice = element
    for sample in dataset:
        print(sample)
# Reload the dataset and merge the same samples into it using key_fcn
def _key_fcn(sample):
    key = f"{sample['filepath']}-{sample['sid']}"
    # Returning either `key` or the filepath gives the same result
    # return key
    return sample.filepath
dataset = fo.load_dataset(dataset_name)
# Merging by key_fcn triggers the bug
dataset.merge_samples(samples, key_fcn=_key_fcn)
# Merging by key_field does not trigger the bug
# dataset.merge_samples(samples, key_field="key")
# Print the merged dataset
for element in dataset.group_slices:
    dataset.group_slice = element
    for sample in dataset:
        print(sample)
Printed result of running the code above:
After add_samples the dataset had 2 samples, but after merge_samples with key_fcn the dataset had 3 samples.
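Counting samples per slice may make the duplication easier to spot than reading the printed samples. The sketch below relies only on what the reproduction script already uses (`group_slices`, the `group_slice` property, and iterating the dataset); `count_samples_per_slice` is a hypothetical helper name, not part of FiftyOne.

```python
def count_samples_per_slice(dataset):
    """Return {slice_name: sample_count} for a grouped dataset.

    Hypothetical helper: it only assumes the object exposes `group_slices`,
    a settable `group_slice`, and iteration over the active slice, as the
    reproduction script does.
    """
    counts = {}
    original_slice = dataset.group_slice
    try:
        for name in dataset.group_slices:
            dataset.group_slice = name
            # Iterating yields the samples of the active slice
            counts[name] = sum(1 for _ in dataset)
    finally:
        # Restore whichever slice was active before counting
        dataset.group_slice = original_slice
    return counts
```

Calling it once after add_samples and again after merge_samples would show which slice gained the extra sample.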
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?
[ ] Yes. I can contribute a fix for this bug independently
[ ] Yes. I would be willing to contribute a fix for this bug with guidance
from the FiftyOne community
[X] No. I cannot contribute a bug fix at this time