ouenal / scribblekitti

Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORAL)
https://ouenal.github.io/scribblekitti/

dataloader issue #8

Closed Alexmsit closed 1 year ago

Alexmsit commented 1 year ago

Hi,

Thanks for sharing your work! Right now I am running scribblekitti on my own data and I am experiencing the following issue:

When I run step 1 with train.py, the progress bar shows exactly twice the number of frames that are in the training folders. Shouldn't it display the sum of the frames in the training and validation folders? It also seems that the model is validating on the labels inside the training folders, not on the labels inside the validation folder.

In dataloader/semantickitti.py, the 'Baseline' class has a method 'load_file_paths', which is called during the initialization of the 'Baseline' class as well as during the initialization of 'PLSCylindricalMT'.

Could the issue be that the method is called twice, or am I missing something here?

Baseline:

    self.split, self.config = split, config
    self.root_dir = self.config['root_dir']
    assert(os.path.isdir(self.root_dir))
    label_directory = 'scribbles' if 'label_directory' not in config.keys() \
                                  else config['label_directory']
    self.label_directory = label_directory if split == 'train' else 'labels'
    self.load_file_paths(split, self.label_directory)

PLSCylindricalMT:

    self.load_file_paths('train', self.label_directory)
    self.nclasses = nclasses
    self.bin_sizes = self.config['bin_size']
ouenal commented 1 year ago

Yes, this is expected.

With PLS, we augment the point cloud with local neighborhood information that we extract from the available scribble labels. This means we cannot evaluate on the validation set as we don't have any input labels to begin with (nor do we need to). We therefore reinitialize the file paths to only consider the 'train' split for both training and validation, with the only difference being that we evaluate on the full labels rather than the scribbles.

Our goal with PLS is to "cheat" by using information extracted from the labels to better generalize on the full training labels. This allows us to improve our pseudo-label quality for the distillation step.
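
To make the re-initialization concrete, here is a simplified sketch of what the two `load_file_paths` calls end up doing. This is not the repository code: the `SPLIT_SEQUENCES` mapping, the `sequences/<seq>/velodyne` path layout, and the `'scribbles'` default are placeholders standing in for whatever the actual dataloader does.

    import glob
    import os

    # Hypothetical split-to-sequence mapping (the standard SemanticKITTI splits).
    SPLIT_SEQUENCES = {
        'train': ['00', '01', '02', '03', '04', '05', '06', '07', '09', '10'],
        'valid': ['08'],
    }

    class Baseline:
        def __init__(self, split, config):
            self.split, self.config = split, config
            self.root_dir = config['root_dir']
            label_directory = config.get('label_directory', 'scribbles')
            # train -> scribble labels, valid -> full labels
            self.label_directory = label_directory if split == 'train' else 'labels'
            self.load_file_paths(split, self.label_directory)

        def load_file_paths(self, split, label_directory):
            # Collect every frame of every sequence belonging to the split.
            self.lidar_paths, self.label_paths = [], []
            for seq in SPLIT_SEQUENCES[split]:
                seq_dir = os.path.join(self.root_dir, 'sequences', seq)
                self.lidar_paths += sorted(glob.glob(os.path.join(seq_dir, 'velodyne', '*.bin')))
                self.label_paths += sorted(glob.glob(os.path.join(seq_dir, label_directory, '*.label')))

    class PLSCylindricalMT(Baseline):
        def __init__(self, split, config, nclasses):
            super().__init__(split, config)
            # PLS builds its extra input features from the scribble labels, which
            # only the training sequences have, so the file paths are re-loaded
            # with the 'train' split no matter which split was requested. For the
            # validation dataset (split='valid'), self.label_directory is already
            # 'labels', so validation runs on the full labels of the training
            # sequences. Since both dataloaders now walk the train split, one
            # epoch iterates over twice the number of training frames.
            self.load_file_paths('train', self.label_directory)
            self.nclasses = nclasses

So the second call is intentional: it overrides the split chosen by the parent class while keeping the label directory, which is exactly the behavior you observed in the progress bar.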

Alexmsit commented 1 year ago

Perfect, thank you for the explanation!