qinglew / PCN-PyTorch

Implementation of PCN (Point Completion Network) in PyTorch.

How to automate folder structure #34

Open Hannes9 opened 7 months ago

Hannes9 commented 7 months ago

@qinglew Can you or someone else recommend a tool/workflow/code that automates separating complete and partial data into their respective folders and renaming the partial data accordingly? I would be interested in, e.g., how the files & folders of the given PCN example dataset on Google Drive were set up in an automated way for such a large amount of data. For reference, my dataset is currently organized like this: all complete data in one folder and all partial data in another, with every filename being the GUID of its object.

Thanks a lot !

MarioCavero commented 7 months ago

I am not sure if this is a repository issue... @Hannes9

If you created the dataset with the render and sample scripts, you can adjust the number of files accordingly, optionally following the structure of PCN's dataset. If you have 1k samples per category, you can use 10% for val and 10% for test, leaving 800 for training, or split however you think works best. You can also check how many elements each category has in the current PCN dataset (it is also in the paper!) with (ls | wc -l). For category X (e.g., plane), you can have 800 planes in both partial and complete, and within each partial model, the 8 sampled views obtained during the render and sample process.

You can either do it during the render and sample process by updating the code, or move (mv) / copy (cp) the files into the right structure once you have your whole dataset (which seems to be what you need!).
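For instance, here is a rough sketch of that reorganization in Python, implementing the 800/100/100-style split above (the raw/complete and raw/partial source folders, the dataroot and the single placeholder category id are all assumptions about your setup, not code from this repo):

    import os
    import random
    import shutil

    # hypothetical flat source folders: every cloud is named <guid>.ply
    SOURCE_COMPLETE = 'raw/complete'
    SOURCE_PARTIAL = 'raw/partial'
    DATAROOT = 'data/my_dataset'
    CATEGORY = '00000000'  # placeholder category id for a single-class dataset

    guids = [f[:-4] for f in os.listdir(SOURCE_COMPLETE) if f.endswith('.ply')]
    random.seed(0)      # fixed seed so the split is reproducible
    random.shuffle(guids)

    # 80/10/10 split, like the 800/100/100 example above
    n = len(guids)
    splits = {'train': guids[:int(0.8 * n)],
              'val': guids[int(0.8 * n):int(0.9 * n)],
              'test': guids[int(0.9 * n):]}

    for split, models in splits.items():
        for kind in ('partial', 'complete'):
            os.makedirs(os.path.join(DATAROOT, split, kind, CATEGORY), exist_ok=True)
        for guid in models:
            shutil.copy(os.path.join(SOURCE_COMPLETE, guid + '.ply'),
                        os.path.join(DATAROOT, split, 'complete', CATEGORY, guid + '.ply'))
            # this assumes one partial cloud per model; with several views you
            # would instead rename them to <guid>_0.ply ... <guid>_7.ply for train
            shutil.copy(os.path.join(SOURCE_PARTIAL, guid + '.ply'),
                        os.path.join(DATAROOT, split, 'partial', CATEGORY, guid + '.ply'))

Adjust the partial-view handling so it matches how the loader builds its paths (see the code further down).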

Hope it helps!

Hannes9 commented 7 months ago

Hello @MarioCavero, thanks for the fast response and the information. I created my own dataset without using the render and sample scripts. It seems I will have to find my own solution, as my dataset is a bit different and I cannot simply copy the files into the right folders.

Thanks a lot !

MarioCavero commented 7 months ago

If that is the case @Hannes9, I had a similar situation. I recommend playing with the dataset/Shapenet.py file, more specifically with the loading part:

    def _load_data(self):
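        # each line of {split}.list is "<category_id>/<model_id>"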
        with open(os.path.join(self.dataroot, '{}.list').format(self.split), 'r') as f:
            lines = f.read().splitlines()

        if self.category != 'all':
            lines = list(filter(lambda x: x.startswith(self.cat2id[self.category]), lines))

        partial_paths, complete_paths = list(), list()

        for line in lines:
            category, model_id = line.split('/')
            if self.split == 'train':
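                # training models have multiple partial views (the 8 mentioned
                # above), so keep a '{}' placeholder for the view index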
                partial_paths.append(os.path.join(self.dataroot, self.split, 'partial', category, model_id + '_{}.ply'))
            else:
                partial_paths.append(os.path.join(self.dataroot, self.split, 'partial', category, model_id + '.ply'))
            complete_paths.append(os.path.join(self.dataroot, self.split, 'complete', category, model_id + '.ply'))

        return partial_paths, complete_paths

It can be a good idea to create the train, val and test .list files on the fly and just load "from x to n" of the data for training, and the same for val and testing. I do not think it is strictly necessary to have different folders, as long as all paths for train, val and test are assigned properly!
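For instance, a minimal sketch of generating those list files from the complete folders (the dataroot and the fixed category id are placeholders for your own layout, same as in the sketch above):

    import os

    DATAROOT = 'data/my_dataset'
    CATEGORY = '00000000'  # placeholder category id

    for split in ('train', 'val', 'test'):
        complete_dir = os.path.join(DATAROOT, split, 'complete', CATEGORY)
        with open(os.path.join(DATAROOT, split + '.list'), 'w') as f:
            for name in sorted(os.listdir(complete_dir)):
                if name.endswith('.ply'):
                    # write "<category_id>/<model_id>", which is the format
                    # that line.split('/') in _load_data expects
                    f.write('{}/{}\n'.format(CATEGORY, name[:-4]))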

Hope it helps!

Hannes9 commented 7 months ago

@MarioCavero Yes, that is already quite helpful. Thank you! I will probably have to move the files and adjust the partial training files manually, but with a mapping table it should be fine for me.

Thanks !