Open flugenheimer opened 2 years ago
Hey, so you are interested in the pairs of source and destination. Something like (x.jpg, test/x.jpg)? What is your use case for the paths? When do you need the file paths instead of moving/copying the files?
Exactly! There are two reasons:
I therefore often just need a list of the split file pairs and can add them by filename. From time to time I still want to physically split or copy files and folders, so I thought it could make sense to also get the lists of filenames in the different splits as outputs.
Maybe what I actually need is just the list of source files that would be in each split. For my current scenario I am working on semantic segmentation, and the folder structure is therefore:
It would then be nice to be able to get all the source paths for images and masks in the different splits: train, val and test.
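For what it's worth, here is a minimal sketch of what such a "paths only" split could look like in plain Python. It is not part of this package: `split_pairs`, its parameters, and the rule that an image and its mask share the same file stem are all assumptions made for illustration. It takes plain lists of paths, so it does not depend on any particular folder layout.

```python
import random
from pathlib import Path


def split_pairs(images, masks, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Return {"train": [...], "val": [...], "test": [...]} of (image, mask) path pairs.

    Hypothetical helper: nothing is moved or copied, only paths are returned.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")

    # Assumption: an image and its mask share the same file stem (x.jpg <-> x.png).
    masks_by_stem = {Path(m).stem: str(m) for m in masks}

    # Images without a matching mask are skipped; sorting makes the result
    # independent of the order in which the paths were listed.
    pairs = sorted(
        (str(img), masks_by_stem[Path(img).stem])
        for img in images
        if Path(img).stem in masks_by_stem
    )

    # A seeded shuffle gives the same split on every run.
    random.Random(seed).shuffle(pairs)

    n_train = int(len(pairs) * ratio[0])
    n_val = int(len(pairs) * ratio[1])
    return {
        "train": pairs[:n_train],
        "val": pairs[n_train:n_train + n_val],
        "test": pairs[n_train + n_val:],  # remainder goes to test
    }
```

Calling `split_pairs(image_paths, mask_paths)` would then give one list of `(image, mask)` tuples per split, which can be written to a text file or fed to a data loader directly.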
Thanks for the explanations. I will look into the issue.
I'm not sure if this package is right for you. It does not support this kind of folder structure. I think scikit-learn has you covered: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
I also find that several repositories require you to organise your dataset in a specific data/ directory under their main codebase, which further requires you to have train, val, test splits. Different codebases might have different requirements/structure. So when working with multiple codebases at once, it's much easier and more space-efficient to create symlinks (ln -s) than to copy or move files to different directories. See issue #31. I have created a pull request #48 for this and tested it.
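To illustrate the symlink idea (this is only a rough sketch, not the code from the pull request), something along these lines would do the equivalent of `ln -s` for each file in a split. The `link_splits` name and the `{split name: list of source files}` input are hypothetical.

```python
import os
from pathlib import Path


def link_splits(splits, output_dir):
    """Create output_dir/<split>/<filename> symlinks that point at the source files.

    Hypothetical helper: `splits` maps a split name ("train", "val", "test")
    to a list of source file paths. Returns the (source, link) pairs.
    """
    pairs = []
    for split_name, files in splits.items():
        split_dir = Path(output_dir) / split_name
        split_dir.mkdir(parents=True, exist_ok=True)
        for src in files:
            # Use an absolute target so the link still works if it is moved.
            src = Path(src).resolve()
            dst = split_dir / src.name
            # Leave existing links or files alone instead of overwriting them.
            if not dst.is_symlink() and not dst.exists():
                os.symlink(src, dst)
            pairs.append((str(src), str(dst)))
    return pairs
```

The returned pairs are the same (source, destination) tuples discussed above, so the same function covers both "give me the paths" and "give me a folder without duplicating data". Note that creating symlinks on Windows may need extra privileges.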
Hi,
first off, I really like this function. It would be nice, however, to have a feature that just splits the files and outputs the file paths for train, val and test without actually moving or copying any files.