adriantre opened 1 year ago
Edit: Better proposal below.
Proposed changes:
class RasterDataset(GeoDataset):
    def __init__(
        self,
        ...,  # existing params
        filenames: Optional[List[str]] = None,
    ) -> None:
        ...
        # Populate the dataset index
        i = 0
        if not filenames:
            # Default: recursive glob below root (only works on local file systems)
            pathname = os.path.join(root, "**", self.filename_glob)
            filepaths = list(glob.iglob(pathname, recursive=True))
        else:
            # Explicit list: paths relative to root, e.g. inside a /vsigs/ bucket
            filepaths = [os.path.join(root, filename) for filename in filenames]
        for filepath in filepaths:
            # continue on line 366 in the original code
Here, filenames should include any subdirectories, relative to root.
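A minimal, self-contained sketch of the file-collection step from the proposal above, pulled out into a standalone function so it can run without torchgeo. The name collect_filepaths is hypothetical; it mirrors the proposed branch: glob below root when no filenames are given, otherwise join each given (possibly nested) filename onto root.

```python
import glob
import os
from typing import List, Optional


def collect_filepaths(
    root: str, filename_glob: str, filenames: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper mirroring the proposed branch in RasterDataset.__init__."""
    if not filenames:
        # Default behaviour: recursive glob below root (local file systems only).
        pathname = os.path.join(root, "**", filename_glob)
        return list(glob.iglob(pathname, recursive=True))
    # Explicit list: filenames are relative to root and may include subdirectories.
    return [os.path.join(root, filename) for filename in filenames]
```

For example, collect_filepaths("/vsigs/my-bucket", "*.tif", ["2021/a.tif"]) simply joins the path without touching the file system (the bucket name is made up), so the existence check is left to the dataset's existing try/except. Note that an empty list behaves like None and falls back to globbing, as in the proposal.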
I just found fiona's listdir function. It does not walk directories recursively, but it does list the sub-blobs of a directory on GDAL virtual file systems, so a recursive walk can be built on top of it:
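import os

import fiona
from fiona.errors import FionaValueError


def listdir_vsi_recursive(root):
    """List all files below root, including on GDAL virtual file systems."""
    dirs = [root]
    files = []
    while dirs:
        current = dirs.pop()  # avoid shadowing the built-in dir()
        try:
            # fiona.listdir raises FionaValueError for paths that are not directories
            subdirs = fiona.listdir(current)
            dirs.extend([os.path.join(current, subdir) for subdir in subdirs])
        except FionaValueError:
            # Not a listable directory, so treat it as a file
            files.append(current)
    return files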
class RasterDataset(GeoDataset):
    def __init__(
        self,
        ...,  # existing params
        vsi: bool = False,
    ) -> None:
        ...
        # Populate the dataset index
        i = 0
        filename_regex = re.compile(self.filename_regex, re.VERBOSE)
        if vsi:
            # Walk the virtual file system; note that filename_glob is not applied here
            filepaths = listdir_vsi_recursive(root)
        else:
            pathname = os.path.join(root, "**", self.filename_glob)
            filepaths = list(glob.iglob(pathname, recursive=True))
        for filepath in filepaths:
            # continue on line 366 in the original code
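To check the traversal logic of listdir_vsi_recursive without a cloud bucket, here is a sketch of the same stack-based, depth-first walk with the directory lister passed in as a function. The in-memory TREE, fake_listdir and NotADirectory are hypothetical stand-ins for a bucket, fiona.listdir and FionaValueError. The filter_by_glob helper is an assumption about how filename_glob could still be honoured on the VSI branch, by matching basenames only.

```python
import fnmatch
import posixpath
from typing import Callable, Dict, List


class NotADirectory(Exception):
    """Hypothetical stand-in for fiona.errors.FionaValueError."""


def listdir_recursive(root: str, listdir: Callable[[str], List[str]]) -> List[str]:
    """Depth-first walk: anything that cannot be listed is treated as a file."""
    dirs = [root]
    files: List[str] = []
    while dirs:
        current = dirs.pop()
        try:
            children = listdir(current)
        except NotADirectory:
            files.append(current)
            continue
        # VSI paths always use forward slashes, so join with posixpath on every OS.
        dirs.extend(posixpath.join(current, child) for child in children)
    return files


def filter_by_glob(paths: List[str], filename_glob: str) -> List[str]:
    """Approximate the '**/<filename_glob>' pattern by matching basenames only."""
    return [
        p for p in paths if fnmatch.fnmatchcase(posixpath.basename(p), filename_glob)
    ]


# Hypothetical in-memory bucket: directories map to their children; files are absent.
TREE: Dict[str, List[str]] = {
    "/vsigs/bucket": ["2021", "readme.txt"],
    "/vsigs/bucket/2021": ["a.tif", "b.tif"],
}


def fake_listdir(path: str) -> List[str]:
    """Behave like a lister that raises for non-directories."""
    if path not in TREE:
        raise NotADirectory(path)
    return TREE[path]
```

Here, sorted(filter_by_glob(listdir_recursive("/vsigs/bucket", fake_listdir), "*.tif")) gives the two .tif paths under 2021 and drops readme.txt. Using posixpath rather than os.path is a deliberate choice: os.path.join would insert backslashes into /vsigs/ paths on Windows.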
Note that we technically support this in 0.5.0, although the user has to manually pass in a list of files.
https://github.com/microsoft/torchgeo/blob/9e57f278188ca36348ce8d5c30d5ae2acb19107c/torchgeo/datasets/geo.py#L363-L367
GDAL virtual file systems, such as /vsigs/ for reading directly from Google Cloud Storage buckets, are natively supported by rasterio (through GDAL). The glob matching (source code linked above) is currently the only thing stopping this from working.
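The glob step fails because Python's glob only consults the local file system, so a GDAL virtual path yields no matches even when the bucket contains files. A tiny demonstration, assuming no local directory named /vsigs exists (the bucket name is hypothetical):

```python
import glob
import os

# GDAL interprets /vsigs/ as a Google Cloud Storage prefix, but glob treats it as an
# ordinary local path; with no such local directory, nothing matches.
pathname = os.path.join("/vsigs/my-bucket", "**", "*.tif")
matches = list(glob.iglob(pathname, recursive=True))
print(matches)
```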
What do you think is the best way to do this? My initial guess is that supporting glob matching for all the different file systems would take some effort.
The quickest fix (for me, at least) would be to add an optional parameter, filenames: List[str], that is iterated over directly; the (already existing) try/except would then handle any filename that is wrong.
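A sketch of that fallback: iterate the user-supplied filenames directly and let error handling skip any entry that cannot be opened. The probe argument is a hypothetical placeholder for whatever the dataset uses to open a file inside its existing try/except; catching OSError is an assumption made for this sketch, not necessarily what torchgeo catches.

```python
import os
from typing import Callable, List, Tuple


def index_filenames(
    root: str, filenames: List[str], probe: Callable[[str], object]
) -> Tuple[List[str], List[str]]:
    """Return (indexed, skipped); a failing filename is skipped instead of aborting."""
    indexed: List[str] = []
    skipped: List[str] = []
    for filename in filenames:
        filepath = os.path.join(root, filename)
        try:
            probe(filepath)  # assumed to raise OSError if the file is unusable
        except OSError:
            # Wrong or missing filename: skip it, as the existing try/except would.
            skipped.append(filepath)
            continue
        indexed.append(filepath)
    return indexed, skipped


def read_first_byte(path: str) -> bytes:
    """Hypothetical probe: open the file and read a single byte."""
    with open(path, "rb") as f:
        return f.read(1)
```

A missing file raises FileNotFoundError, a subclass of OSError, so it ends up in skipped while valid files are indexed.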