cortadocodes opened this issue 2 years ago
Just dropping by with an interesting case: I added cloud datafiles to a local dataset. It pretty much worked like a charm. However, I felt that add() should really have created a new Datafile instance in the dataset, because attributes like exists_in_cloud were still set on the datafile after it was added.
None of this was intuitive, and it took a lot of debugging to work out that I could do it at all. A more explicit pattern might help.
import logging

from octue.resources import Dataset

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Complete data lakes (cloud datasets, including files in subdirectories)
all_elevation_maps = Dataset(path="gs://lake-elevation-maps", recursive=True)
all_mast_timeseries = Dataset(path="gs://lake-mast-timeseries", recursive=True)
all_wind_maps = Dataset(path="gs://lake-wind-maps", recursive=True)

# Fixture files: pick individual cloud datafiles by a fragment of their ID
fixture_elevation_files = [all_elevation_maps.files.one(id__contains="5229e870")]

fixture_mast_timeseries_files = [
    all_mast_timeseries.files.one(id__contains="e6afc3ea"),  # m1
    all_mast_timeseries.files.one(id__contains="739c4fdc"),  # m2
    all_mast_timeseries.files.one(id__contains="2a37e57e"),  # m3
    all_mast_timeseries.files.one(id__contains="1dbed715"),  # m4
    all_mast_timeseries.files.one(id__contains="0b216b8c"),  # lidar
]

fixture_wind_map_files = [
    all_wind_maps.files.one(id__contains="c9823f65"),  # 149m
    all_wind_maps.files.one(id__contains="2a77b636"),
    all_wind_maps.files.one(id__contains="47e16290"),
]

# Create fixture datasets: map each local directory to the cloud files it should hold
sets = {
    "tests/data/hills_of_gold/elevation_maps": fixture_elevation_files,
    "tests/data/hills_of_gold/mast_timeseries": fixture_mast_timeseries_files,
    "tests/data/hills_of_gold/wind_speed_maps": fixture_wind_map_files,
}

for path, files in sets.items():
    # Local dataset rooted at the fixture directory
    ds = Dataset(path=path)
    ds.update_metadata()

    # Add the cloud datafiles to the local dataset
    for file in files:
        ds.add(file)

    # Update the local metadata of every file now in the dataset
    for file in ds:
        file.update_local_metadata()
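The suggestion above, that add() should create a new Datafile instance rather than carry cloud state such as exists_in_cloud over to the local dataset's file, can be modelled without the SDK. This is a minimal sketch under stated assumptions. The Datafile and LocalDataset classes here are hypothetical stand-ins, not the octue classes. Dropping the cloud path from the local copy, so that exists_in_cloud is False until the file is uploaded, is an assumption about the desired behaviour. The file name example.csv is also made up.

```python
# Hypothetical sketch: these classes are NOT the octue API. They only model
# the proposed "create a new datafile on add()" behaviour.
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class Datafile:
    """Minimal stand-in for a datafile with local/cloud duality (hypothetical)."""

    name: str
    local_path: Optional[str] = None
    cloud_path: Optional[str] = None

    @property
    def exists_in_cloud(self) -> bool:
        # Derived from the cloud path, so it can't go stale on a copy.
        return self.cloud_path is not None

    @property
    def exists_locally(self) -> bool:
        return self.local_path is not None


@dataclass
class LocalDataset:
    """Hypothetical local dataset rooted at a directory."""

    path: str
    files: List[Datafile] = field(default_factory=list)

    def add(self, datafile: Datafile) -> Datafile:
        # Build a NEW datafile for the local copy instead of reusing the
        # cloud one. Assumption: the copy has no cloud path until uploaded.
        local_copy = replace(
            datafile,
            local_path=f"{self.path.rstrip('/')}/{datafile.name}",
            cloud_path=None,
        )
        self.files.append(local_copy)
        return local_copy


# Usage with a hypothetical file from one of the lakes above
cloud_file = Datafile(name="example.csv", cloud_path="gs://lake-elevation-maps/example.csv")
ds = LocalDataset(path="tests/data/hills_of_gold/elevation_maps")
local_file = ds.add(cloud_file)
print(cloud_file.exists_in_cloud)  # True
print(local_file.exists_in_cloud)  # False
```

Because the original object is never mutated, the cloud datafile and its local copy can't disagree about where they live. The trade-off is that callers must use the object add() returns rather than the one they passed in.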
We're arriving at a clearer distinction of what local and cloud datasets are:
Local dataset:
Cloud dataset:
The files in both types of dataset can have local/cloud duality, but the following restrictions apply:
Should we enforce these restrictions or just advise them?
Originally posted by @cortadocodes in https://github.com/octue/octue-sdk-python/issues/364#issuecomment-1082099381
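The restrictions themselves aren't listed here, so this is only a generic sketch of the two options in the question. It uses a single check that either raises (enforce) or emits a warning (advise), selected by a flag. check_restriction, DatasetRestrictionError, and the enforce parameter are hypothetical names, not part of the octue SDK.

```python
import warnings


class DatasetRestrictionError(ValueError):
    """Raised when a (hypothetical) dataset restriction is violated in enforcing mode."""


def check_restriction(ok: bool, message: str, enforce: bool = False) -> bool:
    """Enforce (raise) or advise (warn) on a dataset restriction.

    Returns True if the restriction holds. In advisory mode a violation only
    emits a UserWarning and returns False. In enforcing mode it raises.
    """
    if ok:
        return True
    if enforce:
        raise DatasetRestrictionError(message)
    # stacklevel=2 points the warning at the caller rather than this helper
    warnings.warn(message, UserWarning, stacklevel=2)
    return False
```

Defaulting enforce to False would let the restrictions start out as advice and be tightened later without changing call sites. Defaulting to True is safer if a violation could silently leave a dataset in an inconsistent state.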