The setup_gdal notebook generates an init script in the specified location in the Workspace (or UC Volumes).
This is the simpler example of loading GeoJSON datasets downloaded from the DEFRA Data Services Platform. The example uses:
for all available areas.
FileStore or a mounted ADLS/Blob storage.
This is the more complicated example, which aims to build a dataset that can help predict flood risk by area/tile.
This should run in a self-contained fashion as long as the paths and catalog/schema names are updated.
00 Download Data (with updated paths) should download all the raw files to DBFS.
01 Load Data To Delta goes through each directory and loads the files of various formats into Delta format.
02 Split Holdout does a quick stratified split of the main table (components) based on the presence or absence of any flood risk.
03a and 03b do the sample tessellation, indexing, joins and basic feature engineering of all the Delta tables into a single feature table for the train and test datasets, respectively.
Once the features_train table is created, an AutoML experiment can be kicked off via the UI (NB this requires an ML cluster).
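To illustrate the stratified split described for 02 Split Holdout, here is a minimal pure-Python sketch. It is not the notebook's code (which presumably works on Spark tables). The function name stratified_split, the flood_risk_count column, the 20% holdout fraction and the sample rows are all assumptions made for illustration only. The idea shown is that rows are grouped by the presence or absence of any flood risk, and each group is sampled separately so that both classes keep the same proportion in the train and holdout sets.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_fn, holdout_fraction=0.2, seed=42):
    """Split rows into (train, holdout), sampling each stratum separately."""
    rng = random.Random(seed)

    # Group rows by their stratum label (here: has flood risk or not).
    strata = defaultdict(list)
    for row in rows:
        strata[label_fn(row)].append(row)

    train, holdout = [], []
    for label in sorted(strata):
        members = list(strata[label])
        rng.shuffle(members)
        # Take the same fraction from every stratum.
        n_holdout = round(len(members) * holdout_fraction)
        holdout.extend(members[:n_holdout])
        train.extend(members[n_holdout:])
    return train, holdout


def has_risk(row):
    """Binary stratum label: True if the row has any flood risk at all."""
    return row["flood_risk_count"] > 0


# Hypothetical stand-in for the components table: 25 rows with risk, 75 without.
components = [
    {"id": i, "flood_risk_count": 3 if i % 4 == 0 else 0} for i in range(100)
]

train, holdout = stratified_split(components, has_risk)
```

With these made-up rows, one in four rows has some flood risk, and the split keeps that ratio in both the train and the holdout set. On a Spark DataFrame the same effect is usually achieved with a per-stratum sampling call rather than a Python loop.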