Open benwulfe opened 1 year ago
rothn commented on 2023-08-01T20:58:42Z ----------------------------------------------------------------
Line #18. def load_training_validation() -> dataset.PartitionedDataset:
There's a lot of flexibility here. I'm curious whether we need all of it.
rothn commented on 2023-08-01T20:59:03Z ----------------------------------------------------------------
Line #10. _OVERLAP_PARTITION_STRATEGY = dataset.FixedPartitionStrategy(
        # Train
        dataset.DatasetGeographicPartitions(
            min_longitude=-60.5,
            max_longitude=float('inf'),
            min_latitude=float('-inf'),
            max_latitude=float('inf'),
        ),
        # Validation
        dataset.DatasetGeographicPartitions(
            min_longitude=float('-inf'),
            max_longitude=-60.5,
            min_latitude=float('-inf'),
            max_latitude=float('inf'),
        ),
        # Test
        dataset.DatasetGeographicPartitions(
            min_longitude=float('-inf'),
            max_longitude=float('inf'),
            min_latitude=float('-inf'),
            max_latitude=float('inf')
        )
    )
Would you mind explaining the partition strategy here and what it tries to accomplish?
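For readers following along, here is a minimal sketch of how bounds like these could be read. It does not use the repository's code: `Bounds` and `partitions_for` are hypothetical stand-ins, and the boundary handling (lower bound inclusive, upper bound exclusive) is an assumption, since the snippet above does not show how `dataset.FixedPartitionStrategy` treats points that fall exactly on -60.5. Under those assumptions, train and validation split the globe at longitude -60.5, while the unbounded test box overlaps both.

```python
from dataclasses import dataclass

INF = float('inf')


@dataclass(frozen=True)
class Bounds:
    """Hypothetical stand-in for dataset.DatasetGeographicPartitions."""
    min_longitude: float
    max_longitude: float
    min_latitude: float
    max_latitude: float

    def contains(self, longitude: float, latitude: float) -> bool:
        # Assumption: lower bounds inclusive, upper bounds exclusive.
        return (self.min_longitude <= longitude < self.max_longitude
                and self.min_latitude <= latitude < self.max_latitude)


# Same numeric bounds as the strategy quoted in the comment above.
PARTITIONS = (
    ("train", Bounds(-60.5, INF, -INF, INF)),
    ("validation", Bounds(-INF, -60.5, -INF, INF)),
    ("test", Bounds(-INF, INF, -INF, INF)),
)


def partitions_for(longitude: float, latitude: float) -> list:
    """Return the names of every partition whose box contains the point."""
    return [name for name, bounds in PARTITIONS
            if bounds.contains(longitude, latitude)]
```

With this reading, a point east of -60.5 lands in train and test, and a point west of it lands in validation and test, which may be what the `_OVERLAP_` prefix refers to.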
Some of these options (such as pulling from GEE) will make their way to ingestion.ipynb. For now, they exist only in XGB as a proof of concept.
These changes also (importantly) allow the XGB colab to consume already-split sets, although it does not yet support the test set.