locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

How to distinguish between training data set and test data set for machine learning? #544

Closed JenniferYingyiWu2020 closed 3 years ago

JenniferYingyiWu2020 commented 3 years ago

Hi, I have read the code in the "Supervised Machine Learning" guide (https://rasterframes.io/supervised-learning.html) and found that 'B01.tif', 'B02.tif', 'B03.tif', 'B04.tif', 'B05.tif', 'B06.tif', 'B07.tif', 'B08.tif', 'B09.tif', 'B11.tif', 'B12.tif' and 'SCL.tif' are all read in at the beginning. However, I am confused about how many of these ".tif" images are used as the training set and how many as the test set.

Does the split happen at "model = pipeline.fit(model_input)"? Could you please give me some suggestions? Thanks!