Thank you Jean-Luc for the post.
A couple of notes about the GLWD: it has been about 5 years since I worked with this dataset. The database was a bit dated even then, but it was very good and fairly exhaustively captured every lake larger than about 0.1 km² (if I recall correctly). You should find all but the smallest dams in the database, but the areal extent can be off for lakes that are shrinking or growing. I also recall it having a field that flags a lake as a dam, so I hope you would get a near-perfect correlation in the IDs.
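To relate the ~0.1 km² cutoff to the "big enough to spot with LANDSAT" criterion, a quick back-of-the-envelope calculation helps. This sketch assumes the 30 m ground resolution of the Landsat multispectral bands (TM, ETM+, OLI); `pixels_covered` is just an illustrative helper.

```python
# Assumption: Landsat multispectral bands (TM, ETM+, OLI) have 30 m pixels.
PIXEL_SIZE_M = 30.0


def pixels_covered(area_km2: float, pixel_size_m: float = PIXEL_SIZE_M) -> float:
    """Approximate number of square pixels covered by a feature of the given area."""
    area_m2 = area_km2 * 1_000_000  # 1 km^2 = 1,000,000 m^2
    return area_m2 / (pixel_size_m ** 2)


print(round(pixels_covered(0.1)))  # a 0.1 km^2 lake spans roughly 111 pixels
```

So even the smallest lakes in the database should cover on the order of a hundred Landsat pixels, which is comfortably visible to a human annotator.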
Hope that helps.
EBo --
On Jul 24 2019 12:38 PM, Jean-Luc Stevens wrote:
In a recent meeting we (@ebo @jsignell @jbednar) came up with some new ideas for public labelled data that can be applied to public satellite imagery (which mostly implies LANDSAT data).
Good criteria for a task are that 1) all the data can be made public, 2) the labelled features are big enough to spot with LANDSAT, and 3) the features can be easily spotted by a human, so we can evaluate the ML performance. The two most promising suggestions were:
- Using the [National Inventory of Dams Database](https://nid.sec.usace.army.mil/ords/f?p=105:19:30889210318018::NO:::) to mark dams on US imagery. This data has latitude/longitude coordinates, so the labels are points. There is one Excel file per state and there are more than 90k dams in total.
- Labelling lakes using the [Global Lakes and Wetlands Database](https://www.worldwildlife.org/pages/global-lakes-and-wetlands-database), which is polygon data. The GLWD-2 dataset has more than 250,000 polygons, though this is a global database, so I don't know how many fall in the US if we want to focus on that.
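Because the dam labels are latitude/longitude points, using them with imagery means mapping each point onto the image's pixel grid. The sketch below assumes a GDAL-style affine geotransform for a north-up raster in plain longitude/latitude; real Landsat scenes are in UTM, so the dam coordinates would need reprojecting first. `latlon_to_pixel` and `point_label_mask` are hypothetical helpers, and the coordinates in the usage lines are made up.

```python
from typing import Iterable, List, Tuple

# Hypothetical GDAL-style affine geotransform for a north-up lon/lat raster:
# (origin_lon, pixel_width, 0, origin_lat, 0, -pixel_height).
# Assumption: the raster is NOT in UTM; Landsat scenes would need reprojection.
GeoTransform = Tuple[float, float, float, float, float, float]


def latlon_to_pixel(lat: float, lon: float, gt: GeoTransform) -> Tuple[int, int]:
    """Return the (row, col) of the pixel containing (lat, lon)."""
    origin_lon, pixel_w, _, origin_lat, _, pixel_h = gt
    col = int((lon - origin_lon) // pixel_w)
    # pixel_h is negative for north-up rasters, so rows increase southwards.
    row = int((lat - origin_lat) // pixel_h)
    return row, col


def point_label_mask(
    points: Iterable[Tuple[float, float]],
    gt: GeoTransform,
    shape: Tuple[int, int],
) -> List[List[int]]:
    """Rasterize (lat, lon) point labels into a 0/1 mask of size shape=(rows, cols)."""
    n_rows, n_cols = shape
    mask = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon in points:
        row, col = latlon_to_pixel(lat, lon, gt)
        if 0 <= row < n_rows and 0 <= col < n_cols:  # ignore dams outside this tile
            mask[row][col] = 1
    return mask


# Usage with made-up coordinates: a 4x4 tile of 0.5-degree pixels.
gt = (-100.0, 0.5, 0.0, 40.0, 0.0, -0.5)
mask = point_label_mask([(39.2, -99.2), (10.0, 10.0)], gt, (4, 4))
```

A single marked pixel per dam is a very sparse label; in practice one might mark a small window around each point instead.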
Another nice thing about these two datasets is that there is a good chance they are correlated with each other!
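One rough way to test that hunch would be to count how many dam points fall inside, or close to, a lake polygon. This sketch assumes both datasets have already been reduced to plain (lon, lat) coordinates; the function names and the 0.01° tolerance (roughly 1 km in latitude) are illustrative choices, not part of either dataset. The tolerance is there because a dam usually sits on the edge of its reservoir, so the point can land just outside the polygon. For real data, a spatial library such as shapely or GeoPandas working in a projected CRS would be the more robust choice.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat) in degrees
Polygon = Sequence[Point]    # ring of vertices, first vertex not repeated


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray-casting test for whether p lies inside the polygon ring."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray from p towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Planar distance from p to the segment a-b (degrees, so only approximate)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def near_lake(p: Point, poly: Polygon, tol: float) -> bool:
    """True if p is inside the polygon or within tol of its boundary."""
    if point_in_polygon(p, poly):
        return True
    n = len(poly)
    return any(dist_to_segment(p, poly[i], poly[(i + 1) % n]) <= tol for i in range(n))


def fraction_of_dams_near_lakes(
    dams: Sequence[Point], lakes: Sequence[Polygon], tol: float = 0.01
) -> float:
    """Share of dam points that lie in or near at least one lake polygon."""
    if not dams:
        return 0.0
    hits = sum(1 for d in dams if any(near_lake(d, lake, tol) for lake in lakes))
    return hits / len(dams)
```

This brute-force loop checks every dam against every lake, which is fine for a single state but would need a spatial index to scale to the full 90k dams and 250k polygons.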
We ended up with a simple example; see https://examples.pyviz.org/landuse_classification/Image_Classification.html