xtracthub / xtract-images

Create a better training set #2

Open tskluzac opened 5 years ago

tskluzac commented 5 years ago

Using CDIAC and other sources (Google image searches?), we need to build a better image training set, since the model is massively overfit to the current one (one cause I can see is that all the maps look alike).

I'd say we need the following to start:

- 100 photographs (images taken with a camera)
- 100 scientific plots (like graphs and charts)
- 100 maps (geographic, some with and some without axis labels)
- 100 graphics (illustrations such as architecture diagrams)

We should have a test_data folder in which each subfolder is named for one type of training data (so 4 subfolders). This way the subfolder name serves as the label and we don't have to label images manually 🍾
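A minimal sketch of how labels could be read straight from that folder layout, assuming image files sit directly inside each class subfolder. The function names, the extension list, and the example subfolder names (photographs, plots, maps, graphics) are hypothetical, not something the repo defines.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def load_labeled_paths(root):
    """Return (image_path, label) pairs, using each subfolder name as the label."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs


def count_per_class(pairs):
    """Count images per label, e.g. to check for roughly 100 per class."""
    counts = {}
    for _, label in pairs:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `count_per_class(load_labeled_paths("test_data"))` would then give a quick check that each of the 4 subfolders holds roughly the intended number of images.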

tskluzac commented 4 years ago

The model now classifies some maps as map_plots (not too problematic) and some photos as graphics (slightly more problematic). We still need to formally test accuracy on an external test set, but it's certainly better than before.

Close once we have stats showing true improvement.
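One way to produce those stats is sketched below, assuming we already have a list of true labels for the external test set and a matching list of predicted labels. The function names are hypothetical, and the label strings in the usage note are only illustrative.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count each (true, predicted) pair, e.g. ('maps', 'map_plots')."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def per_class_recall(true_labels, predicted_labels):
    """For each true class, the fraction of its images predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

The confusion counts make the two error types mentioned above visible directly: a nonzero count for `("maps", "map_plots")` or `("photographs", "graphics")` shows how often each misclassification occurs, which can be compared before and after the new training set.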