Create a better training set

Using CDIAC and other sources (Google image searches?), we need to build a better image training set. The current one leads to massive overfitting; one cause I can see is that all the maps look alike.

I'd say we need the following to start:
- 100 photographs (images taken with a camera)
- 100 scientific plots (graphs and charts)
- 100 maps (geographic, some with axis labels and some without)
- 100 graphics (illustrations such as architecture diagrams)

We should have a test_data folder with one subfolder per image type (so 4 subfolders). That way the subfolder name serves as the label and we don't have to label anything manually 🍾
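
A rough sketch of how the folder layout could double as the labels, assuming test_data contains one subfolder per class. The function names, the extension list, and the class names passed to `missing_per_class` are placeholders for illustration, not anything that exists in the repo yet.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to whatever we actually collect.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp"}


def collect_labeled_images(root):
    """Return (path, label) pairs, using each subfolder name as the label."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, class_dir.name))
    return samples


def count_per_class(samples):
    """Count how many images were found for each label."""
    return Counter(label for _, label in samples)


def missing_per_class(samples, labels, target=100):
    """Return how many more images each label needs to reach the target."""
    counts = count_per_class(samples)
    return {label: target - counts[label] for label in labels if counts[label] < target}
```

Calling `missing_per_class(collect_labeled_images("test_data"), ["photographs", "plots", "maps", "graphics"])` would then show which classes are still short of 100 images; those four subfolder names are only a suggestion.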

xtracthub / xtract-images, issue #2