spectralpython / spectral

Python module for hyperspectral image processing
MIT License

How to use a spectral library to train a classifier #62

Closed heidiaclayton closed 7 years ago

heidiaclayton commented 7 years ago

Hello, I am currently working on a project where I want to classify AVIRIS images using a spectral library (USGS right now and later ASTER). Opening the USGS .hdr file returns a SpectralLibrary object, but I cannot use this to create training classes using the "create_training_classes" function as there is no "shape" attribute for the SpectralLibrary class. Is there a way to make this work? Thank you for your time.

tboggs commented 7 years ago

It isn't clear what you are trying to do with your data. The create_training_classes function is for producing training sets from image data when you have multiple samples for each class. Unless you are planning to aggregate spectra from your library into classes, that function is probably not appropriate for your situation. How are you intending to use your library spectra?

heidiaclayton commented 7 years ago

I would like to use each of the minerals recorded in the library as a training class in order to classify the land cover of AVIRIS images.

donm commented 7 years ago

Most (or all) of the classifier functions that take a TrainingClass or TrainingClassSet (the output from create_training_classes) use the variance within each class as part of the classification algorithm. But if you open a spectral library with only one spectrum per class, the covariance matrix for each class can't be calculated. A Gaussian classifier, for example, won't work even if you force the spectral library into a TrainingClassSet.
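
To illustrate the covariance point, here is a minimal NumPy-only sketch. The helper `class_covariance` is hypothetical and is not part of the spectral package; it only shows why one spectrum per class gives no covariance estimate, and why a handful of spectra in many bands gives a singular one.

```python
import numpy as np

def class_covariance(samples):
    """Sample covariance (B x B) from an n x B array of spectra for one class."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n = samples.shape[0]
    if n < 2:
        # One spectrum has no spread: the n - 1 denominator would be zero.
        raise ValueError("need at least 2 spectra per class to estimate covariance")
    centered = samples - samples.mean(axis=0)
    return centered.T @ centered / (n - 1)

rng = np.random.default_rng(0)
bands = 10

# A single library spectrum per class: no covariance estimate is possible.
try:
    class_covariance(rng.random((1, bands)))
    single_ok = True
except ValueError:
    single_ok = False

# Five spectra in ten bands: an estimate exists, but its rank is at most n - 1,
# so it is singular and cannot be inverted as a Gaussian classifier requires.
cov = class_covariance(rng.random((5, bands)))
rank = np.linalg.matrix_rank(cov)
print(single_ok, cov.shape, rank)
```

In practice this means a covariance-based classifier needs more spectra per class than there are bands, which a one-spectrum-per-mineral library cannot provide.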

Instead, you can use a classifier that uses only a single spectrum per class, like SAM:

http://www.spectralpython.net/class_func_ref.html#spectral.algorithms.algorithms.spectral_angles

Also see the docs for SpectralLibrary; the class doesn't have a shape attribute because the NumPy array with the spectral data is held in the .spectra attribute.

heidiaclayton commented 7 years ago

I see, this makes more sense than trying to force the data into TrainingClassSet. In terms of the parameters to spectral_angles, the data parameter takes an M x N x B array and the endmember parameter takes an M x N array. Is there a way to extract data from the SpectralLibrary class to match these parameters? I'm assuming by doing something with the .spectra attribute, which is an M x N array.

ghost commented 7 years ago

@lewismc @heidiaclayton one approach we have heard about from @narayave is to perturb training data to simulate more training samples for the neural network. for example, we could define a function to take an input training data set with a single sample per class, duplicate each sample N times and perturb each by randomly multiplying each element by some small factor p, and output a training data set with N samples per class. would appreciate if anyone experienced in machine learning could chime in on this idea.
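
A rough sketch of that perturbation idea, for discussion only. The function name `perturb_samples` and its parameters are made up for illustration, and "multiplying by some small factor p" is interpreted here as scaling each element by a random value in [1 - p, 1 + p], which is an assumption.

```python
import numpy as np

def perturb_samples(spectra, n_copies, p=0.02, seed=None):
    """Turn a C x B array (one spectrum per class) into C x n_copies x B.

    Each copy is the original spectrum with every element scaled by an
    independent random factor drawn uniformly from [1 - p, 1 + p].
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    n_classes, n_bands = spectra.shape
    factors = 1.0 + rng.uniform(-p, p, size=(n_classes, n_copies, n_bands))
    # Broadcast (C, 1, B) against (C, n_copies, B).
    return spectra[:, None, :] * factors

# Synthetic stand-in for a library with 3 spectra and 4 bands.
p = 0.02
spectra = np.random.default_rng(0).random((3, 4)) + 0.1
augmented = perturb_samples(spectra, n_copies=5, p=p, seed=1)
print(augmented.shape)
```

Independent per-band scaling ignores the correlation between neighbouring bands, so the synthetic samples may not resemble real within-class variability.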

the other classifiers may be preferable as discussed above. i initially implemented the code in question on the incorrect assumption that the spectral libraries contained multiple samples per class.

tboggs commented 7 years ago

@heidiaclayton The .spectra attribute of the spectral library is C x B, where C is the number of spectra and B is the number of bands. So you should be able to use the array as-is for SAM.
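
For reference, a NumPy-only sketch of what a SAM classification does with those shapes. The function `spectral_angle_classify` is a hypothetical helper written for this example, not the package's implementation, and the arrays are synthetic stand-ins for an AVIRIS cube (M x N x B) and the library's .spectra array (C x B).

```python
import numpy as np

def spectral_angle_classify(image, members):
    """Return (angles, class_map) for an M x N x B image and C x B members.

    angles is M x N x C (radians); class_map is M x N and holds the index of
    the member with the smallest angle. All-zero pixels would yield NaN angles.
    """
    image = np.asarray(image, dtype=float)
    members = np.asarray(members, dtype=float)
    unit_pixels = image / np.linalg.norm(image, axis=-1, keepdims=True)
    unit_members = members / np.linalg.norm(members, axis=-1, keepdims=True)
    cosines = unit_pixels @ unit_members.T          # (M, N, B) @ (B, C)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return angles, np.argmin(angles, axis=-1)

# Three synthetic "library" spectra in five bands.
members = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [5.0, 4.0, 3.0, 2.0, 1.0],
                    [1.0, 1.0, 5.0, 1.0, 1.0]])
labels = np.array([[0, 1, 2],
                   [2, 0, 1]])
# Each pixel is a brightness-scaled copy of one member; SAM ignores the scale.
scale = 0.5 + np.arange(6).reshape(2, 3)
image = members[labels] * scale[:, :, None]

angles, class_map = spectral_angle_classify(image, members)
print(angles.shape)
print(class_map)
```

With the package itself, passing the image and the library's .spectra array to spectral_angles should give the same kind of M x N x C angle array, and taking the argmin over the last axis gives the class map.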

tboggs commented 7 years ago

@browtayl Perturbing data to boost the size of the training set is fairly common when the object to be classified is an entire image since the target within the image usually does not have a fixed location, orientation, etc. So in those circumstances, it is not uncommon to randomly shift, zoom, rotate, warp, or mirror images to increase the number of samples.

But when you are classifying individual spectra, it isn't clear whether doing that would help or hurt. One possible situation where it might help is if the spectra to be classified have significant spectral calibration errors. Perhaps introducing random dropouts might help. Another option is to estimate covariance from one or more training images, then use the computed covariance to add random variability to the library spectra.
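
A sketch of that last option, assuming zero-mean Gaussian noise drawn from a covariance estimated over all pixels of a training image. The helper names are hypothetical and the data are synthetic.

```python
import numpy as np

def estimate_covariance(image):
    """B x B covariance estimated over all pixels of an M x N x B image."""
    pixels = image.reshape(-1, image.shape[-1])
    return np.cov(pixels, rowvar=False)

def perturb_with_covariance(spectra, cov, n_copies, seed=None):
    """Add correlated Gaussian noise to a C x B array; returns C x n_copies x B."""
    rng = np.random.default_rng(seed)
    n_classes, n_bands = spectra.shape
    noise = rng.multivariate_normal(np.zeros(n_bands), cov,
                                    size=(n_classes, n_copies))
    return spectra[:, None, :] + noise

rng = np.random.default_rng(0)
training_image = rng.random((20, 20, 4))   # synthetic stand-in for image data
library = rng.random((3, 4))               # synthetic stand-in for .spectra

cov = estimate_covariance(training_image)
perturbed = perturb_with_covariance(library, cov, n_copies=5, seed=1)
print(cov.shape, perturbed.shape)
```

Note that a covariance computed over a whole image mixes within-class and between-class variability, so it may overstate the spread of any single material; estimating it from a homogeneous region would likely be closer to what a classifier expects.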

@heidiaclayton If you use an estimated covariance to randomly modify library spectra, then you could treat the data like a pseudo-image and still use create_training_classes. Suppose you create S perturbed instances of the library spectra and store them in an N x S x B array, where N is the number of library spectra and B is the number of bands. Then you can create a pseudo-ground truth mask like this:

gt = np.repeat(np.arange(1, N + 1)[:, None], S, axis=-1)

For example:

>>> import numpy as np
>>> (N, S) = (3, 4)
>>> gt = np.repeat(np.arange(1, N + 1)[:, None], S, axis=-1)
>>> print(gt)
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
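
Putting the pieces together, a sketch of how the pseudo-image and mask line up. The data are synthetic, the noise level is arbitrary, and the final call to create_training_classes is left as a comment because the exact call form is an assumption here.

```python
import numpy as np

N, S, B = 3, 4, 3                      # library spectra, copies per spectrum, bands
rng = np.random.default_rng(0)
library = rng.random((N, B))           # synthetic stand-in for the .spectra array

# Row i of the pseudo-image holds S perturbed copies of library spectrum i.
pseudo_image = library[:, None, :] + 0.01 * rng.standard_normal((N, S, B))

# Mask with class id i + 1 across row i (0 is left free for "unlabeled").
gt = np.repeat(np.arange(1, N + 1)[:, None], S, axis=-1)

# Pixels selected by each class id are exactly that row's S spectra.
samples_by_class = {cid: pseudo_image[gt == cid] for cid in range(1, N + 1)}
for cid, samples in samples_by_class.items():
    print(cid, samples.shape)

# With the spectral package installed, the pair could then be passed on, e.g.:
# classes = spectral.create_training_classes(pseudo_image, gt)
```

S is kept larger than B here so that each class's sample covariance has a chance of being nonsingular.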

ghost commented 7 years ago

@tboggs @heidiaclayton @lewismc we ended up classifying our images like so, so this issue can be considered resolved. thank you everyone for your valuable insight.