Open mattjshannon opened 6 years ago
Any thoughts/suggestions, @PAHdb?
If there is no logical rhyme or reason to the 'flawed' spectra, I would expect the neural network to have difficulty grouping them. So the question is whether the network is able to group these 'flawed' spectra. If it can, it should probably keep them, as they make up roughly 15% of the data set and I'm sure there are 'flawed' spectra in the Spitzer set. Maybe for now, train two networks and see how they perform ...
This question is pertinent because it may be useful for the models to be able to recognize when data are 'bad'. Alternatively, by excluding these spectra (177 of the ~1235 in total), we are effectively doing additional preprocessing, which has improved accuracy with logistic regression.
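One way to make the "train two models and compare" idea concrete is sketched below. This is only an illustration under loud assumptions: the data are synthetic stand-ins (not the real spectra), the 'flawed' rows are simulated as pure noise, and a small hand-written logistic regression replaces whatever network or library is actually in use. The names `make_spectra`, `train_logreg`, and `compare` are hypothetical helpers, and only the counts 1235 and 177 come from this thread.

```python
import math
import random


def make_spectra(n_total=1235, n_flawed=177, n_features=5, seed=0):
    """Build synthetic (features, label, is_flawed) rows. NOT real spectra."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_total):
        label = rng.randint(0, 1)
        flawed = i < n_flawed
        if flawed:
            # Assumption: a 'flawed' spectrum carries no information about its label.
            x = [rng.gauss(0.0, 3.0) for _ in range(n_features)]
        else:
            centre = 1.0 if label == 1 else -1.0
            x = [rng.gauss(centre, 1.0) for _ in range(n_features)]
        rows.append((x, label, flawed))
    rng.shuffle(rows)
    return rows


def predict_proba(w, b, x):
    """Sigmoid of the linear score, clipped to avoid overflow in exp()."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows, epochs=20, lr=0.05):
    """Plain stochastic-gradient logistic regression; returns (weights, bias)."""
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y, _ in rows:
            err = predict_proba(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    w, b = model
    hits = sum((predict_proba(w, b, x) >= 0.5) == (y == 1) for x, y, _ in rows)
    return hits / len(rows)


def compare(seed=0):
    """Train with and without flawed rows; score both on the same test sets."""
    rows = make_spectra(seed=seed)
    train, test = rows[:1000], rows[1000:]
    clean_train = [r for r in train if not r[2]]
    clean_test = [r for r in test if not r[2]]
    with_flawed = train_logreg(train)
    without_flawed = train_logreg(clean_train)
    return {
        "with_flawed/all_test": accuracy(with_flawed, test),
        "with_flawed/clean_test": accuracy(with_flawed, clean_test),
        "without_flawed/all_test": accuracy(without_flawed, test),
        "without_flawed/clean_test": accuracy(without_flawed, clean_test),
    }


if __name__ == "__main__":
    for name, acc in compare().items():
        print(f"{name}: {acc:.3f}")
```

Scoring both models on the same held-out rows, once with the flawed spectra included and once without, is what keeps the comparison fair: a model trained only on clean data will look better if it is also tested only on clean data, which is a separate question from whether it copes with 'bad' inputs.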
Will leave this as something to ponder, as it might be more philosophy than pragmatism.