qcl1994 closed this issue 6 years ago.
Yes, I used the 256^2 images to save time and disk space. I experienced similar loss behavior in training. I would not worry about the loss behavior so much as the behavior of the descriptor layer loaded from the snapshotted weights. If you plot the precision-recall curves from multiple snapshots, you should see them behave differently despite similar average loss values.
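For example, something along these lines would show the difference. This is only a sketch: `compute_descriptors()`, the image sets, the ground-truth matrix, and the snapshot filenames are all placeholders for whatever evaluation pipeline you use.

```python
# Sketch: compare precision-recall curves from several training snapshots.
# compute_descriptors(), query_imgs, db_imgs, and gt_matches are hypothetical
# placeholders for your own descriptor-extraction and ground-truth setup.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

def pr_for_snapshot(weights_path, query_imgs, db_imgs, gt_matches):
    q = compute_descriptors(weights_path, query_imgs)  # (Nq, D), hypothetical helper
    d = compute_descriptors(weights_path, db_imgs)     # (Nd, D)
    q /= np.linalg.norm(q, axis=1, keepdims=True)      # normalize for cosine similarity
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    sims = q @ d.T
    best = sims.argmax(axis=1)                         # best database match per query
    scores = sims[np.arange(len(q)), best]             # match confidence
    correct = gt_matches[np.arange(len(q)), best]      # was the best match correct?
    return precision_recall_curve(correct, scores)

for snapshot in ["snapshot_iter_50000", "snapshot_iter_100000"]:  # placeholder names
    precision, recall, _ = pr_for_snapshot(snapshot, query_imgs, db_imgs, gt_matches)
    plt.plot(recall, precision, label=snapshot)

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```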
Thanks for the quick response! But I still cannot understand why the loss did not go down while the system still achieved better performance.
In the end it doesn't really matter whether the net learns the HOG descriptor for each warped image, because the HOG here is not very good at place recognition on its own, so a decrease in loss may not be beneficial. The intermediate features are what we want, and since the loss is nonzero, the propagated gradient is nonzero as well, so the weights are updated every iteration. It turns out that the HOG provides just enough of a geometric prior to update the weights beneficially, up to a certain point, given sufficient training data.
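To make that concrete, here is a rough PyTorch sketch of the training setup (not our actual implementation; the architecture, image size, and HOG length are just placeholders). The regression target is the HOG of the unwarped image, so the loss plateaus well above zero, yet the nonzero gradient keeps updating the encoder whose flattened output is the descriptor we actually use.

```python
# Rough sketch of the objective: reconstruct the HOG of the original image
# from a randomly warped copy. Layer sizes, image size, and HOG length are
# placeholders, not the real model's values.
import torch
import torch.nn as nn

class CalcLikeNet(nn.Module):
    def __init__(self, hog_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 4, 3, stride=1, padding=1), nn.ReLU(),
            nn.Flatten(),  # this flattened activation is the place descriptor
        )
        self.head = nn.Linear(4 * 30 * 40, hog_dim)  # 30x40 follows from a 120x160 input

    def forward(self, x):
        descriptor = self.encoder(x)
        return descriptor, self.head(descriptor)

model = CalcLikeNet(hog_dim=3648)   # HOG length depends on your HOG parameters
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# warped: randomly perspective-warped grayscale batch; hog_target: HOG of the
# corresponding unwarped images (dummy tensors here, just to show the shapes).
warped = torch.rand(8, 1, 120, 160)
hog_target = torch.rand(8, 3648)

descriptor, pred = model(warped)
loss = criterion(pred, hog_target)
optimizer.zero_grad()
loss.backward()   # nonzero loss, so the gradient is nonzero and weights keep updating
optimizer.step()
```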
Thanks for your reply! Another question: why did you use HOG rather than GIST or another global image descriptor?
That's a good question! We did cross-validate a few types of descriptors, and also tried using just the images with no descriptor, but we did not try GIST. We chose HOG as one possibility since a lot of recent papers have used HOG to achieve some incredible place recognition results (going with accuracy over efficiency). We cite a few in our paper. It is infeasible to try every type of descriptor since there are so many out there. Since this feed is really for issue tracking, I think we should continue over email if you have further questions (nmerrill@udel.edu).
Hi, I used the Places dataset (256×256) to train the convolutional autoencoder, but the training loss declined from 130 to about 60 and then would not go any lower. Could you tell me what's wrong? Thank you so much.