weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

Retraining From Existing Weights (Model) #102

Closed aasem closed 4 years ago

aasem commented 4 years ago

There seems to be a problem with resuming training from a locally saved model. When I start training by calling model.use_release() after creating an empty model (as in the docs), training progresses fine. However, if I stop after one epoch, save my model using my_model.save('mymodel.h5'), and then resume from that point by calling mymodel_new = deepforest.deepforest('mymodel.h5'), the initial loss is too high (more than 1) and not where I left off. After complete training cycles, the mean average precision comes out zero and the loss does not decrease. I have tried the same with the provided model, mymodel = deepforest.deepforest('NEON.h5'), and the results are the same. I have debugged a lot but could not pin down which parameter I should set, such as a checkpoint or callback.

Is there a right method in DeepForest package to do it or should I just explore standard ways to do it in keras?

Thanks.
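For reference, a minimal sketch of the pause-and-resume workflow I am attempting (the annotations file name is a placeholder; the reload call mirrors what I described above, and I am not certain it is the intended API):

```python
from deepforest import deepforest

# First run: start from the prebuilt release model and train briefly
model = deepforest.deepforest()
model.use_release()
model.train(annotations="my_annotations.csv")  # placeholder annotations file
model.model.save("mymodel.h5")  # full save of the underlying Keras model

# Later run: try to resume from the saved model
resumed = deepforest.deepforest("mymodel.h5")
# Expectation: loss continues near where it left off; observed: it restarts high
```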

bw4sz commented 4 years ago

Thanks for this message, I’ll have a look this afternoon. My initial thought is that this is a difference between Keras save model and Keras save weights: when you save just the weights, you don’t save the state of the optimizer. This is definitely upstream of this package, but I’m happy to provide a guess. To summarize, the question is how best to pause and restart a DeepForest model. I’ll look into it. Just out of curiosity, because this package is brand new, can you give me a sense of your workflow, any overall suggestions, and how you found this, and maybe drop a sample picture? We are still gathering information about how best to engage with users.


-- Ben Weinstein, Ph.D. Postdoctoral Fellow University of Florida http://benweinstein.weebly.com/

aasem commented 4 years ago


Thank you, Ben. Keras's model.save() saves both the architecture (the sequential or functional graph, optimizer choice, etc.) and the weights into a single HDF5 file; this is the method demonstrated in the DeepForest docs. model.save_weights() saves just the weights and not everything else required to reload the model. keras.models.load_model() reloads the model, but retraining does not resume from the same state; for checkpointing, Keras provides callbacks such as keras.callbacks.ModelCheckpoint. So my question boils down to this:
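To illustrate the distinction with a generic tf.keras sketch (a toy model standing in for RetinaNet, independent of DeepForest):

```python
from tensorflow import keras

# A toy model standing in for the real network
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

model.save("full_model.h5")            # architecture + weights + optimizer state
model.save_weights("weights_only.h5")  # weights only; architecture must be rebuilt

# load_model restores the compiled model, so training can in principle continue
reloaded = keras.models.load_model("full_model.h5")
```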

I assume deepforest.deepforest('thismodel.h5') reloads a saved model (architecture and weights) and that we can use it to make predictions. Can we use callback methods with the deepforest() class, or is there a standard, tested way to do that?
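For example, what I would like to do is something like the following. This is hypothetical: I do not know whether deepforest's train() accepts Keras callbacks, so the sketch only shows the checkpoint being constructed against the standard Keras API:

```python
from keras.callbacks import ModelCheckpoint
from deepforest import deepforest

model = deepforest.deepforest()
model.use_release()

# Save the full model state (including optimizer) at the end of each epoch
checkpoint = ModelCheckpoint("checkpoint-{epoch:02d}.h5",
                             save_weights_only=False)

# If train() forwarded callbacks, resuming from a checkpoint would pick up
# the optimizer state too -- but I do not know whether this parameter exists:
# model.train(annotations="my_annotations.csv", callbacks=[checkpoint])
```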

As for the other question of feedback: I think DeepForest is a very useful package for quickly applying RetinaNet to detecting tree canopies. I have GIS imagery on which I want to do exactly that, and DeepForest is a great starting point compared with building on RetinaNet ab initio. However, since the project has presumably evolved from your ecological research, its generalization to other use cases is somewhat limited. It is excellent for remote sensing and other applied research, but requires tweaking for signal processing or machine learning problems related to ecology. For instance, if I plan to adapt RetinaNet specifically for crop analysis, DeepForest does not immediately provide a useful starting point.

Secondly, again from a generalization perspective, some choices were presumably made from the standpoint of a particular problem at hand. For instance, I debugged through utilities.py and changed the int typecast of annotations to float, since in my case the exported JSON or XML coordinates were floating point. But changing the typecast produced errors, so I edited utilities.py again, rounded all my floats, and cast them back to int. Not a great way to do it, but I was just trying to get past my immediate problem. I think many such improvements could further generalize this excellent package.
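The workaround I ended up with is essentially this (a standalone sketch, not the actual utilities.py code):

```python
# Round float bounding-box annotations to the integer pixel coordinates
# that the annotation parser expects (my exported XML/JSON boxes were floats).
def round_boxes(boxes):
    """Round each (xmin, ymin, xmax, ymax) float box to ints."""
    return [tuple(int(round(v)) for v in box) for box in boxes]

print(round_boxes([(12.3, 45.7, 100.0, 200.9)]))
```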

I think that, with some improvements, DeepForest can be a good transfer-learning starting point for RetinaNet in specific crop or forest analysis applications. Dense scenes trained with focal loss vary considerably, and when I set out to detect a particular kind of tree and build an optimized RetinaNet (using DeepForest), I should be able to transfer-learn from other optimized networks built with DeepForest. Perhaps ambitious, but good for ecological and remote sensing researchers.

Best regards

aasem commented 4 years ago

> To summarize the question is how best to pause and restart a DeepForest model. I’ll look into it. Just out of curiosity, because this package is brand new, can you give me a sense of your workflow, any overall suggestions, how you found this, maybe drop a sample picture.

I was wondering why I thought in the first place about pausing and retraining from an existing saved model, and these lines in section 3.4 of the DeepForest docs probably nudged me there:

> We envision that for the majority of scientific applications at least some fine-tuning of the prebuilt model will be worthwhile. When starting from the prebuilt model for training, we have found that 5-10 epochs is sufficient. We have never seen a retraining task that improved after 10 epochs, but it is possible if there are very large datasets with very diverse classes.

What exactly does fine-tuning of the prebuilt model mean? I thought it meant loading the model saved on my local disk (whether NEON.h5 or a model I have retrained on my own data) and training it further with more data. I now realize that it probably means transfer learning from the NEON.h5 weights after creating an empty model with deepforest.deepforest() and then calling deepforest.use_release(). Alternatively, it may mean transfer learning from generic RetinaNet weights. I am not sure which of the two these lines in the docs mean; please clarify. Thanks.

aasem commented 4 years ago

Following up on my last comment: I think I have pretty much clarified my concern to myself. I was perhaps confused and had not explored all the options. Apologies if I have wasted your time or cluttered the space; however, I am not deleting my last comment since it might help someone.

I think the key is the model.config["weights"] parameter, which can point an empty model created by deepforest.deepforest() at the desired weights. I have not tested it yet, but I will very soon, with a toy file containing a number of trees I can count by eye, so that I can check the predictions against ground truth by hand. I hope that will clear up a lot.
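Concretely, the experiment I plan to run looks like this (file names are placeholders, and I have not yet verified that train() actually reads config["weights"] this way):

```python
from deepforest import deepforest

# Empty model, then point training at the previously saved weights
model = deepforest.deepforest()
model.config["weights"] = "mymodel.h5"   # weights saved from an earlier run
model.config["epochs"] = 5
model.train(annotations="toy_annotations.csv")  # toy file with countable trees
```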

Thanks for your time. Asim D. Bakhshi

bw4sz commented 4 years ago

Glad it's cleared up. Thanks for being an early user. Open issues for EVERYTHING that feels like it could be improved. I already know how it all works, so I'm looking for feedback.