weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License

How to train from scratch? #222

Closed ElliotSalisbury closed 3 years ago

ElliotSalisbury commented 3 years ago

Hi

Apologies if this has already been asked or is covered in a tutorial somewhere; I looked first and couldn't find an answer.

If I wanted to go through the same steps taken to train the release model, i.e., training from scratch on the unsupervised, LiDAR-annotated data and then fine-tuning on the hand-labelled dataset, how would I go about doing so?

Where can I find the annotation xmls for each of the NEON tiles?

bw4sz commented 3 years ago

The unsupervised data is many terabytes (probably 40TB?) and isn't hosted online; the storage costs are too high. Why not start from the release model and finetune from there? What is the advantage of going back? You can think of the release model as a lower-dimensional representation of those data in a compact, easier-to-share format.
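
For reference, finetuning from the release model is only a few lines. A minimal sketch, assuming your annotations are in DeepForest's CSV format (image_path, xmin, ymin, xmax, ymax, label); the file paths here are placeholders:

```python
from deepforest import main

# Load the prebuilt release model as the starting point
m = main.deepforest()
m.use_release()

# Point the config at your own annotations (placeholder paths)
m.config["train"]["csv_file"] = "my_annotations.csv"
m.config["train"]["root_dir"] = "my_tiles/"
m.config["train"]["epochs"] = 5

# Fit with the built-in PyTorch Lightning trainer
m.create_trainer()
m.trainer.fit(m)
```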

ElliotSalisbury commented 3 years ago

My intention was to add height information to the model by adding a 4th channel to the input image.

If I retrain from the release, the model is already in a good minimum, so my worry is that the conv1 layer won't learn to use that height channel effectively and will focus on the RGB channels, since they're already well tuned.

Starting from randomly initialized weights would avoid that.

When you say 40TB, the bulk of that would be the NEON imagery, right? Or do you mean there is 40TB worth of XML files giving the bounding boxes of the trees?

ElliotSalisbury commented 3 years ago

I'd be keen to hear your thoughts on how best to add the height information to the model. I know it was spoken about as a future idea in one of your presentations.

Perhaps adding the canopy height as a 4th input channel isn't the best solution, and it could instead be added at a later stage in the model.

bw4sz commented 3 years ago

It's a reasonable thought, and you are welcome to the data (it will take some time to transfer), but know that a 4th-channel input has been tried many times, by multiple people, including me, and has never been successful. Contact Rebekah Loving from Stanford, who used LiDAR data from https://github.com/weecology/NeonTreeEvaluation starting from the release model. She has a conference paper on later-stage fusion that showed initial promise but needs to be evaluated much further. Just noting that this is a larger bite of work than it might seem. I'd start with the .laz files from https://github.com/weecology/NeonTreeEvaluation/tree/master/evaluation/LiDAR
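
For flavor, one very simple form of late fusion (just an illustration, not necessarily what that paper does) is to leave the RGB model untouched and use the CHM afterwards to drop detections whose mean canopy height is implausibly low. A rough sketch, assuming predictions in DeepForest's DataFrame format (xmin, ymin, xmax, ymax columns) and a CHM aligned to the same pixel grid:

```python
import numpy as np
import pandas as pd

def filter_by_height(boxes: pd.DataFrame, chm: np.ndarray,
                     min_height: float = 3.0) -> pd.DataFrame:
    """Drop detections whose mean canopy height falls below min_height.

    boxes: xmin, ymin, xmax, ymax in pixel coordinates on the CHM grid.
    chm:   2D array of canopy heights in meters.
    """
    keep = []
    for _, b in boxes.iterrows():
        patch = chm[int(b.ymin):int(b.ymax), int(b.xmin):int(b.xmax)]
        keep.append(patch.size > 0 and float(patch.mean()) >= min_height)
    return boxes[keep]
```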

bw4sz commented 3 years ago

I Slacked the Stanford group to drop their paper in here.

ElliotSalisbury commented 3 years ago

I have been working on a separate project where I used a Mask R-CNN model to detect trees, but it still uses the same underlying ResNet-50 as a backbone.

In that work, I trained from scratch with a 4-channel input, using a CHM I derived from the LiDAR data and the annotations from the NEON Tree Crowns Dataset you uploaded on Zenodo (https://zenodo.org/record/3765872#.YHs-MBMzbUI), and then finetuned that model with my own hand-labelled dataset of trees more representative of our area. That approach gave better results than training with the standard RGB input.

It's not totally transferable to the work here on DeepForest, as I could only afford the compute time to train on a much smaller NEON subset, just YELL and BLAN, so perhaps those gains would disappear if I added more data; or perhaps the CHM is less useful for bounding-box detection but gives better performance for segmentation.
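
In case it helps anyone reading along, the data preparation was essentially just stacking the CHM onto the imagery. A minimal sketch, assuming the CHM has already been rasterized and resampled onto the same grid as the RGB tile (the filenames and the 50 m normalization are illustrative):

```python
import numpy as np
import rasterio

# Read the aligned rasters: RGB is (3, H, W) uint8, CHM is (H, W) float meters
with rasterio.open("tile_rgb.tif") as src:
    rgb = src.read()
with rasterio.open("tile_chm.tif") as src:
    chm = src.read(1)

# Squash height into the same 0-255 range as the imagery
# (assumes ~50 m as a rough max canopy height)
chm = np.clip(chm / 50.0, 0.0, 1.0) * 255

# Stack into a 4-channel (4, H, W) array for the network
rgbh = np.concatenate([rgb, chm[np.newaxis].astype(rgb.dtype)], axis=0)
```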

bw4sz commented 3 years ago

It would be interesting to get the benchmark score, provided you didn't use any annotations from https://github.com/weecology/NeonTreeEvaluation

We have a simple R package for this:

https://github.com/weecology/NeonTreeEvaluation_package


bw4sz commented 3 years ago

@ElliotSalisbury did you have code to show for the 4-channel input? I had a student ask about this.

ElliotSalisbury commented 3 years ago

I don't have the rights to distribute that code unfortunately.

But for the student's sake, it is probably easy enough to try themselves.

I take the ResNet-50 backbone and replace the first convolutional layer, which has a 3-channel input, with a 4-channel version. I copy the weights from the red channel into the new, uninitialized 4th channel, then multiply all the weights by 3/4 = 0.75 to rescale them appropriately for the following, unchanged layers.

I don't know if this approach is optimal; it simply copies the methodology I've seen in other papers.
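
Something like this, a minimal sketch in PyTorch against a plain torchvision ResNet-50 (in DeepForest you'd patch the equivalent layer inside the RetinaNet backbone):

```python
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)
old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Same layer, but accepting 4 input channels
new_conv = torch.nn.Conv2d(4, 64, kernel_size=7, stride=2,
                           padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight       # keep the RGB filters
    new_conv.weight[:, 3] = old_conv.weight[:, 0]  # seed channel 4 from red
    new_conv.weight *= 0.75                        # rescale: 3 channels -> 4
model.conv1 = new_conv
```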