Zero predictions after retraining the model with annotated data

thisistaimur commented 1 year ago

Describe the bug

I tried to retrain the DeepForest model with hand-annotated data, exactly as described in the documentation. Retraining runs to completion (10 epochs), but when I then run prediction on a .tif file (the same one used for the hand annotations), the model produces zero predictions. Any idea why (see code below)? Some help would be much appreciated!

To Reproduce

# Retrain the model with annotated data
import os
from deepforest import main, get_data

labels = get_data("/content/327045708_labels_2.csv")

m = main.deepforest()
m.to("cpu")
#m.config['gpus'] = '-1' #move to GPU and use all the GPU resources
m.config["train"]["fast_dev_run"] = False
m.config["train"]["epochs"] = 10
m.config["epochs"] = 10
#m.config["workers"] = 4
m.config["save-snapshot"] = False
m.config["train"]["csv_file"] = labels
m.config["train"]["root_dir"] = os.path.dirname(labels)

m.config["devices"] = "1"

m.create_trainer()
m.trainer.fit(m)

#Get a single tif tile and run deepforest
raster_path = "/content/327045708_rgb.tif"
# Window size of 1300px with an overlap of 25% among windows for this small tile.
predicted_raster = m.predict_tile(raster_path=raster_path, return_plot=True, patch_size=1300, patch_overlap=0.25)

Environment (please complete the following information):

OS: Google Colab
Python version and environment : Python 3

Screenshots

Retrain Screenshot 2023-06-06 at 14 19 33

Predict Screenshot 2023-06-06 at 14 19 42

bw4sz commented 1 year ago

Happy to help, but we'll need to work through this together.

1) DeepForest version
2) Show me a test image with the baseline model; attach the image here
3) Attach the CSV annotations here; I'm going to guess they are malformed
4) Visualize the training curves: https://lightning.ai/docs/pytorch/stable/visualize/logging_basic.html

That will help us start.
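
The malformed-CSV guess in item 3 can be checked before any training. DeepForest's documentation describes training annotations as a CSV with the columns image_path, xmin, ymin, xmax, ymax, label. The sketch below uses only the standard library to flag rows that break that layout. The function name `check_annotations` and the particular checks are illustrative assumptions, not part of DeepForest.

```python
import csv
import io

# Column layout DeepForest's docs describe for training annotations.
REQUIRED_COLUMNS = ["image_path", "xmin", "ymin", "xmax", "ymax", "label"]


def check_annotations(csv_file):
    """Return a list of human-readable problems found in an annotation CSV.

    csv_file is any open text file-like object. (Hypothetical helper.)
    """
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            xmin, ymin, xmax, ymax = (
                float(row[c]) for c in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric coordinate")
            continue
        if min(xmin, ymin) < 0:
            problems.append(f"line {line_no}: negative coordinate")
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"line {line_no}: box has zero or negative area")
        if not (row["label"] or "").strip():
            problems.append(f"line {line_no}: empty label")
    return problems


# Small in-memory demo; with a real file use open(path, newline="").
sample = io.StringIO(
    "image_path,xmin,ymin,xmax,ymax,label\n"
    "tile.tif,10,10,50,50,Tree\n"
    "tile.tif,60,10,40,50,Tree\n"
)
for problem in check_annotations(sample):
    print(problem)
```

An empty result only says the file is structurally plausible. It does not confirm that the boxes are in pixel coordinates of the image they name, which is worth checking by plotting a few boxes over the tile.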

thisistaimur commented 1 year ago

Thanks bw4sz! And yes of course.

1. DeepForest version: '1.2.7'
2 and 3. I am using a .tif file (tile) as the baseline image. Both the .tif tile and the CSV annotations can be found here: https://we.tl/t-gko85eZ0sy
4. I will get back to you on the training curves as soon as I have set up PyTorch Lightning logging.

thisistaimur commented 1 year ago

Hi Ben, I was not able to run TensorBoard on our HPC cluster (where I am training the model) because the JupyterLab server is only accessible via the browser. @bw4sz, I wanted to circle back and ask whether you have had a chance to look into the issue?
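
When TensorBoard cannot be reached, one alternative is to log to a plain CSV file and inspect the numbers directly. PyTorch Lightning ships a `CSVLogger` in `pytorch_lightning.loggers` that writes a `metrics.csv` file, and it could be passed to `create_trainer(logger=...)` in the same way as any other logger. The sketch below only covers reading such a file with the standard library. The loss column name `train_loss` is an assumption and should be replaced with whatever name appears in the file's header.

```python
import csv
import io
from collections import defaultdict


def mean_loss_per_epoch(csv_file, loss_column="train_loss"):
    """Average a logged loss column per epoch.

    csv_file is an open text file-like object with an "epoch" column.
    loss_column is an assumed name; check the header of your own log.
    """
    totals = defaultdict(lambda: [0.0, 0])  # epoch -> [sum, count]
    for row in csv.DictReader(csv_file):
        value = row.get(loss_column)
        epoch = row.get("epoch")
        if not value or not epoch:
            continue  # skip rows where this metric was not logged
        bucket = totals[int(float(epoch))]
        bucket[0] += float(value)
        bucket[1] += 1
    return {epoch: s / n for epoch, (s, n) in sorted(totals.items())}


# Tiny in-memory stand-in for a metrics.csv file.
log = io.StringIO(
    "epoch,step,train_loss\n"
    "0,0,2.0\n"
    "0,1,1.0\n"
    "1,2,0.5\n"
)
print(mean_loss_per_epoch(log))
```

A per-epoch mean that stays flat across all epochs would point to the same conclusion as a flat TensorBoard curve: the optimizer is not making progress on the data it is given.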

thisistaimur commented 1 year ago

Sorry it took a while, but here is the TensorBoard image after the training runs. There doesn't seem to be any training happening under the hood. I have put together a Google Colab notebook with my code; I can share it if that helps.

Screenshot 2023-06-28 at 13 39 31

bw4sz commented 1 year ago

Can you retrigger the transfer above? I'm back working on DeepForest and happy to help. Let's make a toy example with your code and get it overfitting on one image to make sure everything works.
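
A minimal sketch of the data side of such a toy example, assuming the annotation CSV has an image_path column as in the DeepForest docs: pick the image with the most boxes and keep only its rows. The helper name `rows_for_busiest_image` is hypothetical.

```python
import csv
import io
from collections import Counter


def rows_for_busiest_image(csv_file):
    """Return (image_path, rows) for the image with the most annotations.

    csv_file is an open text file-like object with an "image_path" column.
    """
    rows = list(csv.DictReader(csv_file))
    if not rows:
        return None, []
    counts = Counter(row["image_path"] for row in rows)
    image, _ = counts.most_common(1)[0]
    return image, [row for row in rows if row["image_path"] == image]


# In-memory demo; with a real file use open(path, newline="").
annotations = io.StringIO(
    "image_path,xmin,ymin,xmax,ymax,label\n"
    "a.tif,0,0,10,10,Tree\n"
    "b.tif,0,0,10,10,Tree\n"
    "b.tif,20,20,40,40,Tree\n"
)
image, subset = rows_for_busiest_image(annotations)
print(image, len(subset))
```

The subset can be written back out with `csv.DictWriter` and used as the training CSV. If a model trained for many epochs on that single image still cannot reproduce the boxes on the same image, the problem is likely in the pipeline (paths, coordinates, config) rather than in the amount of data.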

thisistaimur commented 1 year ago

I was actually able to train the model in the end, and TensorBoard also showed the metrics. However, the model's predictions got worse instead of improving (see below). I gave up at that point. Here are a couple of tiles and the corresponding CSVs with the annotations again: https://we.tl/t-vd8U6Le4cn (the link is valid for a week)

bw4sz commented 1 year ago

I'm not following. Can you perhaps share the full code you used? A quick check looks okay; I made the patch size a bit smaller. I suspect you plotted incorrectly? Just guessing.

from deepforest import main
from deepforest.utilities import boxes_to_shapefile
from matplotlib import pyplot

m = main.deepforest()
m.config["devices"] = 1
m.create_trainer()
m.use_release()
predictions = m.predict_tile(
    raster_path="/blue/ewhite/DeepForest/issue_454/327105702.tif", patch_size=400)
gdf = boxes_to_shapefile(predictions, root_dir="/blue/ewhite/DeepForest/issue_454/")
gdf.to_file("/blue/ewhite/DeepForest/issue_454/327105702_predictions.shp")

This is the baseline model, let's agree on this and then proceed to training.

thisistaimur commented 1 year ago

Hi there, thanks a lot for looking into this. I will be back from my holidays starting Monday, but as far as the code above goes, mine has been pretty much the same (see below) and it works for me too. However, it seems that the baseline model yields better results than a model retrained on lots of new contextual data. That's what I tried to say in my earlier post. Any ideas why that could be?


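import os
from deepforest import main, get_data
from pytorch_lightning.loggers import TensorBoardLogger

annotations_file = get_data("327045708_labels.csv")
logger = TensorBoardLogger(save_dir="lightning_logs/")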

model = main.deepforest()
model.use_release()
model.config['gpus'] = '-1'
#model.config["workers"] = 4
model.config["train"]['epochs']= 5
model.config["score_thresh"] = 0.3
model.config["nms_thresh"] = 0.05
model.config["save-snapshot"] = False
model.config["train"]["csv_file"] = annotations_file
model.config["train"]["root_dir"] = os.path.dirname(annotations_file)

#Create model trainer and fit model
model.create_trainer(logger=logger, log_every_n_steps=1)
model.trainer.fit(model)

#Get a single tif tile and run deepforest
path = "327045708_rgb.tif"
# Window size of 400px with an overlap of 25% among windows for this small tile.
predicted_image = model.predict_tile(raster_path=path, return_plot = True, patch_size=400,patch_overlap=0.25)
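
One way to put numbers on "the baseline is better than the retrained model" is to score both sets of predicted boxes against the hand annotations. The sketch below is a simplified, standard-library illustration using greedy IoU matching. It is not DeepForest's own evaluation code, and the 0.4 IoU threshold is an assumed value. Boxes are plain `(xmin, ymin, xmax, ymax)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    union = (
        (a[2] - a[0]) * (a[3] - a[1])
        + (b[2] - b[0]) * (b[3] - b[1])
        - intersection
    )
    return intersection / union if union > 0 else 0.0


def precision_recall(predicted, truth, iou_threshold=0.4):
    """Greedy one-to-one matching of predicted boxes to ground-truth boxes.

    Returns (precision, recall). The threshold is an assumption.
    """
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        # Best remaining ground-truth box for this prediction, if any.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example: one correct box and one false positive against two trees.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(precision_recall(predicted, truth))
```

Running this once with the baseline predictions and once with the retrained model's predictions, at the same `score_thresh` and `nms_thresh`, gives a like-for-like comparison. DeepForest also provides an `evaluate` method on the model, which is the more complete route where it is available.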

bw4sz commented 9 months ago

@thisistaimur this issue got dropped. Were you able to solve this? Let's pick this back up.

ethanwhite commented 7 months ago

Closing as stale.

@thisistaimur - apologies for this getting dropped initially. Feel free to reopen if you come back to this.

weecology / DeepForest

Zero predictions after retraining the model with annotated data #454