scipy-conference / scipy_proceedings

Tools used to generate the SciPy conference proceedings

Paper: Python-Based GeoImagery Dataset Development for Deep Learning-Driven Forest Wildfire Detection #916

Open valemar10 opened 1 month ago

valemar10 commented 1 month ago

NOTE: The title we used in our accepted proposal was "Development and Application of CWGID: the California Wildfire GeoImaging Dataset for Deep Learning-Driven Forest Wildfire Detection". We changed it to "Python-Based GeoImagery Dataset Development for Deep Learning-Driven Forest Wildfire Detection" to make it more aligned with the focus of the SciPy conference, highlighting the use of Python tools for environmental monitoring. The proposal contents remain unchanged.

If you are creating this PR in order to submit a draft of your paper, please name your PR with Paper: <title>. An editor will then add a paper label and GitHub Actions will be run to check and build your paper.

See the project readme for more information.

Editor: Meghann Agarwal @mepa

Reviewers:

github-actions[bot] commented 1 month ago

Curvenote Preview

| Directory | Preview | Checks | Updated (UTC) |
| --- | --- | --- | --- |
| papers/Valeria_Martin | 🔍 Inspect | ✅ 58 checks passed (12 optional) | Jul 15, 2024, 10:41 PM |
hongsupshin commented 1 month ago

@valemar10 Thanks for mentioning the title changes :) Really appreciate it!

kafitzgerald commented 4 weeks ago

Hi @valemar10 :wave:! I'll be a reviewer for your SciPy 2024 proceedings submission. You can expect my initial review in the form of a PR review here by June 29th, or hopefully sooner.

valemar10 commented 3 weeks ago

Hi @kafitzgerald! I will look forward to seeing your review! Thank you very much for your help.

kalyan678 commented 3 weeks ago

@valemar10 - Hello, my name is Kalyan, a data and AI scientist from India. With extensive experience tackling machine learning and artificial intelligence challenges at the enterprise level across diverse domains, I am thrilled to review your paper for the SciPy Conference 2024. This is my second year in a row reviewing for the conference, and I look forward to offering suggestions to refine your paper and help it reach and benefit a broader audience.

I've just begun the work. You can expect my initial review comments in a couple of days!

valemar10 commented 3 weeks ago

Hi @kalyan678!

Thank you for taking the time to review our paper for SciPy. We are happy to have your expertise and knowledge in machine learning and artificial intelligence to help us improve our work. We look forward to receiving your initial review comments.

Thank you! Valeria

kalyan678 commented 2 weeks ago

@valemar10 - I enjoyed reviewing your paper and work. Below are my detailed comments section by section.

Abstract:

Overall, the abstract is good but could be improved. I think you can briefly describe the key steps in the methodology. You mention using Google Earth Engine’s Python API, but adding more context on the steps involved in building, labeling, processing, and evaluating the dataset would be helpful.

The current conclusion does a good job of mentioning how the study improves environmental monitoring, but it could be clearer and more specific. For example: "The pipeline described in this paper shows how Python can be used to build and process high-resolution satellite imagery datasets, leading to accurate wildfire detection and offering a robust tool for broader environmental monitoring."

Introduction:

In my opinion, the introduction provides background but doesn’t clearly state the purpose of your study. I think you should clearly mention that the paper presents a Python-based methodology for creating and using a high-resolution satellite imagery dataset for forest wildfire detection. Wdyt?

Next, you mention how Google Earth Engine (GEE) and Python are useful, but it would be clearer if you directly connected them to solving the problem of creating and using satellite imagery datasets. Explain how GEE makes data collection and processing faster and easier, and how Python’s tools help handle the complex steps of developing and training deep learning models with this data. This will show how these technologies specifically help overcome the challenges you’re addressing.

Finally, instead of focusing only on wildfire detection, it would be great if you could also highlight how your method can be used for many environmental monitoring tasks. This shows how your approach can help with different areas of studying and reacting to changes in the environment. Again, I am happy to hear any alternative thoughts on this.

Downloading the Imagery Data Using GEE’s Python API:

I think it would be good to clarify terms like "bands B4, B3, and B2" by briefly explaining their significance in satellite imagery. Code 1: Maybe you can include a short description before the code snippet explaining its purpose. Overall the code is very well written! The code includes some comments but could benefit from more detailed explanations. Variable names are somewhat clear, but some could be more descriptive; for example, `point` could be renamed to something more descriptive.
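For illustration, such a download might look roughly like this with the `earthengine-api` package. This is my own hedged sketch, not the paper's Code 1: the collection ID, cloud filter threshold, and all function/variable names here are assumptions.

```python
# Illustrative sketch only (not the paper's code): exporting a Sentinel-2
# RGB composite via the earthengine-api. Bands B4 (red), B3 (green), and
# B2 (blue) are Sentinel-2's visible-light bands at 10 m resolution.
RGB_BANDS = ["B4", "B3", "B2"]  # red, green, blue

def export_rgb_composite(lon, lat, start_date, end_date, description):
    """Build and start a Drive export of a cloud-filtered RGB composite.

    Requires `pip install earthengine-api` and a one-time
    `ee.Authenticate()`; every name here is illustrative.
    """
    import ee  # imported lazily so the sketch loads without GEE installed
    ee.Initialize()
    # A descriptive name instead of the generic `point`
    area_of_interest = ee.Geometry.Point([lon, lat])
    composite = (
        ee.ImageCollection("COPERNICUS/S2_SR")
        .filterBounds(area_of_interest)
        .filterDate(start_date, end_date)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 10))
        .median()                 # per-pixel median reduces cloud artifacts
        .select(RGB_BANDS)
    )
    task = ee.batch.Export.image.toDrive(
        image=composite,
        description=description,
        scale=10,                 # Sentinel-2 visible bands: 10 m/pixel
        region=area_of_interest.buffer(5000).bounds(),
    )
    task.start()
    return task
```

The lazy `import ee` keeps the module importable even where GEE is not installed; running the export naturally requires an authenticated Earth Engine account.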

Creating the Ground Truth Wildfire Labels:

The process is detailed and covers essential steps, such as accessing wildfire polygon data and rasterizing them onto satellite imagery. Good job!

Code 2: The code looks promising and very clear. I believe it is error-free!
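As a hedged sketch of the rasterization idea (not the paper's actual Code 2), `rasterio.features.rasterize` can burn perimeter polygons into a binary mask aligned with the imagery grid; the function and variable names below are my own:

```python
import numpy as np

def burn_wildfire_mask(polygons, transform, height, width):
    """Rasterize wildfire perimeter polygons into a binary label mask
    aligned with the satellite image grid (1 = burned, 0 = unburned).

    `polygons` is an iterable of GeoJSON-like geometries in the image's
    CRS; `transform` is the image's affine transform. Illustrative only.
    """
    from rasterio import features  # lazy import; requires rasterio
    return features.rasterize(
        [(geometry, 1) for geometry in polygons],  # burn value 1 per polygon
        out_shape=(height, width),
        transform=transform,
        fill=0,              # background (unburned) value
        dtype="uint8",
    )

# Sanity check on a hand-made mask: the positive-class fraction
example_mask = np.zeros((4, 4), dtype="uint8")
example_mask[1:3, 1:3] = 1                    # a 2x2 "burned" patch
burned_fraction = float(example_mask.mean())  # 4 of 16 pixels -> 0.25
```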

Image Segmentation and Data Preparation for Deep Learning Architectures:

The process is well-described, emphasizing the importance of maintaining spatial resolution and efficiency in deep learning model training. Good job!

Code 3: Make sure to add comments explaining what each part of your code does. Comments help others understand how your code works by explaining the purpose of each function and major section, making it easier for readers to follow along and see how each step fits into the larger process. I trust the metadata are accurately applied and tested, as this is crucial for maintaining spatial integrity when working with GeoTIFF files.
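To illustrate the metadata point: when segmenting a large image into tiles, each tile's pixel offset within the parent image is the key piece of bookkeeping needed to recompute its georeferencing (with rasterio, `rasterio.windows.transform` derives a tile's transform from the parent's). A minimal sketch, with all names my own:

```python
import numpy as np

def split_into_tiles(image, tile_size):
    """Split an (H, W, C) array into non-overlapping square tiles,
    keeping each tile's (row, col) pixel offset so its geographic
    transform can be rebuilt from the parent GeoTIFF's transform.
    Illustrative sketch, not the paper's Code 3.
    """
    tiles = []
    height, width = image.shape[:2]
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tile = image[row:row + tile_size, col:col + tile_size]
            # (row, col) is the spatial metadata each tile must carry
            tiles.append(((row, col), tile))
    return tiles

tiles = split_into_tiles(np.zeros((500, 500, 3)), 250)
# a 500x500 image yields 4 tiles of 250x250
```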

Data Augmentation for Wildfire Damage Detection:

The explanation of the dataset imbalance and the need for augmentation is clear and understandable. Awesome!! If possible, adding an example or visualization of an augmented tile before and after transformation could enhance clarity.
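One detail worth showing in such an example is that image tiles and their label masks must be transformed together so the labels stay aligned. A minimal sketch (my own, assuming simple geometric augmentations rather than the paper's exact set):

```python
import numpy as np

def augment_pair(image, mask):
    """Yield geometric augmentations of an image tile and its label mask
    together, so per-pixel labels remain aligned. Illustrative only.
    """
    yield image, mask                          # original
    yield np.fliplr(image), np.fliplr(mask)    # horizontal flip
    yield np.flipud(image), np.flipud(mask)    # vertical flip
    for k in (1, 2, 3):                        # 90, 180, 270 degree turns
        yield np.rot90(image, k), np.rot90(mask, k)

image = np.arange(16).reshape(4, 4)
mask = (image > 7).astype("uint8")
augmented = list(augment_pair(image, mask))    # 6 variants per tile
```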

VGG16 Implementation:

I think it would be good to explain how the GeoTIFF images are processed and fed into VGG16.

Code 4: Can you explain how labels are assigned based on the presence of "/Damaged/" in the file paths? The rest of the code is fine and clear.
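The pattern in question is presumably directory-based labeling, along these lines (a minimal sketch of my own, not the paper's Code 4):

```python
def label_from_path(file_path):
    """Assign a binary label from the directory structure: tiles stored
    under a "/Damaged/" folder are positives (1), all others negatives (0).
    Illustrative of the pattern the question refers to.
    """
    return 1 if "/Damaged/" in file_path else 0

labels = [label_from_path(p) for p in [
    "dataset/Damaged/tile_001.tif",
    "dataset/Undamaged/tile_002.tif",
]]
# -> [1, 0]
```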

Good explanation of how VGG16 is initialized with pre-trained ImageNet weights and why the convolutional base is frozen. It clarifies that the pre-trained features are preserved while the new layers focus on learning specific to wildfire detection. The choice of the Adam optimizer is justified well, highlighting its adaptive learning rate, which is crucial for satellite image classification where conditions can vary widely. Binary cross-entropy loss is appropriate for this binary classification task. Great job!! Awesome 💯
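For readers, the setup described above can be sketched in Keras as follows. This is a minimal sketch assuming a standard transfer-learning layout, not the authors' exact code; the head layers and learning rate are my own assumptions.

```python
def build_wildfire_classifier(input_shape=(224, 224, 3)):
    """Sketch of transfer learning as described: frozen VGG16 base with
    ImageNet weights, Adam optimizer, binary cross-entropy loss.
    Illustrative only; requires TensorFlow to actually build.
    """
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=input_shape)
    base.trainable = False  # freeze pre-trained ImageNet features
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # damaged vs. undamaged
    ])
    model.compile(
        optimizer=optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The lazy imports keep the sketch loadable without TensorFlow installed; calling the function downloads the ImageNet weights on first use.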

Code 5: The code is straightforward and aligns with the described approach. I am not sure whether you faced any challenges during training or limitations of the approach, such as computational resources or specific complexities in satellite image data. If so, it would be good to mention them.

Code 6: Try adding more inline comments to explain complex parts of your code, especially where you're manipulating image data or handling batch processing. Can you test the data generator with different batch sizes and datasets to ensure it handles varying conditions without errors?
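One edge case such a test should cover is the final partial batch when the dataset size is not a multiple of the batch size. A minimal batching sketch (my own, not the paper's generator):

```python
def batches(items, batch_size):
    """Yield successive batches of `items`, including a final partial
    batch. A data generator should handle this for any batch size;
    illustrative sketch only.
    """
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sizes = [len(b) for b in batches(list(range(10)), 4)]  # -> [4, 4, 2]
```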

Code 7: Training your model for 13 epochs over 933 minutes (about 15.5 hours) shows that it requires a lot of computing time. Can we make this more efficient? One option is adjusting the batch size of data processed at once.

Could you also evaluate model performance metrics (accuracy, precision, recall) on both the training and validation sets, to understand how well the model generalizes to new data and how robust it is against overfitting?
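For reference, these metrics are simple to compute from binary predictions (scikit-learn's `precision_score`/`recall_score` are the equivalent library calls); a minimal sketch:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary predictions. Comparing
    these between the training and validation sets is one quick check
    for overfitting. Illustrative sketch only.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

p, r = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])  # -> (0.5, 0.5)
```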

After looking at the model performance metrics, I feel we should find ways to optimize VGG16's performance, such as adjusting hyperparameters, experimenting with different architectures, or exploring distributed training options to reduce training time.

Conclusion:

The conclusion effectively summarizes key findings and emphasizes the methodology's effectiveness. Additionally, mentioning future research directions like exploring FCNs adds completeness to the conclusion and encourages further discussion about how to expand the methodology's usefulness. Awesome!!

References:

All reference details are very clear and easy to navigate.

General suggestions:

Avoid or simplify technical terms where possible, and provide brief explanations for any unavoidable jargon. Please ensure a consistent narrative flow and structure throughout the sections for easier understanding and readability.

This concludes my initial review. Happy to clarify/discuss if you have any questions on my comments. Good luck!

valemar10 commented 2 weeks ago

Hi @kalyan678! Thank you very much for taking the time to review my paper and provide your valuable feedback. I appreciate your detailed comments and suggestions. I will be working on addressing your comments this week to improve the quality of the paper. I am committed to making the revisions that are needed.

Thank you again for your time.

mepa commented 2 weeks ago

Thanks for your review, @kalyan678!

Hi @kafitzgerald, in case a little extra time is needed, the initial complete review deadline has been extended to next Wednesday, July 3rd.

mepa commented 1 week ago

Hi @kafitzgerald, the Proceedings Committee has extended the initial complete review deadline to Monday, July 8th. Thanks so much for volunteering to review and we look forward to seeing your comments!

valemar10 commented 2 hours ago

Hi @kalyan678 ,

I hope you're doing well.

I wanted to let you know that I have implemented your reviews. Thank you for everything. I appreciate your detailed feedback and support. Please let me know if there are any further changes or if you have any additional comments.

Could you please go ahead and check the paper?

Below are my responses to your kind comments.

ABSTRACT

REVIEW:

Overall, the abstract is good but could be improved. I think you can briefly describe the key steps in the methodology. You mention using Google Earth Engine’s Python API, but adding more context on the steps involved in building, labeling, processing, and evaluating the dataset would be helpful. The current conclusion does a good job of mentioning how the study improves environmental monitoring, but it could be clearer and more specific. For example: "The pipeline described in this paper shows how Python can be used to build and process high-resolution satellite imagery datasets, leading to accurate wildfire detection and offering a robust tool for broader environmental monitoring."

RESPONSE:

I added a more detailed description of the key steps in the methodology, including some of the libraries needed, and used the suggested specific conclusion.

INTRODUCTION

REVIEW:

In my opinion, the introduction provides background but doesn’t clearly state the purpose of your study. I think you should clearly mention that the paper presents a Python-based methodology for creating and using a high-resolution satellite imagery dataset for forest wildfire detection. Wdyt? Next, you mention how Google Earth Engine (GEE) and Python are useful, but it would be clearer if you directly connected them to solving the problem of creating and using satellite imagery datasets. Explain how GEE makes data collection and processing faster and easier, and how Python’s tools help handle the complex steps of developing and training deep learning models with this data. This will show how these technologies specifically help overcome the challenges you’re addressing. Finally, instead of focusing only on wildfire detection, it would be great if you could also highlight how your method can be used for many environmental monitoring tasks. This shows how your approach can help with different areas of studying and reacting to changes in the environment. Again, I am happy to hear any alternative thoughts on this.

RESPONSE:

I agree, thank you for your valuable suggestions. I have now stated the purpose of the study right from the beginning. I have elaborated on how Google Earth Engine (GEE) makes data collection and processing faster and easier, and explained how Python’s tools help handle the steps of developing and training deep learning (DL) models with this data. Specifically, I mentioned libraries like TensorFlow for model training and Rasterio for processing satellite images. I have emphasized that while the methodology is applied to create and validate a high-resolution dataset for forest wildfire detection, it can also be adapted for various environmental monitoring tasks. I believe these revisions address the aspects of your review.

DOWNLOADING THE IMAGERY DATA USING GEE’S PYTHON API:

REVIEW:

I think it would be good to clarify terms like "bands B4, B3, and B2" by briefly explaining their significance in satellite imagery. Code 1: Maybe you can include a short description before the code snippet explaining its purpose. Overall the code is very well written! The code includes some comments but could benefit from more detailed explanations. Variable names are somewhat clear, but some could be more descriptive; for example, `point` could be renamed to something more descriptive.

RESPONSE:

By incorporating the feedback, terms like "bands B4, B3, and B2" are clarified by briefly explaining their significance in satellite imagery. A short description is added before the code snippet, explaining its purpose. More detailed comments and more descriptive variable names are included in the code to enhance clarity and readability.

CREATING THE GROUND TRUTH WILDFIRE LABELS:

REVIEW:

The process is detailed and covers essential steps, such as accessing wildfire polygon data and rasterizing them onto satellite imagery. Good job! Code 2: The code looks promising and very clear. I believe it is error-free!

RESPONSE:

Thank you!

IMAGE SEGMENTATION AND DATA PREPARATION FOR DEEP LEARNING ARCHITECTURES:

REVIEW:

The process is well-described, emphasizing the importance of maintaining spatial resolution and efficiency in deep learning model training. Good job! Code 3: Make sure to add comments explaining what each part of your code does. Comments help others understand how your code works by explaining the purpose of each function and major section, making it easier for readers to follow along and see how each step fits into the larger process. I trust the metadata are accurately applied and tested, as this is crucial for maintaining spatial integrity when working with GeoTIFF files.

RESPONSE:

Thank you! I commented Code 3.

DATA AUGMENTATION FOR WILDFIRE DAMAGE DETECTION:

REVIEW:

The explanation of the dataset imbalance and the need for augmentation is clear and understandable. Awesome!! If possible, adding an example or visualization of an augmented tile before and after transformation could enhance clarity.

RESPONSE:

Thank you! I added the example of the data augmentation.

VGG16 IMPLEMENTATION:

REVIEW:

I think it would be good to explain how the GeoTIFF images are processed and fed into VGG16. Code 4: Can you explain how labels are assigned based on the presence of "/Damaged/" in the file paths? The rest of the code is fine and clear.

RESPONSE:

I added the explanation on how the labels are assigned.

REVIEW:

Good explanation of how VGG16 is initialized with pre-trained ImageNet weights and why the convolutional base is frozen. It clarifies that the pre-trained features are preserved while the new layers focus on learning specific to wildfire detection. The choice of the Adam optimizer is justified well, highlighting its adaptive learning rate, which is crucial for satellite image classification where conditions can vary widely. Binary cross-entropy loss is appropriate for this binary classification task. Great job!! Awesome 💯

RESPONSE:

Thank you!

Code 5:

The code is straightforward and aligns with the described approach. I am not sure whether you faced any challenges during training or limitations of the approach, such as computational resources or specific complexities in satellite image data. If so, it would be good to mention them.

RESPONSE:

The models were executed on a MacBook Pro (2019) with a 2.4 GHz Intel Core i9 processor and 16 GB of 2400 MHz DDR4 memory, so we had limited computational resources. Satellite image data requires significant storage and processing power, and handling such large datasets can be computationally intensive and time-consuming. Moreover, each image contains a large amount of data across multiple spectral bands and stores geographic information that is not directly used by deep learning models but needs to be preserved. Thus, our current resources constrain the scale and speed of model training. All of these explanations are now added to the paper.

Code 6:

Try adding more inline comments to explain complex parts of your code, especially where you're manipulating image data or handling batch processing. Can you test the data generator with different batch sizes and datasets to ensure it handles varying conditions without errors?

RESPONSE:

I have added inline comments. I can test the data generator with different batch sizes, but it will take me some time to access better computational resources through my university; for now, I can test it with a smaller percentage of the data.

EFFICIENTNET IMPLEMENTATION:

Code 7:

Training your model for 13 epochs over 933 minutes (about 15.5 hours) shows that it requires a lot of computing time. Can we make this more efficient? One option is adjusting the batch size of data processed at once.

RESPONSE:

It does require a lot of computing time; I can test different batch sizes, and as soon as I have those results I will have a better answer.

REVIEW:

Could you also evaluate model performance metrics (accuracy, precision, recall) on both the training and validation sets, to understand how well the model generalizes to new data and how robust it is against overfitting?

RESPONSE:

I do have this information, and it is now added as a table in the paper for your review, along with the EfficientNetB0 model. Please let me know what you think.

REVIEW:

After looking at the model performance metrics, I feel we should find ways to optimize VGG16's performance, such as adjusting hyperparameters, experimenting with different architectures, or exploring distributed training options to reduce training time.

RESPONSE:

I agree. I am working on testing EfficientNet for this application, as VGG16 is not performing as well as you’d expect and shows a higher degree of overfitting. As soon as I have better results, I will update the paper and let you know. I don’t have access to GPUs or TPUs at the moment, so distributed training is a step I will have to consider further along.

CONCLUSION:

REVIEW 1:

The conclusion effectively summarizes key findings and emphasizes the methodology's effectiveness. Additionally, mentioning future research directions like exploring FCNs adds completeness to the conclusion and encourages further discussion about how to expand the methodology's usefulness. Awesome!!

RESPONSE:

Great!

REFERENCES:

REVIEW:

All reference details are very clear and easy to navigate.

RESPONSE:

Thanks!

GENERAL SUGGESTIONS:

REVIEW:

Avoid or simplify technical terms where possible, and provide brief explanations for any unavoidable jargon. Please ensure a consistent narrative flow and structure throughout the sections for easier understanding and readability. This concludes my initial review. Happy to clarify/discuss if you have any questions on my comments. Good luck!

RESPONSE:

I have provided more explanations! Thank you for your help.

valemar10 commented 1 hour ago

Hi @kafitzgerald !

Thank you very much for your detailed review and valuable feedback. I appreciate your focus on the geospatial data and remote sensing aspects, as well as your general comments. Your insights have been helpful.

I understand the importance of clear language and have worked on improving the clarity in the abstract, introduction, discussion, and conclusion sections. I have also addressed the specific comments you made to better convey the methodology, impact, and choices made in my work.

You can now see the changes I’ve made based on your reviews in the paper. I have also replied to each of your comments above.

Thanks!