kvos / CoastSat

Global shoreline mapping tool from satellite imagery
http://coastsat.space
GNU General Public License v3.0

AI-generated code refactor #512

Closed thekester closed 4 months ago

thekester commented 4 months ago

IMAGEIO FFMPEG_WRITER WARNING: Resolution Adjustments and Data Alignment for MP4 Timelapse Animation

Hello, after attempting to execute the example.py file, I encountered warnings related to the MP4 timelapse animation. Here is how I resolved the warnings.

Warning Details

IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (2700, 1350) to (2704, 1360) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
[rawvideo @ 0x5b94500] Stream #0: not enough frames to estimate rate; consider increasing probesize.
[swscaler @ 0x5babfc0] Warning: data is not aligned! This can lead to a speed loss.
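
The resized dimensions in the first warning come from rounding each side up to the next multiple of macro_block_size. A minimal sketch of that arithmetic, where the helper name round_up_to_multiple is illustrative and not part of imageio:

```python
def round_up_to_multiple(value, block=16):
    """Round value up to the nearest multiple of block (integer ceiling division)."""
    return -(-value // block) * block

# Dimensions reported in the warning above
print(round_up_to_multiple(2700), round_up_to_multiple(1350))  # 2704 1360
```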

Solution

To remove the first warning, resize each frame so that both dimensions are multiples of macro_block_size before writing the video:

import cv2
import imageio
import os
import numpy as np
from PIL import Image

# Define functions
def resize_image_to_macro_block(image, macro_block_size=16):
    height, width = image.shape[:2]
    new_width = (width + macro_block_size - 1) // macro_block_size * macro_block_size
    new_height = (height + macro_block_size - 1) // macro_block_size * macro_block_size
    return cv2.resize(image, (new_width, new_height))

def load_and_resize_images(image_folder, macro_block_size=16):
    images = []
    for filename in sorted(os.listdir(image_folder)):
        if filename.endswith(".jpg"):
            img_path = os.path.join(image_folder, filename)
            image = np.array(Image.open(img_path))
            resized_image = resize_image_to_macro_block(image, macro_block_size)
            images.append(resized_image)
    return images

def create_animation(image_folder, output_file, fps=4, macro_block_size=16):
    images = load_and_resize_images(image_folder, macro_block_size)
    imageio.mimwrite(output_file, images, fps=fps)

# Define paths and settings
fn_animation = os.path.join(inputs['filepath'], inputs['sitename'], '%s_animation_RGB.mp4' % inputs['sitename'])
fp_images = os.path.join(inputs['filepath'], inputs['sitename'], 'jpg_files', 'preprocessed')
fps = 4
create_animation(fp_images, fn_animation, fps)

# Create MP4 timelapse animation
fn_animation = os.path.join(inputs['filepath'], inputs['sitename'], '%s_animation_shorelines.mp4' % inputs['sitename'])
fp_images = os.path.join(inputs['filepath'], inputs['sitename'], 'jpg_files', 'detection')
fps = 4
create_animation(fp_images, fn_animation, fps)

One warning still remains after this change:

[rawvideo @ 0x67e8680] Stream #0: not enough frames to estimate rate; consider increasing probesize

thekester commented 4 months ago

Additional Error and Solution

While running the code, I also encountered the following error:

AttributeError: '_tkinter.tkapp' object has no attribute 'showMaximized'

Here is how to fix it:


# Original call (a Qt window method, not available on a Tk window), commented out:
# mng.window.showMaximized()

# Maximize the window using the tkinter method
mng.window.wm_attributes('-zoomed', True)

This modification was made in SDS_shoreline.py and SDS_preprocess.py.
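
The '-zoomed' attribute is specific to X11 builds of Tk, so the fix above may not carry over to Windows or macOS, where Tk accepts the 'zoomed' window state instead. A hedged sketch of a helper that picks between the two; the function name and the platform split are assumptions for illustration, not CoastSat code:

```python
import sys

def maximize_tk_window(window, platform=sys.platform):
    """Maximize a Tk window using the call that the platform's Tk build accepts.

    Assumption: `window` is the Tk window exposed by matplotlib's TkAgg
    backend (mng.window in the snippet above).
    """
    if platform.startswith("linux"):
        # X11 builds of Tk expose maximization as the '-zoomed' attribute
        window.wm_attributes("-zoomed", True)
    else:
        # Windows and macOS builds of Tk accept the 'zoomed' window state
        window.state("zoomed")

# Usage with matplotlib (requires a Tk backend and a display):
# mng = plt.get_current_fig_manager()
# maximize_tk_window(mng.window)
```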
thekester commented 4 months ago

Handling Time-Series Data and Tide Level Retrieval Errors

After solving the previous warning and error, I encountered several errors related to handling time-series data and retrieving tide levels. The key errors and their solutions are detailed below.

Errors and Solutions

1. Empty Time Series Dates (dates_ts)

Error:

ValueError: The time series dates (dates_ts) are empty.

Cause: This error occurs when the time series dates list (dates_ts) is empty.

Solution: Ensure that the dates_ts list is populated correctly. Add a check to raise an error if dates_ts is empty.

if not dates_ts:
    raise ValueError("The time series dates (dates_ts) are empty.")

2. Input Dates Not Covered by Time Series

Error:

Exception: Time-series do not cover the range of your input dates

Cause: This error occurs when the input dates (dates_sat) are not fully covered by the time series dates (dates_ts).

Solution: Adjust the input dates to fit within the available time series data range. Print the adjusted date range for verification.

if input_data_start < time_series_start or input_data_end > time_series_end:
    print("Sorry, the time series data does not cover the range of your input dates.")
    print(f"The available time series data ranges from {time_series_start} to {time_series_end}.")

    adjusted_dates_sat = [
        max(time_series_start, min(time_series_end, date)) for date in dates_sat
    ]

    if not adjusted_dates_sat:
        raise ValueError("The adjusted input dates are empty after adjustment.")

    print("Adjusting input dates to fit within the available time series data range:")
    print(f"Adjusted date range: {min(adjusted_dates_sat)} to {max(adjusted_dates_sat)}")
else:
    adjusted_dates_sat = dates_sat
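
Note that clamping moves every out-of-range date onto the nearest boundary of the time series, so several images can end up with the same timestamp. An alternative is to drop the uncovered dates instead. A minimal sketch under that assumption; the function name and the sample dates are made up for illustration:

```python
from datetime import datetime

def split_by_coverage(dates_sat, series_start, series_end):
    """Separate input dates into those inside and outside [series_start, series_end]."""
    covered = [d for d in dates_sat if series_start <= d <= series_end]
    dropped = [d for d in dates_sat if not (series_start <= d <= series_end)]
    return covered, dropped

# Made-up dates for illustration only
dates_sat = [datetime(2019, 12, 30), datetime(2020, 3, 1), datetime(2021, 1, 5)]
covered, dropped = split_by_coverage(dates_sat, datetime(2020, 1, 1), datetime(2020, 12, 31))
print(f"kept {len(covered)} date(s), dropped {len(dropped)}")
```

If dates are dropped this way, any per-image values indexed by date (such as shoreline positions) would need the same filtering to stay aligned.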

3. Empty Result from get_closest_datapoint

Error:

Extracting closest points: 82% An error occurred: min() arg is an empty sequence

Cause: This error occurs if the get_closest_datapoint function fails to find any dates in dates_ts greater than or equal to the current date.

Solution: Improve the error handling in the get_closest_datapoint function to ensure that it handles cases where no matching dates are found.

def get_closest_datapoint(dates, dates_ts, values_ts):
    if dates[0] < dates_ts[0] or dates[-1] > dates_ts[-1]: 
        raise Exception('Time-series do not cover the range of your input dates')

    temp = []

    def find(item, lst):
        start = lst.index(item)
        return start

    for i, date in enumerate(dates):
        print('\rExtracting closest points: %d%%' % int((i+1)*100/len(dates)), end='')
        try:
            closest_date = min(item for item in dates_ts if item >= date)
            index = find(closest_date, dates_ts)
            temp.append(values_ts[index])
        except ValueError:
            raise ValueError(f"No date in time series is greater than or equal to {date}")

    values = np.array(temp)
    return values
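
The lookup above scans the whole of dates_ts for every input date. If dates_ts is sorted in ascending order, the same "first date greater than or equal to" lookup can be done with a binary search. This is a sketch of that alternative using the standard bisect module, not the function shipped with CoastSat; it returns a plain list rather than a NumPy array:

```python
from bisect import bisect_left
from datetime import datetime

def closest_following_values(dates, dates_ts, values_ts):
    """For each date, return the value at the first time-series date >= that date.

    Assumes dates_ts is sorted in ascending order and aligned with values_ts.
    """
    values = []
    for date in dates:
        idx = bisect_left(dates_ts, date)  # first index with dates_ts[idx] >= date
        if idx == len(dates_ts):
            raise ValueError(f"No date in time series is greater than or equal to {date}")
        values.append(values_ts[idx])
    return values

# Made-up series for illustration only
dates_ts = [datetime(2020, 1, d) for d in (1, 2, 3)]
values_ts = [0.1, 0.4, -0.2]
print(closest_following_values([datetime(2020, 1, 1, 12)], dates_ts, values_ts))  # [0.4]
```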

4. Ambiguous Boolean Array (tides_sat)

Error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Cause: This error occurs when attempting to evaluate a NumPy array as a boolean condition.

Solution: Use .size to check if the array is not empty before proceeding.

if tides_sat.size > 0:
    fig, ax = plt.subplots(1, 1, figsize=(15, 4), tight_layout=True)
    ax.grid(which='major', linestyle=':', color='0.5')
    ax.plot(dates_ts, tides_ts, '-', color='0.6', label='all time-series')
    ax.plot(adjusted_dates_sat, tides_sat, '-o', color='k', ms=6, mfc='w', lw=1, label='image acquisition')
    ax.set(ylabel='tide level [m]', xlim=[adjusted_dates_sat[0], adjusted_dates_sat[-1]], title='Water levels at the time of image acquisition')
    ax.legend()
    plt.show()
else:
    print("Tide levels for the input dates could not be plotted because tides_sat is empty.")

Example Code Snippet

# Ensure tides_sat is not empty before plotting or further processing
if tides_sat.size > 0:
    fig, ax = plt.subplots(1, 1, figsize=(15, 4), tight_layout=True)
    ax.grid(which='major', linestyle=':', color='0.5')
    ax.plot(dates_ts, tides_ts, '-', color='0.6', label='all time-series')
    ax.plot(adjusted_dates_sat, tides_sat, '-o', color='k', ms=6, mfc='w', lw=1, label='image acquisition')
    ax.set(ylabel='tide level [m]', xlim=[adjusted_dates_sat[0], adjusted_dates_sat[-1]], title='Water levels at the time of image acquisition')
    ax.legend()
    plt.show()
else:
    print("Tide levels for the input dates could not be plotted because tides_sat is empty.")

# Further processing only if tides_sat is not empty
if tides_sat.size > 0:
    try:
        reference_elevation = 0  # Example reference elevation
        beach_slope = 0.1  # Example beach slope
        correction = (tides_sat - reference_elevation) / beach_slope
        print("Correction values computed successfully.")
    except NameError as e:
        print(f"A NameError occurred: {e}. Did you mean: 'tide_data'?")
    except Exception as e:
        print(f"An error occurred while computing correction values: {e}")
else:
    print("Skipping correction computation because tides_sat is empty.")

# Tidal correction along each transect
reference_elevation = 0.7  # Example reference elevation
beach_slope = 0.1  # Example beach slope
cross_distance_tidally_corrected = {}
if tides_sat.size > 0:
    for key in cross_distance.keys():
        correction = (tides_sat - reference_elevation) / beach_slope
        cross_distance_tidally_corrected[key] = cross_distance[key] + correction
else:
    print("Skipping tidal correction along transects because tides_sat is empty.")
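
As a worked example of the correction formula used above, (tide - reference_elevation) / beach_slope, here is the arithmetic for a single value. The tide level of 0.2 m is made up for illustration; the reference elevation and slope are the example values from the snippet:

```python
def tidal_correction(tide_level, reference_elevation, beach_slope):
    """Offset added to the cross-shore distance, as in the transect loop above."""
    return (tide_level - reference_elevation) / beach_slope

# Tide at 0.2 m, reference elevation 0.7 m, beach slope 0.1
shift = tidal_correction(0.2, 0.7, 0.1)
print(round(shift, 2))  # -5.0
```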
thekester commented 4 months ago

Handling Missing Narrabeen_Profiles.csv File

The next error occurs when the Narrabeen_Profiles.csv file is not found because it has not been downloaded yet. Below are the steps to handle this situation with user-friendly messages.

Problem Description

When trying to load the Narrabeen_Profiles.csv file using pandas.read_csv(), the script may encounter a FileNotFoundError if the file does not exist at the specified path. This can happen if the file has not been downloaded yet.

Error Message

FileNotFoundError: [Errno 2] No such file or directory: 'coastsat/CoastSat-master/examples/Narrabeen_Profiles.csv'

Solution

To handle this situation, we can use a try-except block to catch the FileNotFoundError and provide a message to the user on where to download the required CSV file.

Improved Code

import os
import pandas as pd
import numpy as np

# Define the path to the CSV file
fp_datasets = os.path.join(os.getcwd(), 'examples', 'Narrabeen_Profiles.csv')

# Try to read the CSV file and handle any errors that occur
try:
    df = pd.read_csv(fp_datasets)
    print("CSV file loaded successfully.")
except FileNotFoundError:
    print(f"Error: The file '{fp_datasets}' does not exist. Please ensure the file path is correct and the file is present.")
    print("You can download the Narrabeen data from http://narrabeen.wrl.unsw.edu.au/")
except pd.errors.EmptyDataError:
    print(f"Error: The file '{fp_datasets}' is empty. Please provide a valid CSV file.")
except pd.errors.ParserError:
    print(f"Error: There was an error parsing the file '{fp_datasets}'. Please ensure the file is a valid CSV.")
except Exception as e:
    print(f"An unexpected error occurred while trying to read the file '{fp_datasets}': {e}")
else:
    # If the CSV was loaded successfully, proceed with further processing
    pf_names = list(np.unique(df['Profile ID']))
    print(f"Profile IDs loaded: {pf_names}")

Explanation

  1. Explicit Error Handling: Each specific error type (FileNotFoundError, EmptyDataError, ParserError) is handled with an appropriate message.
  2. User Guidance: Provides a URL to download the required CSV file if it is missing.
  3. Further Processing: Only attempts to process the data (extract Profile IDs) if the CSV file is successfully loaded.

This approach ensures that users are well-informed about the need to download the necessary data if it is missing.
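
One limitation of printing a message and continuing is that any later code relying on df will still fail. A sketch of a fail-fast alternative that raises with the download hint attached; it uses the standard csv module rather than pandas so it runs without extra dependencies, and the function name is hypothetical:

```python
import csv
from pathlib import Path

DOWNLOAD_HINT = "You can download the Narrabeen data from http://narrabeen.wrl.unsw.edu.au/"

def load_profile_ids(csv_path):
    """Return the sorted unique 'Profile ID' values, or raise with a download hint."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"'{path}' does not exist. {DOWNLOAD_HINT}")
    with path.open(newline="") as f:
        return sorted({row["Profile ID"] for row in csv.DictReader(f)})
```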

thekester commented 4 months ago

Improvement: Check and Save Preprocessed Image Files

This comment adds a check for whether preprocessed image files already exist in the specified directory before saving new ones, so that the script skips redundant work and only saves images that are not already present.

Problem Description

When processing satellite images, it is essential to avoid redundant operations such as re-saving images that have already been preprocessed. This can save time and computational resources.

Solution

To handle this situation, we can use a function to check if the preprocessed image files already exist in the specified directory. If the files do not exist, the script will save the images; otherwise, it will skip this step.

Improved Code

import os

# Function to check if jpg files already exist
def check_files_exist(path, file_extension=".jpg"):
    if not os.path.exists(path):
        return False
    return any(file.endswith(file_extension) for file in os.listdir(path))

# Define the path to preprocessed jpg files in the current directory
preprocessed_path = f"./data/{sitename}/jpg_files/preprocessed"

# Check if directory exists and if files exist
if not os.path.exists(preprocessed_path):
    os.makedirs(preprocessed_path)
    print(f"Directory created: {preprocessed_path}")

if not check_files_exist(preprocessed_path):
    # Only save images if they don't already exist
    SDS_preprocess.save_jpg(metadata, settings, use_matplotlib=True)
    print("Satellite images saved as .jpg in", preprocessed_path)
else:
    print("Preprocessed jpg files already exist in", preprocessed_path)

Explanation

  1. Function to Check File Existence: The check_files_exist function checks if the specified directory exists and if any files with the given extension are present in the directory.
  2. Directory Creation: The script creates the directory if it does not already exist.
  3. Conditional Saving: The script saves the preprocessed images only if they do not already exist in the specified directory.

This improvement ensures that the script efficiently handles preprocessed image files, avoiding unnecessary reprocessing and saving of files.
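
Because check_files_exist returns True as soon as any .jpg is present, a partially completed run would also be skipped. A sketch of a stricter check that compares the folder against a list of expected file names; the helper and the expected_names argument are hypothetical, since how the output files are named is not shown here:

```python
import os

def missing_outputs(folder, expected_names):
    """Return the expected file names that are not yet present in folder."""
    existing = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in expected_names if name not in existing]
```

An empty result means every expected image is already there and the save step can be skipped.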

thekester commented 4 months ago

Purpose: Authentication Script for Google Earth Engine

This script is designed to test the connection to Google Earth Engine (GEE) before running example Python scripts. It ensures that the authentication to GEE is successful, allowing further operations with GEE to proceed smoothly.

Before executing any Earth Engine scripts, it is crucial to verify that the authentication to Google Cloud is working correctly. This script handles the authentication and initialization process, providing feedback on whether the authentication was successful or not.

Script: authenticate.py

The following script attempts to authenticate and initialize the Earth Engine session:

import ee

def authenticate_and_initialize():
    try:
        # Authenticate the Earth Engine session.
        ee.Authenticate()
        # Initialize the Earth Engine module.
        ee.Initialize()
        print("Authentication successful!")
    except Exception as e:
        print(f"Authentication failed: {e}")

if __name__ == "__main__":
    authenticate_and_initialize()

Explanation

  1. Authentication: The ee.Authenticate() function prompts the user to authenticate their Earth Engine session. This step is required to gain access to Google Earth Engine resources.
  2. Initialization: The ee.Initialize() function initializes the Earth Engine library, allowing it to be used in subsequent operations.
  3. Error Handling: If authentication or initialization fails, an error message is printed, indicating the failure reason.

How to Use

  1. Run the Script: Execute the script by running python authenticate.py in your terminal or command prompt.
  2. Follow the Prompts: Complete the authentication process as prompted. You may need to log in to your Google account and grant necessary permissions.
  3. Check the Output: If authentication is successful, you will see the message "Authentication successful!". Otherwise, an error message will indicate the failure reason.

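Benefits

  • Ensures Valid Authentication: Verifies that the connection to Google Earth Engine is established before running any example scripts.
  • Provides Feedback: Offers clear messages about the authentication status, helping users troubleshoot any issues.

This ensures that your Google Earth Engine operations are authenticated and initialized correctly, preventing potential issues with accessing Earth Engine resources.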
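
Depending on the earthengine-api version, calling ee.Authenticate() on every run may start the sign-in flow even when credentials are already stored. A common variation is to try ee.Initialize() first and authenticate only if that fails. The sketch below assumes this pattern; the function name is hypothetical, and the ee module is passed in as an argument so the logic can be exercised without network access:

```python
def initialize_gee(ee_module):
    """Initialize Earth Engine, prompting for authentication only if needed.

    ee_module is the imported `ee` package.
    Returns True if the interactive authentication flow was triggered.
    """
    try:
        ee_module.Initialize()  # succeeds when valid credentials are already stored
        return False
    except Exception:
        ee_module.Authenticate()  # starts the interactive sign-in flow
        ee_module.Initialize()
        return True

# Usage (requires the earthengine-api package and a Google account):
# import ee
# initialize_gee(ee)
```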

kvos commented 4 months ago

Hi @thekester, sorry I didn't have time to review all the changes in your PR. The authentication part is a good improvement. Could you please submit only that as a PR and integrate it into SDS_download.py? I think it's the only module that requires GEE.

thekester commented 4 months ago

Hi @kvos

Thank you for your feedback! I have updated the PR to include only the authentication function integrated into SDS_download.py, as suggested.

https://github.com/kvos/CoastSat/pull/519

In the retrieve_images function within SDS_download.py, I replaced the ee.Initialize() call with authenticate_and_initialize() to ensure complete integration of the authentication process. I added a new commit to my PR to do that.

kvos commented 4 months ago

great thanks for that