microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
https://microsoft.github.io/farmvibes-ai/
MIT License

Unable to run weed detection #177

Open Zihonglee opened 4 months ago

Zihonglee commented 4 months ago

In which step did you encounter the bug?

FarmVibes.AI setup

Are you using a local or a remote (AKS) FarmVibes.AI cluster?

Local cluster

Bug description

I have set up my weed detection environment and a server that exposes my local files through a URL, so that the workflow can download both the geojson and the raster image. When I execute the code, the workflow unpacks and downloads the files, but the weed detection model does not run successfully. I looked a little into the code and noticed that the output function returns None because the run status is failed. Apart from that, I am getting the error "ValueError: Must pass either crs or epsg." I am not entirely sure what is missing in between. Any help would be greatly appreciated.

Steps to reproduce the problem

  1. Activate the environment: $ micromamba activate weed_detection
  2. Run a server to provide a URL for downloading files: $ python3 -m http.server
  3. Provide the information to the notebook
    1. replace url in the notebook with "http://10.42.0.1:8000/testfarmvibe/weed.png"
    2. replace boundary_shape_file in the notebook with "http://10.42.0.1:8000/testfarmvibe/sensor_farm_boundary.geojson"
  4. Run the notebook

weed.png (I am not sure if this can cause an issue for the model)

sensor_farm_boundary.geojson (I took the geojson from here to test): https://github.com/microsoft/farmvibes-ai/blob/main/notebooks/heatmaps/sensor_farm_boundary.geojson

acrown-msft commented 4 months ago

The weed detection workflow requires the input image to be georeferenced. The output is a set of shapefiles, so the workflow needs to be able to map each pixel to a location. The bounding geometry will also be used to crop the parts of the image used in the workflow so you need to ensure the locations overlap at some point.

If you'd like to test the workflow out with this image, you can convert the png file to a tif with a mock geometry using the GDAL library and generate a bounding shapefile with GeoPandas.
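To illustrate what "georeferenced" means in practice, here is a minimal sketch of how a GDAL-style six-element geotransform maps a pixel to a location. The function name pixel_to_coords is hypothetical and is not part of FarmVibes.AI or GDAL; the geotransform used in the example is a mock one with the upper-left corner at (0, 0) and one unit per pixel.

```python
def pixel_to_coords(geotransform, col, row):
    """Map a pixel (col, row) to georeferenced (x, y).

    geotransform follows the GDAL layout:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    """
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = geotransform
    x = origin_x + col * pixel_w + row * row_rot
    y = origin_y + col * col_rot + row * pixel_h
    return x, y


# Mock geotransform: upper-left corner at (0, 0), 1 unit per pixel, north-up
mock_transform = (0, 1, 0, 0, 0, -1)
print(pixel_to_coords(mock_transform, 3, 2))  # (3, -2)
```

A plain PNG carries no such transform, so there is no way to place its pixels on a map until one is assigned.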

Zihonglee commented 4 months ago

Hi Alex,

Thanks for the suggestion. I have tried the method above to generate a bounding shapefile for the image. Below is a detailed scenario of processing my image and generating a shapefile.

I have created two files: convertor.py, which converts a PNG file to a tif file, and convert_tif_shp.py, which converts a tif file into a shp file. I have also used the same image as shown above for this workflow.

convertor.py

from osgeo import gdal

input_file = "./weed.jpg"
output_file = "./weed.tif"

try:
    ds = gdal.Open(input_file)
    if ds is None:
        raise Exception(f"Failed to open {input_file}")
    gt = gdal.Translate(output_file, ds,
                        outputBounds=[-117.04671718692339, 47.03455818199527, -117.04260145498948, 47.03632996899851],
                        outputSRS="EPSG:4326"
    )
    gt = None
    print(f"Conversion successful. Output file saved as {output_file}")
except Exception as e:
    print(e)

In the convertor.py file, the outputBounds' value I gave is exactly the same as the geojson file from "https://github.com/microsoft/farmvibes-ai/blob/main/notebooks/heatmaps/sensor_farm_boundary.geojson".
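One detail that may be worth checking in convertor.py: GDAL documents the outputBounds option of gdal.Translate as [ulx, uly, lrx, lry] (upper-left, then lower-right), whereas the values above are ordered minx, miny, maxx, maxy. Passed as-is, the smaller latitude lands in the uly slot, which would describe a south-up image. The sketch below reorders the values for a north-up image; the helper name bbox_to_output_bounds is hypothetical, and whether this affects the workflow is an assumption, not something confirmed in this thread.

```python
def bbox_to_output_bounds(minx, miny, maxx, maxy):
    """Reorder (minx, miny, maxx, maxy) into [ulx, uly, lrx, lry] for a north-up image."""
    return [minx, maxy, maxx, miny]


# Bounds as written in convertor.py (minx, miny, maxx, maxy order)
geojson_bbox = (-117.04671718692339, 47.03455818199527,
                -117.04260145498948, 47.03632996899851)

output_bounds = bbox_to_output_bounds(*geojson_bbox)
print(output_bounds)
```

The resulting list could then be passed as outputBounds=output_bounds in the gdal.Translate call.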

convert_tif_shp.py

import geopandas as gpd
import rasterio
from shapely.geometry import box

tiff_file = "./weed.tif"
output_file = "./bounding.shp"

with rasterio.open(tiff_file) as src:
    # Read the bounds and the CRS while the dataset is still open
    left, bottom, right, top = src.bounds
    crs = src.crs.to_string()
    print(f"left: {left}, r: {right}, bottom: {bottom}, t: {top}")

bbox_polygon = box(left, bottom, right, top)

gdf = gpd.GeoDataFrame({'geometry': [bbox_polygon]}, crs=crs)

gdf.to_file(output_file)

After generating my shapefile, I placed it where it can be downloaded (http://10.42.0.1:8000/testfarmvibe/bounding.shp) and passed that address as the URL parameter. However, this time I am getting a different error: "ValueError: Could not find raster asset in asset list: [AssetVibe(type=None, id='cdc5cc486fedabc761bd5ecacc440832ed7ae4c84c68ee44f812e2eb96476cb7', path_or_url='/mnt/data/assets/cdc5cc486fedabc761bd5ecacc440832ed7ae4c84c68ee44f812e2eb96476cb7/bounding_box.shp', _is_local=True, _local_path='/mnt/data/assets/cdc5cc486fedabc761bd5ecacc440832ed7ae4c84c68ee44f812e2eb96476cb7/bounding_box.shp')]."

Would it be possible to provide an example, such as a shapefile or a script that generates a shapefile from an image, in case the above code is not working correctly? Thanks again for all the help.

acrown-msft commented 4 months ago

This worked for me given the posted image. Let me know if you still have issues running the workflow.

from osgeo import gdal, osr
from shapely.geometry import Polygon
import geopandas as gpd

def convert_png_to_geotiff(png_path, output_path, transform):
    """Converts a png to a geotiff given a geotransform"""
    # Get the number of bands, xsize, and ysize from the PNG file
    src_ds = gdal.Open(png_path)
    bands = src_ds.RasterCount
    xsize = src_ds.RasterXSize
    ysize = src_ds.RasterYSize

    # Create a new TIFF file
    dst_ds = gdal.GetDriverByName('GTiff').Create(output_path, xsize, ysize, bands, gdal.GDT_Byte)

    # Set the geotransform and projection
    dst_ds.SetGeoTransform(transform)
    srs = osr.SpatialReference()
    srs.ImportFromEPSG(4326)  # WGS84
    dst_ds.SetProjection(srs.ExportToWkt())

    # Write to file
    for i in range(bands):
        band = src_ds.GetRasterBand(i + 1)
        data = band.ReadAsArray()
        dst_ds.GetRasterBand(i + 1).WriteArray(data)

    # Close the datasets
    src_ds = None
    dst_ds = None

def create_shapefile(tiff_path, shapefile_path):
    """Creates a shapefile bounding a geotiff"""
    ds = gdal.Open(tiff_path)

    # Get the bounding box as a Shapely Polygon
    gt = ds.GetGeoTransform()
    ulx = gt[0]
    uly = gt[3]
    lrx = ulx + ds.RasterXSize * gt[1]
    lry = uly + ds.RasterYSize * gt[5]
    bounding_box = Polygon([(ulx, uly), (lrx, uly), (lrx, lry), (ulx, lry)])

    # Close the dataset
    ds = None

    # Save the GeoDataFrame as a shapefile
    gdf = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[bounding_box])
    gdf.to_file(shapefile_path)

if __name__ == "__main__":
    png_file = "Untitled.png"
    tiff_file = "output.tif"
    shp_file = "output.shp"
    transform = [0, 1, 0, 0, 0, -1]  # This is a mock geotransform
    convert_png_to_geotiff(png_file, tiff_file, transform)
    create_shapefile(tiff_file, shp_file)

Zihonglee commented 4 months ago

Hi Alex,

After giving the above code a try, I am getting a different error (RuntimeError: Failed to run op weed_detection in workflow run id c58f031a-5864-4c34-b13b-22586d62b747 for input with message id 00-c58f031a58644c34b13b22586d62b747-c1a11fe835ffc40d-01. Error description: <class 'RuntimeError'>: ValueError('Input shapes do not overlap raster.') ValueError: Input shapes do not overlap raster.)

Would it be possible for you to share an example of your geojson file? Just curious, is this where you got your geojson file (https://github.com/microsoft/farmvibes-ai/blob/main/notebooks/heatmaps/sensor_farm_boundary.geojson)?

Best, Vincent

acrown-msft commented 4 months ago

I did not use a geojson file; I used the shapefile output from the above script as the boundary for the weed detection workflow.
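One possible reading of the "Input shapes do not overlap raster" error, sketched under assumptions: a tif written with the mock geotransform [0, 1, 0, 0, 0, -1] sits at the coordinate origin, while the sensor farm boundary lies near longitude -117 and latitude 47, so the two cannot intersect. The helper names and the 100x100 image size below are hypothetical; the farm bounds are the values quoted earlier in this thread.

```python
def raster_bounds(geotransform, width, height):
    """Return (minx, miny, maxx, maxy) for a raster without rotation terms."""
    ulx, pixel_w, _, uly, _, pixel_h = geotransform
    lrx = ulx + width * pixel_w
    lry = uly + height * pixel_h
    return (min(ulx, lrx), min(uly, lry), max(ulx, lrx), max(uly, lry))


def boxes_overlap(a, b):
    """True when two (minx, miny, maxx, maxy) boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Hypothetical 100x100 image with the mock geotransform from the script above
mock_raster = raster_bounds([0, 1, 0, 0, 0, -1], width=100, height=100)

# Bounds quoted earlier in the thread for sensor_farm_boundary.geojson
farm_bbox = (-117.04671718692339, 47.03455818199527,
             -117.04260145498948, 47.03632996899851)

print(boxes_overlap(mock_raster, farm_bbox))    # False
print(boxes_overlap(mock_raster, mock_raster))  # True
```

Using the shapefile generated from the same tif as the boundary avoids this mismatch, because both then share one set of bounds.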

Zihonglee commented 4 months ago

Below is my weed.py, which is similar to the notebook example.

from datetime import datetime
from fiona.crs import to_string
import geopandas as gpd
from shapely import geometry as shpg
from vibe_core.client import get_default_vibe_client
from vibe_core.data import ExternalReferenceList

client = get_default_vibe_client()
boundary_shape_file = "http://10.42.0.1:8000/testfarmvibe/micro_help/output.shp"
now = datetime.now()
data_frame = gpd.read_file(boundary_shape_file).to_crs("4326")
assert data_frame is not None
geometry = shpg.mapping(data_frame.geometry.iloc[0])
inputs = ExternalReferenceList(id=url_hash, time_range=(now, now), geometry=geometry, assets=[], urls=[])
params = {"bands": [], "alpha_index": -1, "simplify": "none"}
try:
    run = client.run(workflow='farm_ai/agriculture/weed_detection', name="weed_detection_example", input_data=inputs, parameters=params)
    run.monitor()
except Exception as e:
    print(e)
output = run.output
dv = output['result'][0]
asset = dv.assets[0]
archive_path = asset.path_or_url

After running the above workflow with the shapefile generated by the code you shared, I am getting the following error.

Traceback (most recent call last):
  File "fiona/ogrext.pyx", line 136, in fiona.ogrext.gdal_open_vector
  File "fiona/_err.pyx", line 291, in fiona._err.exc_wrap_pointer
fiona._err.CPLE_OpenFailedError: '/vsimem/c3558058c97c4a7881b67db42f46f6fb' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_weed.py", line 19, in <module>
    data_frame = gpd.read_file(boundary_shape_file).to_crs("4326")
  File "/home/ara/.local/lib/python3.8/site-packages/geopandas/io/file.py", line 281, in _read_file
    return _read_file_fiona(
  File "/home/ara/.local/lib/python3.8/site-packages/geopandas/io/file.py", line 322, in _read_file_fiona
    with reader(path_or_bytes, **kwargs) as features:
  File "/home/ara/.local/lib/python3.8/site-packages/fiona/collection.py", line 783, in __init__
    super().__init__(self.virtual_file, vsi=filetype, **kwds)
  File "/home/ara/.local/lib/python3.8/site-packages/fiona/collection.py", line 243, in __init__
    self.session.start(self, **kwargs)
  File "fiona/ogrext.pyx", line 588, in fiona.ogrext.Session.start
  File "fiona/ogrext.pyx", line 143, in fiona.ogrext.gdal_open_vector
fiona.errors.DriverError: '/vsimem/c3558058c97c4a7881b67db42f46f6fb' not recognized as a supported file format.

Apart from the above error, if I am generating my own shapefile, do I still need to pass a URL in the notebook? Would you mind sharing your workflow file so that I can compare and understand better?

Best, Vincent

acrown-msft commented 3 months ago

It looks like you're trying to get GeoPandas to read the remote file. You can download the relevant files to the local machine.

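import geopandas as gpd
import requests

# URLs for all necessary Shapefile components
base_url = "http://10.42.0.1:8000/testfarmvibe/micro_help/"
# The .prj sidecar (written next to output.shp when a CRS is set) carries the CRS;
# without it the file is read with no CRS and to_crs() fails
files = ["output.shp", "output.shx", "output.dbf", "output.prj"]

# Download and save each file locally
for file in files:
    response = requests.get(base_url + file)
    response.raise_for_status()  # Stop early if the server did not return the file
    with open(file, "wb") as f:
        f.write(response.content)

# Run rest of the notebook
data_frame = gpd.read_file("output.shp").to_crs("EPSG:4326")
...

Zihonglee commented 3 months ago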

After making gpd read output.shp locally, I am getting the following issue.

Could not find raster asset in asset list: {self.assets}

I just realized that I need a URL hash as an input to the workflow. Below is my updated workflow.

from datetime import datetime
from fiona.crs import to_string
import geopandas as gpd
import requests
from shapely import geometry as shpg

from vibe_core.client import get_default_vibe_client
from vibe_core.data import ExternalReferenceList

client = get_default_vibe_client()
base_url = "http://10.24.102.66:8000/testfarmvibe/micro_help/"
files = ["output.shp", "output.shx", "output.dbf"]

# Download and save each file locally
for file in files:
    response = requests.get(base_url + file)
    with open(file, "wb") as f:
        f.write(response.content)

now = datetime.now()
data_frame = gpd.read_file("output.shp")
data_frame.crs = "epsg:4326"
# to_crs returns a new GeoDataFrame, so keep the result
data_frame = data_frame.to_crs(epsg=4326)
assert data_frame is not None
geometry = shpg.mapping(data_frame.geometry.iloc[0])
url_hash = str(hash(base_url + files[0]))
inputs = ExternalReferenceList(id=url_hash, time_range=(now, now), geometry=geometry, assets=[], urls=[base_url + files[0]])
params = {"bands": [], "alpha_index": -1, "simplify": "none"}
try:
    run = client.run(workflow='farm_ai/agriculture/weed_detection', name="weed_detection_example", input_data=inputs, parameters=params)
    run.monitor()
except Exception as e:
    print(e)
output = run.output
dv = output['result'][0]
asset = dv.assets[0]
archive_path = asset.path_or_url

acrown-msft commented 3 months ago

The urls member of the ExternalReferenceList should contain the location of the raster. In your code, this parameter should be something more like urls=[base_url + "img.tif"].
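As a rough sketch of that change, the snippet below assembles the keyword arguments used in the thread's ExternalReferenceList call, with urls pointing at the raster rather than the shapefile. The helper build_reference_kwargs, the placeholder file name img.tif, and the mock geometry are assumptions for illustration. It also derives the id with hashlib instead of the built-in hash(), which for strings varies between Python processes; whether the workflow needs a stable id is not confirmed here.

```python
import hashlib
from datetime import datetime


def build_reference_kwargs(base_url, raster_name, geometry):
    """Assemble keyword arguments for ExternalReferenceList (hypothetical helper)."""
    raster_url = base_url + raster_name
    # Stable identifier derived from the raster URL
    url_hash = hashlib.sha256(raster_url.encode("utf-8")).hexdigest()
    now = datetime.now()
    return {
        "id": url_hash,
        "time_range": (now, now),
        "geometry": geometry,
        "assets": [],
        "urls": [raster_url],  # location of the raster, not the shapefile
    }


# Mock geometry; in the thread it comes from shpg.mapping(data_frame.geometry.iloc[0])
geometry = {
    "type": "Polygon",
    "coordinates": [[(0, 0), (100, 0), (100, -100), (0, -100), (0, 0)]],
}

kwargs = build_reference_kwargs(
    "http://10.24.102.66:8000/testfarmvibe/micro_help/", "img.tif", geometry
)
print(kwargs["urls"])
# With vibe_core installed: inputs = ExternalReferenceList(**kwargs)
```

The shapefile is still needed, but only locally, to supply the geometry argument.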