voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[BUG] GeoTIFFDataset returns CRSError: Invalid projection: : (Internal Proj Error: proj_create: unrecognized format / unknown name) #4756

Open robmarkcole opened 2 months ago

robmarkcole commented 2 months ago

Describe the problem

Geotiffs are in EPSG:32648 and this raises an error

Code to reproduce issue

import fiftyone as fo

name = "data-v1"
dataset_dir = "data"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.GeoTIFFDataset,
    label_field="location",
    name=name,
)

System information

Other info/logs

---------------------------------------------------------------------------
CRSError                                  Traceback (most recent call last)
Cell In[3], line 2
      1 # Create the dataset
----> 2 dataset = fo.Dataset.from_dir(
      3     dataset_dir=dataset_dir,
      4     dataset_type=fo.types.GeoTIFFDataset,
      5     label_field="location",
      6     name=name,
      7 )
      9 # View summary info about the dataset
     10 print(dataset)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/fiftyone/core/dataset.py:5354, in Dataset.from_dir(cls, dataset_dir, dataset_type, data_path, labels_path, name, persistent, overwrite, label_field, tags, dynamic, **kwargs)
   5260 """Creates a :class:`Dataset` from the contents of the given directory.
   5261 
   5262 You can create datasets with this method via the following basic
   (...)
   5351     a :class:`Dataset`
   5352 """
   5353 dataset = cls(name, persistent=persistent, overwrite=overwrite)
-> 5354 dataset.add_dir(
   5355     dataset_dir=dataset_dir,
   5356     dataset_type=dataset_type,
   5357     data_path=data_path,
...
--> 348     self._local.crs = _CRS(self.srs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pyproj/_crs.pyx:2378, in pyproj._crs._CRS.__init__()

CRSError: Invalid projection: : (Internal Proj Error: proj_create: unrecognized format / unknown name)

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

swheaton commented 2 months ago

This seems like an issue potentially unique to your data? It will be hard to debug without an example. Can you provide a small, non-sensitive test image that reproduces your issue?

At the very least, please paste the whole stack trace as it seems to be chopped off in the middle. As it stands, it seems like rasterio or pyproj just doesn't support that CRS? Hard to tell.

robmarkcole commented 2 months ago

example_32648.tif.zip

I have no issue opening it with rasterio.

swheaton commented 2 months ago

Thank you. However, I am not able to reproduce the error with your example.

>>> import fiftyone as fo
>>> ds=fo.Dataset.from_dir("data", dataset_type=fo.types.GeoTIFFDataset, label_field="location")
 100% |███████████████████████████████████████████████████████████████████████████████████| 1/1 [269.5ms elapsed, 0s remaining, 3.7 samples/s]
>>> ds.first().location
<GeoLocation: {
    'id': '66d1cf90bf68b63e4b2b7aff',
    'tags': [],
    'point': [104.06648594333119, 1.2362626908143242],
    'line': None,
    'polygon': [
        [
            [104.06303347762334, 1.2397351518883317],
            [104.06993598244, 1.2397376011299024],
            [104.06993840007095, 1.2327902182699888],
            [104.06303591319048, 1.2327877827578355],
            [104.06303347762334, 1.2397351518883317],
        ],
    ],
}>

My package versions (from a fresh environment, pip installed just now):

fiftyone==0.22.1
pyproj==3.6.1
rasterio==1.3.10

robmarkcole commented 2 months ago

OK, the issue is that there are also PNG files in that folder.

swheaton commented 2 months ago

Got it. Yeah, that won't work: the importer doesn't check the file extension before trying to open each file. Can we close this? 🙌🏼

robmarkcole commented 2 months ago

I feel it would be good to at least raise a warning if a non-TIFF file is opened.
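
Until the importer handles this, one possible workaround is to filter the directory before importing. The following is a minimal sketch using only the standard library; `select_tiffs` and `TIFF_SUFFIXES` are hypothetical names for illustration and are not part of FiftyOne, and the set of extensions is an assumption.

```python
import warnings
from pathlib import Path

# Assumption: these are the extensions that should be treated as GeoTIFFs
TIFF_SUFFIXES = {".tif", ".tiff"}


def select_tiffs(dataset_dir):
    """Return the TIFF file paths under dataset_dir, warning about any other file.

    Hypothetical helper; not part of the FiftyOne API.
    """
    tiffs = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if not path.is_file():
            continue  # ignore subdirectories themselves
        if path.suffix.lower() in TIFF_SUFFIXES:
            tiffs.append(str(path))
        else:
            # Surface the mismatch instead of letting rasterio/pyproj fail later
            warnings.warn(f"Skipping non-TIFF file: {path}")
    return tiffs
```

The returned paths could then be handed to the importer if it accepts an explicit list of image paths; that should be checked against the FiftyOne documentation for the installed version.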

swheaton commented 2 months ago

OK, would you like to submit a PR proposal?

One idea that is idiomatic with other fiftyone methods is to add an argument to the GeoTIFF importer, e.g. skip_failures=True. If True, it would just completely ignore the file and not add that sample. If False, an exception would be raised. We could wrap the exception you saw, because whatever came out of rasterio/pyproj was obviously not helpful.
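
The proposed behavior might look roughly like the sketch below. This is not the FiftyOne implementation: `load_samples`, `load_fn`, and `GeoTIFFReadError` are hypothetical names, and the warning on skip is an added assumption that combines this proposal with the earlier request for a warning.

```python
import warnings


class GeoTIFFReadError(Exception):
    """Hypothetical wrapper that names the offending file."""


def load_samples(paths, load_fn, skip_failures=True):
    """Apply load_fn to each path, skipping or re-raising failures.

    load_fn is a stand-in for whatever reads a GeoTIFF and builds a sample.
    """
    samples = []
    for path in paths:
        try:
            samples.append(load_fn(path))
        except Exception as e:
            if not skip_failures:
                # Wrap the low-level error so the message points at the file
                raise GeoTIFFReadError(
                    f"Failed to read '{path}' as a GeoTIFF"
                ) from e
            # Assumption: warn rather than skip silently
            warnings.warn(f"Skipping '{path}': {e}")
    return samples
```

With skip_failures=False, the original rasterio/pyproj error stays attached as the cause of the wrapped exception, so no debugging information is lost.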

robmarkcole commented 2 months ago

Sounds excellent. Happy to take it on, but no promises on the timeframe.