Closed brl0 closed 1 year ago
Added tests which fail on main
with these errors:
FAILED spatialpandas/tests/test_parquet_s3.py::TestS3ParquetDask::test_read_parquet_dask_remote_glob_parquet - pyarrow.lib.ArrowInvalid: Missing bucket name in S3 URI
FAILED spatialpandas/tests/test_parquet_s3.py::TestS3ParquetDask::test_read_parquet_dask_remote_glob_all - pyarrow.lib.ArrowInvalid: Missing bucket name in S3 URI
FAILED spatialpandas/tests/test_parquet_s3.py::TestS3ParquetDask::test_read_parquet_dask_remote_dir - pyarrow.lib.ArrowInvalid: GetFileInfo() yielded path 'test_bucket/test_dask/part.1.parquet', which is outside base dir 's3://test_bucket/test_dask'
FAILED spatialpandas/tests/test_parquet_s3.py::TestS3ParquetDask::test_read_parquet_dask_remote_dir_slash - pyarrow.lib.ArrowInvalid: GetFileInfo() yielded path 'test_bucket/test_dask/part.1.parquet', which is outside base dir 's3://test_bucket/test_dask/'
I believe this PR should be ready for review. Let me know if there are any questions or concerns, any feedback is always appreciated.
@ianthomas23, thanks for reviewing. I have made the requested changes.
Currently, one of the tests has a failure due to a 503 from GitHub, but that can be rerun once the other tests are complete.
If compatibility with those versions of pyarrow is expected, it seems like it would be worth parameterizing just the parquet tests over the supported versions, but given the run time of the full test suite, it does seem unnecessary to do so with the full test suite.
If compatibility with those versions of pyarrow is expected, it seems like it would be worth parameterizing just the parquet tests over the supported versions, but given the run time of the full test suite, it does seem unnecessary to do so with the full test suite.
I agree, but I don't want to make any changes at the moment. There has been a long tail of support for old python and pyarrow versions, and we are starting to bring that back to something sensible, e.g. dropping python 3.7 now and 3.8 will not be too far behind, and there will be corresponding dropping of pyarrow version support. We are also working on an overhaul of the project build and CI scripts on all holoviz repositories, and I expect we will end up with a "minimal versions" CI run to test that CI passes using all of the important minimum versions of dependencies.
I noticed some of the path handling for parquet files would munge paths for remote filesystems and confirmed after testing with s3. This PR fixes that issue.