Open bsweger opened 6 months ago
This points to a larger issue with the hubverse-transform unit tests: they rely heavily on local read/write operations because the code uses PyArrow FS, and you can't instantiate the pyarrow.fs.S3FileSystem class against an S3 bucket that doesn't exist (even if you don't plan to read from it or write to it).
Using local operations in lieu of integration tests against a properly mocked AWS/S3 environment has caused us to miss at least one bug related to S3 file handling, so it might be time to rethink our approach. We can't use moto directly (it only works when the code base uses boto to access AWS), but maybe moto's server mode? LocalStack? MinIO?
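To make the moto server / LocalStack / MinIO idea a bit more concrete, here is a minimal sketch of pointing pyarrow.fs.S3FileSystem at a local S3-compatible endpoint via its endpoint_override option. It assumes a MinIO container listening on localhost:9000 with MinIO's default minioadmin credentials; the helper names (local_s3_options, make_test_filesystem) are hypothetical and not part of hubverse-transform, and none of this has been tried against the actual test suite.

```python
def local_s3_options(
    endpoint: str = "localhost:9000",   # assumed local MinIO address
    access_key: str = "minioadmin",     # MinIO's default credentials (assumption)
    secret_key: str = "minioadmin",
) -> dict:
    """Build keyword arguments for pyarrow.fs.S3FileSystem aimed at a local server."""
    return {
        "endpoint_override": endpoint,   # talk to the local server, not AWS
        "scheme": "http",                # local servers usually run without TLS
        "access_key": access_key,
        "secret_key": secret_key,
        "allow_bucket_creation": True,   # let tests create their own buckets
    }


def make_test_filesystem(**overrides):
    """Return an S3FileSystem bound to the local endpoint (hypothetical helper)."""
    # Imported lazily so the options helper can be used without pyarrow installed.
    from pyarrow import fs

    return fs.S3FileSystem(**local_s3_options(**overrides))
```

A pytest fixture could call make_test_filesystem() and then create a throwaway bucket with create_dir before each test, so the code under test exercises real S3 path handling rather than local files. Whether that fits better with MinIO, LocalStack, or moto's server mode is still an open question.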
Background
Thanks to @lshandross, we have some output from hubverse-transform's test suite that indicates our path handling isn't working well on a Windows machine (see the attached file on this PR comment).
There are two underlying reasons for this:
hubverse-transform should have better cross-platform support, but because it is designed for cloud-based operations, let's start by fixing the second item: making the test suite work on Windows machines. This ensures that Windows-based devs can contribute to the project.
As for the first item, local operations exist as a side-effect rather than a fully-formed feature, so it's not worth spending a ton of time here until there's an actual feature request.
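As an illustration of the kind of Windows path problem described above, the sketch below shows how a path built with pathlib renders with backslashes under Windows rules, while S3-style keys expect forward slashes. This is an assumption about one common failure mode, not a diagnosis of the failures in the attached output; the helper to_object_key and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def to_object_key(path) -> str:
    """Return a forward-slash key regardless of the path flavour (hypothetical helper)."""
    return path.as_posix()


# PureWindowsPath applies Windows path rules on any OS, so this runs anywhere.
win_path = PureWindowsPath("raw") / "model-output" / "team-model" / "2024-01-01-team-model.csv"

print(str(win_path))            # raw\model-output\team-model\2024-01-01-team-model.csv
print(to_object_key(win_path))  # raw/model-output/team-model/2024-01-01-team-model.csv
```

Tests that compare against str(path) would pass on Linux and macOS but fail on Windows, which is one reason to normalize with as_posix() before comparing or building cloud paths.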
Definition of done