# Summary
❗❗ The volume of this PR might be intimidating; please read through the description to make reviewing easier.

The primary goal of this PR is to remove `FsStorageProvider` from `:services:tables` (and additionally `:tables-test-fixture`). To achieve this goal, we:

1. Add additional methods to the `StorageClient` interface.
2. Add a method to `FileIOManager` to reverse-lookup the `Storage` for a given `FileIO`.
3. Map the `FsStorageProvider` methods to `StorageManager`.
4. Fix the log for `StorageType`: instead of printing the object hash, print the type properly, i.e. `StorageType.Type(local)`.
👆 that's it.

Is this the end state of the refactoring? No. The goal of this PR is limited to removing `FsStorageProvider`. In future PRs we'll address CTAS, `fileSecurer`, `stripPrefix`, `defaultTableLocation`, etc.; for now these have been tagged with `TODO`.
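The reverse lookup of a `Storage` for a `FileIO` (and the `StorageType` log fix) can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual OpenHouse API: the types `FileIO`, `Storage`, and the method name `getStorage` are illustrative stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-ins for the real interfaces; names are assumptions.
interface FileIO {}                       // stand-in for org.apache.iceberg.io.FileIO
class HdfsFileIO implements FileIO {}

enum StorageType { LOCAL, HDFS }

class Storage {
  private final StorageType type;
  Storage(StorageType type) { this.type = type; }

  // The log fix: print the type properly instead of the default Object hash.
  @Override public String toString() {
    return "StorageType.Type(" + type.name().toLowerCase() + ")";
  }
}

class FileIOManager {
  private final Map<Storage, FileIO> fileIOs = new LinkedHashMap<>();

  void register(Storage storage, FileIO fileIO) { fileIOs.put(storage, fileIO); }

  // Reverse lookup: find the Storage that owns a given FileIO instance.
  Storage getStorage(FileIO fileIO) {
    return fileIOs.entrySet().stream()
        .filter(e -> e.getValue() == fileIO)
        .map(Map.Entry::getKey)
        .findFirst()
        .orElseThrow(() -> new NoSuchElementException("No storage configured for " + fileIO));
  }
}
```

With this shape, a `getStorage(hdfsFileIO)` call returns the HDFS `Storage`, which now logs as `StorageType.Type(hdfs)` rather than an object hash.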
# Changes
- [X] Refactoring
- [X] Tests

For all the boxes checked, please include additional details of the changes made in this pull request.
# Testing Done
- [✅] Manually tested on a local Docker setup. Commands run and their output:

```
scala> spark.sql("INSERT INTO TABLE openhouse.db.tb VALUES (current_timestamp(), 'val1', 'val2')")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM openhouse.db.tb").show()
+--------------------+----+----+
|                  ts|col1|col2|
+--------------------+----+----+
|2024-05-13 23:52:...|val1|val2|
+--------------------+----+----+

scala> spark.sql("CREATE TABLE openhouse.db.tb0 AS SELECT 1").show()
++
||
++
++

scala> spark.sql("SELECT * FROM openhouse.db.tb0").show()
+---+
|  1|
+---+
|  1|
+---+
```
Audit events for these tests:

```
{"eventTimestamp":"1715644331261","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"CREATE","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00000-d4aca2a1-c8bc-40eb-8449-5b70d2e24598.metadata.json","grantor":null,"grantee":null,"role":null}
{"eventTimestamp":"1715644370032","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"COMMIT","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00001-54ef9c91-4747-4852-ba5b-adfb0b4f6fb3.metadata.json","grantor":null,"grantee":null,"role":null}
{"eventTimestamp":"1715644425694","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"READ","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00001-54ef9c91-4747-4852-ba5b-adfb0b4f6fb3.metadata.json","grantor":null,"grantee":null,"role":null}
```
- [✅] Added new tests for the changes made.
- [✅] Updated existing tests to reflect the changes made.
- [✅] Some other form of testing like staging or soak time in production. Please explain.

# Additional Information
- [X] Deprecations
  - remove `fsStorageProvider`
- [X] Large PR broken into smaller PRs, and PR plan linked in the description.
  - https://github.com/linkedin/openhouse/pull/100