linkedin / openhouse

Open Control Plane for Tables in Data Lakehouse
https://www.openhousedb.org/
BSD 2-Clause "Simplified" License
294 stars 50 forks

Introduce FileIOManager and FileIO implementations for HDFS and Local Storage #96

Closed HotSushi closed 4 months ago

HotSushi commented 5 months ago

Summary

Laying the foundations for storage, part 4: FileIOManager and FileIO implementations for HDFS and local storage.

The FileIOManager interface looks like:

interface FileIOManager {
  FileIO getFileIO(Type type);
}

This interface is accompanied by ConfigureFileIO, which sets up FileIOs for all "configured" storages.

We do not replace the existing FileIO instances to ensure production systems do not break.
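A minimal sketch of what such a manager could look like. This is illustrative only, not the PR's actual implementation: `StorageType`, `register`, and the concrete classes are assumed names, and the `FileIO` interface here is a stand-in for Iceberg's `org.apache.iceberg.io.FileIO`.

```java
import java.util.EnumMap;
import java.util.Map;

// Stand-in for Iceberg's org.apache.iceberg.io.FileIO.
interface FileIO {}

// Hypothetical enumeration of the configured storage backends.
enum StorageType { HDFS, LOCAL }

class HdfsFileIO implements FileIO {}
class LocalFileIO implements FileIO {}

// Resolves the FileIO instance for a configured storage type.
class FileIOManager {
  private final Map<StorageType, FileIO> fileIOs = new EnumMap<>(StorageType.class);

  // Called during configuration (the ConfigureFileIO step) for each configured storage.
  void register(StorageType type, FileIO fileIO) {
    fileIOs.put(type, fileIO);
  }

  FileIO getFileIO(StorageType type) {
    FileIO io = fileIOs.get(type);
    if (io == null) {
      throw new IllegalArgumentException("No FileIO configured for storage type: " + type);
    }
    return io;
  }
}
```

The map-based lookup means adding a new storage backend only requires registering another FileIO, without touching callers.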

To learn the motivation behind these changes please see this doc

What's the next plan

1) Deploy new services with new + old cluster yaml (along with new fileIOs and old fileIOs)

- storages
    - newconfs
- storage
    - oldconfs  

2) Make refactors / remove old usage safely (remove old fileIOs and use new fileIOs)

3) Switch to the new cluster yaml completely:

- storages
    - newconfs
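For illustration, the transitional cluster yaml in step 1 might carry both sections side by side. The key names below are guesses based on the outline above, not the actual schema:

```yaml
# Transitional cluster yaml (step 1): old and new sections coexist.
storages:        # new multi-storage configuration
  hdfs:
    # newconfs ...
  local:
    # newconfs ...
storage:         # old single-storage configuration, removed in step 3
  # oldconfs ...
```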

Changes

Testing Done

scala> spark.sql("CREATE TABLE openhouse.db.tb (ts timestamp, col1 string, col2 string) PARTITIONED BY (days(ts))").show()
++
||
++

scala> spark.sql("INSERT INTO TABLE openhouse.db.tb VALUES (date_sub(CAST(current_timestamp() as DATE), 30), 'val1', 'val2')")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM openhouse.db.tb").show()
+-------------------+----+----+
|                 ts|col1|col2|
+-------------------+----+----+
|2024-04-02 00:00:00|val1|val2|
+-------------------+----+----+

We can observe logs like:

INFO 9 --- [ main] c.l.o.c.s.h.HdfsStorageClient : Initializing storage client for type:..

sumedhsakdeo commented 4 months ago

LGTM. Looking forward to more information in the PR description about the cutover from the old config to the new config. It's not blocking, though.

sumedhsakdeo commented 4 months ago

Also, please check why the build is failing.