linkedin / openhouse

Open Control Plane for Tables in Data Lakehouse
https://www.openhousedb.org/
BSD 2-Clause "Simplified" License

Ensure putSnapshot path honors case-insensitive contract #85

Closed: autumnust closed this 2 months ago

autumnust commented 2 months ago

Summary

This is a bug fix for the put-snapshot code path, where the UUID-extraction process uses the tableId and databaseId from the request itself. When those IDs are taken directly from the top-level request body, they can lose their original casing if the request came from a platform like Spark SQL. Since the underlying storage (e.g., HDFS) is case-sensitive in its path URLs, we need to ensure that the original casing used when the first commit of a CTAS is issued is preserved when UUID extraction runs for the second commit.
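To illustrate the casing problem, here is a minimal sketch, assuming a hypothetical catalog that is looked up by lowercased IDs and stores each table's IDs with their original casing. The names `TableIds`, `fromRequest`, `findExisting`, `resolvePath`, and the `/data/openhouse/` path prefix are illustrative only and are not OpenHouse's actual classes or storage layout. It shows how a case-sensitive path built from lowercased request IDs points somewhere other than the location created by the first commit, and how preferring the stored casing avoids that.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class CasePreservingIds {
    // Hypothetical stand-in for stored table identity; not an OpenHouse type.
    public record TableIds(String databaseId, String tableId) {}

    // Simulates an engine (e.g. Spark SQL) that lowercases identifiers in the request.
    public static TableIds fromRequest(String db, String table) {
        return new TableIds(db.toLowerCase(Locale.ROOT), table.toLowerCase(Locale.ROOT));
    }

    // Case-insensitive lookup: the catalog is keyed by lowercased "db.table",
    // but the value keeps the casing used on the first commit.
    public static Optional<TableIds> findExisting(Map<String, TableIds> catalog, TableIds request) {
        String key = request.databaseId().toLowerCase(Locale.ROOT)
                + "." + request.tableId().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(catalog.get(key));
    }

    // Hypothetical path layout; the point is only that it is case-sensitive.
    public static String tablePath(TableIds ids) {
        return "/data/openhouse/" + ids.databaseId() + "/" + ids.tableId();
    }

    // Prefer the stored casing; fall back to the request IDs only for a new table.
    public static String resolvePath(Map<String, TableIds> catalog, TableIds request) {
        return tablePath(findExisting(catalog, request).orElse(request));
    }

    public static void main(String[] args) {
        Map<String, TableIds> catalog = Map.of("mydb.mytable", new TableIds("myDb", "myTable"));
        TableIds request = fromRequest("MYDB", "MyTable");
        System.out.println(tablePath(request));            // /data/openhouse/mydb/mytable
        System.out.println(resolvePath(catalog, request)); // /data/openhouse/myDb/myTable
    }
}
```

Using the request IDs directly yields the lowercased path, which on a case-sensitive filesystem is a different directory from the one the first commit wrote.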

The other part of this PR ensures that, during a put, the tableDto is not always built from scratch when an existing object was already found by the findById method. This is done by switching from orElse to orElseGet: orElse evaluates its argument eagerly, even when the Optional already holds a value, whereas orElseGet invokes its supplier lazily, only when the value is absent. The eager evaluation was wasteful and also made the call stack confusing to follow.
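A minimal, self-contained sketch of the `orElse` vs `orElseGet` difference using `java.util.Optional`. `buildNewTableDto` is a hypothetical stand-in for building a tableDto from scratch, and the `BUILDS` counter only records how often it runs; neither is OpenHouse code.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class OrElseVsOrElseGet {
    // Counts how many times the "build from scratch" path runs.
    public static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical stand-in for constructing a fresh tableDto.
    public static String buildNewTableDto() {
        BUILDS.incrementAndGet();
        return "new-dto";
    }

    public static String withOrElse(Optional<String> existing) {
        // The argument is evaluated before orElse is called, even if a value is present.
        return existing.orElse(buildNewTableDto());
    }

    public static String withOrElseGet(Optional<String> existing) {
        // The supplier runs only when the Optional is empty.
        return existing.orElseGet(OrElseVsOrElseGet::buildNewTableDto);
    }

    public static void main(String[] args) {
        Optional<String> found = Optional.of("existing-dto");
        BUILDS.set(0);
        System.out.println(withOrElse(found) + " builds=" + BUILDS.get());    // existing-dto builds=1
        BUILDS.set(0);
        System.out.println(withOrElseGet(found) + " builds=" + BUILDS.get()); // existing-dto builds=0
    }
}
```

Both variants return the existing object, but only `orElse` pays for constructing a new one that is immediately discarded.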

This PR also includes:

Changes

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

For all the boxes checked, include additional details of the changes made in this pull request.

autumnust commented 2 months ago

Why is the issue related to CTAS? It failed in the regular write path.

The ticket you gave has the log line, which is located in this class.