teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

Refactor TeragrepHdfsSaveStep #375

Open 51-code opened 2 weeks ago

51-code commented 2 weeks ago

Description

TeragrepHdfsSaveStep currently has multiple responsibilities because it handles every supported file format itself. It should be refactored into multiple step objects, possibly using composable decorators. As it stands, the class is difficult to read and difficult to extend with new functionality.
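A minimal sketch of what the composable-decorator idea could look like, assuming a simplified `SaveStep` interface that works on plain text lines instead of Spark datasets. All names here (`SaveStep`, `SinkSaveStep`, `HeaderSaveStep`, `NonEmptySaveStep`) are hypothetical and do not exist in pth_10; the point is only that each concern lives in its own small object that wraps another step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a save step; the real step operates on Spark datasets.
interface SaveStep {
    void save(List<String> lines);
}

// Innermost step: writes lines to a sink (an in-memory list here, HDFS in reality).
final class SinkSaveStep implements SaveStep {
    private final List<String> sink;

    SinkSaveStep(final List<String> sink) {
        this.sink = sink;
    }

    @Override
    public void save(final List<String> lines) {
        sink.addAll(lines);
    }
}

// Decorator: prepends a header line (a format-specific concern) and delegates.
final class HeaderSaveStep implements SaveStep {
    private final SaveStep origin;
    private final String header;

    HeaderSaveStep(final SaveStep origin, final String header) {
        this.origin = origin;
        this.header = header;
    }

    @Override
    public void save(final List<String> lines) {
        final List<String> withHeader = new ArrayList<>(lines.size() + 1);
        withHeader.add(header);
        withHeader.addAll(lines);
        origin.save(withHeader);
    }
}

// Decorator: rejects empty input (a validation concern) and delegates otherwise.
final class NonEmptySaveStep implements SaveStep {
    private final SaveStep origin;

    NonEmptySaveStep(final SaveStep origin) {
        this.origin = origin;
    }

    @Override
    public void save(final List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("nothing to save");
        }
        origin.save(lines);
    }
}
```

Composition then happens at construction time, for example `new NonEmptySaveStep(new HeaderSaveStep(new SinkSaveStep(sink), "a,b"))`, so supporting a new format would mean adding a new decorator rather than another branch inside one large class.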

51-code commented 1 week ago

There is also an abstract class, TeragrepHdfsStep, which TeragrepHdfsSaveStep extends. It contains the functions that serialize and deserialize metadata in HDFS files. That logic should be refactored into its own object as well: abstract classes are best avoided to begin with, and the functions in question are currently more or less utility functions.
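As a rough illustration of moving the metadata handling into its own object, the sketch below uses a hypothetical `SaveMetadata` class that owns both its data and its byte representation. The fields (`schema`, `timestamp`) and the binary layout are placeholders chosen for the example, not the metadata format pth_10 actually writes, and a real version would read from and write to HDFS streams instead of byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical metadata object; field names and layout are assumptions.
final class SaveMetadata {
    private final String schema;
    private final long timestamp;

    SaveMetadata(final String schema, final long timestamp) {
        this.schema = schema;
        this.timestamp = timestamp;
    }

    // Rebuilds the object from bytes produced by toBytes().
    static SaveMetadata fromBytes(final byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            final String schema = in.readUTF();
            final long timestamp = in.readLong();
            return new SaveMetadata(schema, timestamp);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes the fields in a fixed order: schema first, then timestamp.
    // Note: writeUTF limits the encoded string to 65535 bytes.
    byte[] toBytes() {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(schema);
            out.writeLong(timestamp);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    String schema() {
        return schema;
    }

    long timestamp() {
        return timestamp;
    }
}
```

With something along these lines, a save step would receive or create a `SaveMetadata` instance instead of inheriting serialization helpers from an abstract parent, and the round trip `SaveMetadata.fromBytes(metadata.toBytes())` can be unit tested without any step class involved.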