wri / gfw_forest_loss_geotrellis

Global Tree Cover Loss Analysis using Geotrellis and SPARK
MIT License
10 stars 8 forks source link

Refactor Analysis Commands to minimize usage of kwargs #113

Closed echeipesh closed 3 years ago

echeipesh commented 3 years ago

Pull request type

Please check the type of change your PR introduces: - [ ] Bugfix - [ ] Feature - [ ] Code style update (formatting, renaming) - [x] Refactoring (no functional changes, no api changes) - [ ] Build related changes - [ ] Documentation content changes - [ ] Other (please describe): ## What is the current behavior? Using ForestChangeDiagnosticCommand as an exmaple: ```scala val forestChangeDiagnosticCommand: Opts[Unit] = Opts.subcommand( name = "forest_change_diagnostic", help = "Compute summary statistics for GFW Pro Forest Change Diagnostic." ) { ( defaultOptions, intermediateListSourceOpt, fireAlertOptions, defaultFilterOptions, featureFilterOptions, ).mapN { (default, intermediateListSource, fireAlert, defaultFilter, featureFilter) => val kwargs = Map( "featureUris" -> default._2, "outputUrl" -> default._3, "splitFeatures" -> true, // force to split features "intermediateListSource" -> intermediateListSource, "fireAlertType" -> fireAlert._1, "fireAlertSource" -> fireAlert._2, "idStart" -> featureFilter._1, "idEnd" -> featureFilter._2, "limit" -> defaultFilter._1, "tcl" -> defaultFilter._2, "glad" -> defaultFilter._3 ) runAnalysis("forest_change_diagnostic", default._1, default._2, kwargs) } ``` ## What is the new behavior? ```scala val forestChangeDiagnosticCommand: Opts[Unit] = Opts.subcommand( name = ForestChangeDiagnosticAnalysis.name, help = "Compute summary statistics for GFW Pro Forest Change Diagnostic." ) { ( defaultOptions, intermediateListSourceOpt, fireAlertOptions, featureFilterOptions ).mapN { (default, intermediateListSource, fireAlert, filterOptions) => val kwargs = Map( "outputUrl" -> default.outputUrl, "noOutputPathSuffix" -> default.noOutputPathSuffix ) val featureFilter = FeatureFilter.fromOptions(default.featureType, filterOptions) runAnalysis { spark => val featureRDD = FeatureRDD(default.featureUris, default.featureType, featureFilter, default.splitFeatures, spark) val fireAlertRDD = FireAlertRDD(spark, fireAlert.alertType, fireAlert.alertSource, FeatureFilter.empty) ForestChangeDiagnosticAnalysis(featureRDD, default.featureType, intermediateListSource, fireAlertRDD, spark, kwargs) } } ``` This change involves several refactors: - All command line options are parsed into case classes instead of tuples, minimizing mistakes from using `._1` instead of `._2` - All filter related options are aggregated together into `featureFilterOptions` which are used directly instead of `kwargs` - FeatureRDD and FireRDD are brought to top level to minimize use of `kwargs` - `runAnalysis` invokes the analysis directly without going through the `SummaryAnalysisFactory` Discussion on intention of the refactor: ### Minimize use of `kwargs` for filters This has two goals, increase readability, navigability of the code. Ideally one able to trace usage of each parameter to its source using code navigation "Go To Definition". This is easier to read but also allows the compiler to enforce type checking and support future refactors. The main usage of kwargs here was for selecting filters on feature inputs. Potentially all filters are available to the user, but only some of them are applicable for specific feature type. This logic is no captured in the interaction of `FeatureFilter.fromOptions` and `AllFilterOptions` (which aggregates all possible command line argument input). Additional upshot is that this logic is now applied uniformly to all commands without having to add all supported filters to kwargs. The instantiated filters are passed through as instance of `FeatureFilter` class that matched the `featureType` value. ### Remove Analysis Factory The analysis factory has two major drawbacks in this code base: 1. Additional and unnecessary indirection, each command has to specify the analysis name anyway. 2. Unnecessary interface restriction. It forces uniformity for the Analysis class that is not needed and requires use of `kwargs` to get around The preferred situation is that each command can use the parsed arguments to instantiate the corresponding analysis as needed. ## Does this introduce a breaking change? - [ ] Yes - [x] No ## Other notes There are additional clean-ups that are implied here, but this PR should allow future refactors to happen per analysis command.
codecov[bot] commented 3 years ago

Codecov Report

Merging #113 (f681de5) into develop (2f396c4) will increase coverage by 0.10%. The diff coverage is 0.00%.

:exclamation: Current head f681de5 differs from pull request most recent head 73a763d. Consider uploading reports for the commit 73a763d to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           develop    #113      +/-   ##
==========================================
+ Coverage     3.51%   3.62%   +0.10%     
==========================================
  Files          258     256       -2     
  Lines         5257    5110     -147     
==========================================
  Hits           185     185              
+ Misses        5072    4925     -147     
Flag Coverage Δ
unittests 3.62% <0.00%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...scala/org/globalforestwatch/features/Feature.scala 0.00% <0.00%> (ø)
...ala/org/globalforestwatch/features/FeatureDF.scala 0.00% <ø> (ø)
...org/globalforestwatch/features/FeatureFilter.scala 0.00% <0.00%> (ø)
...ala/org/globalforestwatch/features/FeatureId.scala 0.00% <ø> (ø)
...la/org/globalforestwatch/features/FeatureRDD.scala 0.00% <0.00%> (ø)
.../org/globalforestwatch/features/FireAlertRDD.scala 0.00% <0.00%> (ø)
...a/org/globalforestwatch/features/GadmFeature.scala 0.00% <0.00%> (ø)
...org/globalforestwatch/features/GfwProFeature.scala 0.00% <0.00%> (ø)
...alforestwatch/features/PolygonIntersectionDF.scala 0.00% <0.00%> (ø)
...org/globalforestwatch/features/SimpleFeature.scala 0.00% <0.00%> (ø)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2f396c4...73a763d. Read the comment docs.