salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars 392 forks source link

Simplified selectExistingPaths #486

Closed sakhuja closed 4 years ago

sakhuja commented 4 years ago
  1. selectExistingPaths method simplification.
  2. handling for order free path validity in the path argument.

Related issues [Issues]https://github.com/salesforce/TransmogrifAI/issues/483

Describe the proposed solution In selectExistingPaths method, we check for the component paths validity using FileSystem object's read outcome. Also, make use of scala's Try method to bucketize valid and invalid paths. as part of this change we also make sure that the order of the component paths is not critical to the success of the selectExistingPaths method, specially when the first path component is invalid. Added a test as well to verify that we can handle order independent paths.

salesforce-cla[bot] commented 4 years ago

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Aditya a***@s***.com. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

salesforce-cla[bot] commented 4 years ago

Thanks for the contribution! It looks like @sakhuja is an internal user so signing the CLA is not required. However, we need to confirm this.

codecov[bot] commented 4 years ago

Codecov Report

Merging #486 into master will decrease coverage by 8.25%. The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #486      +/-   ##
==========================================
- Coverage   87.02%   78.76%   -8.26%     
==========================================
  Files         345      345              
  Lines       11683    11683              
  Branches      387      378       -9     
==========================================
- Hits        10167     9202     -965     
- Misses       1516     2481     +965     
Impacted Files Coverage Δ
...la/com/salesforce/op/utils/io/avro/AvroInOut.scala 35.13% <0.00%> (-64.87%) :arrow_down:
...scala/com/salesforce/op/utils/text/TextUtils.scala 0.00% <0.00%> (-100.00%) :arrow_down:
.../scala/com/salesforce/op/test/FeatureAsserts.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...ala/com/salesforce/op/readers/CSVAutoReaders.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...la/com/salesforce/op/test/TestFeatureBuilder.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...om/salesforce/op/stages/impl/feature/OpNGram.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...alesforce/op/stages/impl/feature/OpHashingTF.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...lesforce/op/stages/impl/feature/LangDetector.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...sforce/op/aggregators/CustomMonoidAggregator.scala 0.00% <0.00%> (-100.00%) :arrow_down:
...sforce/op/stages/base/binary/BinaryEstimator.scala 0.00% <0.00%> (-100.00%) :arrow_down:
... and 76 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5a2449e...c29bf82. Read the comment docs.

sakhuja commented 4 years ago

@gerashegalov This change is ready for review.

please note: I've added my salesforce email id to my account to confirm with salesforce-cla requirement.

sakhuja commented 4 years ago

Thanks for the review @gerashegalov. Could you help merge? I don't have write-access.