Open nicodv opened 3 years ago
Thanks for the contribution! Before we can merge this, we need @wsuchy @koertkuipers to sign the Salesforce.com Contributor License Agreement.
@leahmcguire I think it is just not being used - https://github.com/salesforce/TransmogrifAI/pull/550#discussion_r597091426
We should be careful in how we define unused in a public project. Also that functionality would be needed to migrate projects on Transmogrifai V0...
Hey @tovbinm,there's a unit test failure I've been investigating that's the result of a bug in Spark: https://issues.apache.org/jira/browse/SPARK-34805?focusedCommentId=17337491&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17337491.
I'm wondering why testData
in SanityCheckerTest.scala
(L137) is constructed the way it is, with the metadata for the features column added manually. The fact that the metadata isn't passed along in DataFrame.select
anymore is discovered by this assertion.
I'm assuming Spark won't fix this any time soon, and I'm having trouble finding an alternative way of putting in the metadata in the schema of testData
. I've tried .withColumn
, but it still relies on .select
under the hood. What's your take on this?
Also pinging @Jauntbox (we know you're out there!) for question above.
This is a known issue indeed. We have been copying over the metadata between fields each time we apply our transformers, e.g OpTransformer1.transform
This is a known issue indeed. We have been copying over the metadata between fields each time we apply our transformers, e.g
OpTransformer1.transform
I mean that there is a new problem with Spark 3.1. Even OpTransformer1.transform
is broken now since it relies on .select
to pass back the metadata into the output dataframe. SelectedModelCombinerTest
tests the .transform
function directly and also fails for the same reason.
StructField
still has the metadata in it, it's just ExpressionEncoder
in Spark 3.x does not allow passing it anymore. Oh, it's a true bummer. We rely heavily on this feature.
Hello, any estimation on when we can get this PR ready? Thank you!
@hedibejaoui , we are running internal forks of TransmogrifAI and MLeap on Spark 3.1.1, so the bulk of the work has been done.
For public release, the MLeap dependency needs to be upgraded now that they're on Spark 3 too: https://github.com/combust/mleap/pull/765
But since they've upgraded to Spark 3.0.2 and TransmogrifAI to 3.1.1, we have some testing left to do.
@nicodv Thanks for the information. Actually, we are using Spark 3.0.x because of some internal dependencies, any chance we get a public release of TransmogrifAI for that version?
Hello, When do you think this PR will merged for the public use? Thank you!
Hi, This PR adds important functionality that I need for my project. When will this PR merge ?
Thank you
Hi, we are waiting for the new PR adds. When it will be available ? Thanks
Related issues https://github.com/salesforce/TransmogrifAI/issues/336 https://github.com/salesforce/TransmogrifAI/issues/332
Describe the proposed solution Upgrade to Scala 2.12 and Spark 3
Describe alternatives you've considered Living in the past, suffering from security issues and missing out on feature and speed improvements
Additional context Add any other context about the changes here.