tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0
1.37k stars 392 forks source link

Add udf DataFrameTfrConverter #132

Closed fhoering closed 4 years ago

fhoering commented 5 years ago

Sidenotes:

jhseu commented 5 years ago

@skavulya What are your thoughts on this change?

skavulya commented 5 years ago

@jhseu I'll review the PR this evening. @fhoering Thanks for the contribution.

fhoering commented 5 years ago

@skavulya Thanks for the review. I will do the changes next week. I was also experimenting doing the inference in python with vectorized pandas udf which requires spark 2.4 and works fine. So I think I will change the provided example in the readme to the vectorized inference version.

fhoering commented 5 years ago

@skavulya @jhseu I did the changes. I also provided en example for using vectorized udf for prediction. It only works with spark >= 2.4 which gets a bit ugly as the project still references 2.3. But I tested. It works. We can maybe also update the project ? If you think the inference example is too complicated we can also remove it as it is not directly linked to tf record conversion but it is how I am using this.

fhoering commented 5 years ago

@skavulya @jhseu If you don't have much time discard my last comment. I've done the changes asked by @skavulya in the units tests. The scala code is compiling & working.

The python code in README is only for information and as template anyway. No need for it to be perfect or handle all cases.

So let's merge this as is.

fhoering commented 4 years ago

@jhseu Any intention to merge this ? Otherwise let's just close this PR

fhoering commented 4 years ago

Thanks