linkedin / transport

A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
BSD 2-Clause "Simplified" License

Refactor getRequiredFiles to not add files in executor #116

Closed: rzhang10 closed this 2 years ago

rzhang10 commented 2 years ago

The previous attempt to fix the issue, #115, does not seem to work: it still throws an NPE, as mentioned in https://github.com/linkedin/transport/pull/115#discussion_r844370883.

This patch takes another approach: it refactors getRequiredFiles so that it initializes _distributedCacheFiles on both the driver and the executor, but adds those files only when running on the driver.
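The idea can be illustrated with a minimal, self-contained sketch. It is not the actual Transport code: the class `RequiredFilesSketch`, the `isDriver` flag, and the `addFile` callback are hypothetical stand-ins for the real wrapper, for however the code detects that it is running on the driver, and for the engine call that registers a file for distribution. Only the names `getRequiredFiles` and `_distributedCacheFiles` come from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the UDF wrapper; not the actual Transport class. */
class RequiredFilesSketch {
  // Stays null until getRequiredFiles runs; reading it before that is what leads to an NPE.
  private List<String> _distributedCacheFiles;

  // Assumption: some way of knowing whether this code runs on the driver.
  private final boolean isDriver;

  // Assumption: stand-in for the engine call that registers a file for distribution.
  private final Consumer<String> addFile;

  RequiredFilesSketch(boolean isDriver, Consumer<String> addFile) {
    this.isDriver = isDriver;
    this.addFile = addFile;
  }

  void getRequiredFiles(String[] requiredFiles) {
    // Initialize the field on both driver and executor so later reads never see null.
    _distributedCacheFiles = new ArrayList<>(Arrays.asList(requiredFiles));

    // Register the files only on the driver; executors skip this step.
    if (isDriver) {
      for (String file : _distributedCacheFiles) {
        addFile.accept(file);
      }
    }
  }

  List<String> distributedCacheFiles() {
    return _distributedCacheFiles;
  }
}
```

In this sketch, both a driver-side and an executor-side instance end up with a non-null file list, while only the driver-side instance invokes the registration callback.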

rzhang10 commented 2 years ago

Thanks @rzhang10! Are you able to test whether this approach works?

Yes, it's tested and it worked.