Closed by ahoy-jon 5 years ago.
The Datasets and DataFrames are now cached. However, we now get warnings.
We need to write a helper,
def cacheIfNotCached(dataset: Dataset[_]): Unit
that only caches data that is not already cached, to get rid of warnings like these:
19/05/26 17:54:06 WARN CacheManager: Asked to cache already cached data.
19/05/26 17:54:07 WARN CacheManager: Asked to cache already cached data.
19/05/26 17:54:07 WARN CacheManager: Asked to cache already cached data.
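A minimal sketch of such a helper, assuming Spark 2.1 or later (where `Dataset.storageLevel` is available). `storageLevel` returns `StorageLevel.NONE` when the CacheManager holds no cached plan matching the Dataset, so checking it before calling `cache()` should skip the calls that trigger the warning. The enclosing object name `CacheUtils` is hypothetical.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Hypothetical holder object for the helper proposed in this issue.
object CacheUtils {
  // Caches `dataset` only if Spark does not already hold a cached copy of its plan.
  // Skipping the second cache() call avoids
  // "WARN CacheManager: Asked to cache already cached data."
  def cacheIfNotCached(dataset: Dataset[_]): Unit =
    if (dataset.storageLevel == StorageLevel.NONE) {
      dataset.cache() // default level for Datasets is MEMORY_AND_DISK
      ()
    }
}
```

Because a DataFrame is a `Dataset[Row]`, the same helper covers both.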
The feature has been integrated.
Source: https://slides.com/nastasiasaby/spark-conseils#/35 (@NastasiaSaby)
Spark tests can be made faster by limiting the number of actions.
Where possible, we can automatically cache Datasets, DataFrames and RDDs to speed up the tests, so that
Spark does not redo the same computations for every action.
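To illustrate why caching saves work, here is a small sketch (the object `CacheDemo` and its method are hypothetical, not part of the project). It uses an accumulator to count how many times a `map` function runs when two actions, `count()` and `sum()`, are executed on the same RDD, with and without `cache()`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo: counts how often a transformation is evaluated across two actions.
object CacheDemo {
  def evaluationsForTwoActions(spark: SparkSession, cached: Boolean): Long = {
    val sc = spark.sparkContext
    // Incremented once per element each time the map function actually runs.
    val evaluations = sc.longAccumulator("evaluations")
    val base = sc.parallelize(1 to 100).map { x => evaluations.add(1); x * x }
    val squares = if (cached) base.cache() else base

    squares.count() // first action: computes (and, if cached, stores) every partition
    squares.sum()   // second action: recomputes the map unless partitions were cached

    if (cached) squares.unpersist()
    evaluations.value
  }
}
```

In local mode without task failures, one would expect 200 evaluations for the uncached RDD and 100 for the cached one. Accumulator updates made inside transformations can be over-counted if tasks are retried, so the counter is a diagnostic rather than a guarantee.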