vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0

Generate timed operations for a Spark Connector job #527

Closed ai-bq closed 1 year ago

ai-bq commented 1 year ago

We have the option of passing a parameter to our write job that times certain operations. For instance, the Spark Connector examples in /spark-connector/examples/scala/src/main/scala/example/examples/BasicReadWriteExamples.scala include a basic job that writes to Vertica and then reads the data back.

This job is implemented in the writeThenRead function and contains the following code to start the write:

    df.write.format(VERTICA_SOURCE)
      .options(options + ("table" -> tableName))
      .mode(mode)
      .save()

If we add time_operations as a parameter with the string value "true", this tells the connector that it needs to time certain operations.

    df.write.format(VERTICA_SOURCE)
      .options(options + ("table" -> tableName, "time_operations" -> "true"))
      .mode(mode)
      .save()
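
For the read half of the same example, the option can presumably be passed the same way. A minimal sketch, assuming the same spark session, options map, tableName, and VERTICA_SOURCE constant used by writeThenRead:

    // Sketch only: reading the table back with timing enabled.
    // `spark`, `options`, `tableName`, and VERTICA_SOURCE are assumed to be
    // the same values used by the writeThenRead example above.
    val readDf = spark.read.format(VERTICA_SOURCE)
      .options(options + ("table" -> tableName, "time_operations" -> "true"))
      .load()
    readDf.show()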

Running the job with this option should then produce the following output:

root@fcd239af6c6b:/spark-connector/examples/scala# ./submit-examples.sh writeThenRead
22/12/13 17:45:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/13 17:45:22 INFO SparkContext: Running Spark version 3.3.0
22/12/13 17:45:22 INFO ResourceUtils: ==============================================================
22/12/13 17:45:22 INFO ResourceUtils: No custom resources configured for spark.driver.
22/12/13 17:45:22 INFO ResourceUtils: ==============================================================
22/12/13 17:45:22 INFO SparkContext: Submitted application: Vertica-Spark Connector Scala Example
22/12/13 17:45:22 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/12/13 17:45:22 INFO ResourceProfile: Limiting resource is cpu
22/12/13 17:45:22 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/12/13 17:45:23 INFO SecurityManager: Changing view acls to: root
22/12/13 17:45:23 INFO SecurityManager: Changing modify acls to: root
22/12/13 17:45:23 INFO SecurityManager: Changing view acls groups to: 
22/12/13 17:45:23 INFO SecurityManager: Changing modify acls groups to: 
22/12/13 17:45:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/12/13 17:45:23 INFO Utils: Successfully started service 'sparkDriver' on port 39771.
22/12/13 17:45:23 INFO SparkEnv: Registering MapOutputTracker
22/12/13 17:45:23 INFO SparkEnv: Registering BlockManagerMaster
22/12/13 17:45:23 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/12/13 17:45:23 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/12/13 17:45:23 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/12/13 17:45:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-c4bacaea-273c-4cd8-a939-bc016d063770
22/12/13 17:45:23 INFO MemoryStore: MemoryStore started with capacity 1048.8 MiB
22/12/13 17:45:23 INFO SparkEnv: Registering OutputCommitCoordinator
22/12/13 17:45:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/12/13 17:45:23 INFO SparkContext: Added JAR file:/spark-connector/examples/scala/target/scala-2.12/vertica-spark-scala-examples.jar at spark://fcd239af6c6b:39771/jars/vertica-spark-scala-examples.jar with timestamp 1670953522913
22/12/13 17:45:23 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark:7077...
22/12/13 17:45:23 INFO TransportClientFactory: Successfully created connection to spark/172.19.0.6:7077 after 26 ms (0 ms spent in bootstraps)
22/12/13 17:45:23 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20221213174523-0000
22/12/13 17:45:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41747.
22/12/13 17:45:23 INFO NettyBlockTransferService: Server created on fcd239af6c6b:41747
22/12/13 17:45:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/12/13 17:45:24 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, fcd239af6c6b, 41747, None)
22/12/13 17:45:24 INFO BlockManagerMasterEndpoint: Registering block manager fcd239af6c6b:41747 with 1048.8 MiB RAM, BlockManagerId(driver, fcd239af6c6b, 41747, None)
22/12/13 17:45:24 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, fcd239af6c6b, 41747, None)
22/12/13 17:45:24 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, fcd239af6c6b, 41747, None)
22/12/13 17:45:24 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20221213174523-0000/0 on worker-20221213173342-172.19.0.2-34321 (172.19.0.2:34321) with 1 core(s)
22/12/13 17:45:24 INFO StandaloneSchedulerBackend: Granted executor ID app-20221213174523-0000/0 on hostPort 172.19.0.2:34321 with 1 core(s), 1024.0 MiB RAM
22/12/13 17:45:24 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20221213174523-0000/0 is now RUNNING
22/12/13 17:45:24 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
------------------------------------
-
- EXAMPLE: write data into Vertica then read it back 
-
------------------------------------
22/12/13 17:45:24 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/12/13 17:45:24 INFO SharedState: Warehouse path is 'file:/spark-connector/examples/scala/spark-warehouse'.
22/12/13 17:45:26 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.19.0.2:44022) with ID 0,  ResourceProfileId 0
22/12/13 17:45:26 INFO BlockManagerMasterEndpoint: Registering block manager 172.19.0.2:39883 with 434.4 MiB RAM, BlockManagerId(0, 172.19.0.2, 39883, None)
[col1: int]
22/12/13 17:45:27 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:27 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:27 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:27 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:27 INFO VerticaJdbcLayer: Connecting to Vertica with URI: jdbc:vertica://vertica:5433/docker
22/12/13 17:45:27 INFO VerticaJdbcLayer: main: Successfully connected to Vertica.
22/12/13 17:45:28 INFO VerticaJdbcLayer: Connecting to Vertica with URI: jdbc:vertica://vertica:5433/docker
22/12/13 17:45:28 INFO VerticaJdbcLayer: main: Successfully connected to Vertica.
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:28 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:28 INFO VerticaDistributedFilesystemWritePipe: Writing data to Parquet file.
22/12/13 17:45:28 INFO TableUtils: BUILDING TABLE WITH COMMAND: Right(CREATE table "dftest" ("col1" INTEGER) INCLUDE SCHEMA PRIVILEGES )
22/12/13 17:45:31 INFO CodeGenerator: Code generated in 157.763919 ms
22/12/13 17:45:31 INFO OverwriteByExpressionExec: Start processing data source write support: com.vertica.spark.datasource.v2.VerticaBatchWrite@57fe6f2d. The input RDD has 1 partitions.
22/12/13 17:45:31 INFO SparkContext: Starting job: save at BasicReadWriteExamples.scala:80
22/12/13 17:45:31 INFO DAGScheduler: Got job 0 (save at BasicReadWriteExamples.scala:80) with 1 output partitions
22/12/13 17:45:31 INFO DAGScheduler: Final stage: ResultStage 0 (save at BasicReadWriteExamples.scala:80)
22/12/13 17:45:31 INFO DAGScheduler: Parents of final stage: List()
22/12/13 17:45:31 INFO DAGScheduler: Missing parents: List()
22/12/13 17:45:31 INFO DAGScheduler: Submitting ResultStage 0 (CoalescedRDD[3] at save at BasicReadWriteExamples.scala:80), which has no missing parents
22/12/13 17:45:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 20.6 KiB, free 1048.8 MiB)
22/12/13 17:45:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 10.2 KiB, free 1048.8 MiB)
22/12/13 17:45:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on fcd239af6c6b:41747 (size: 10.2 KiB, free: 1048.8 MiB)
22/12/13 17:45:31 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1513
22/12/13 17:45:31 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (CoalescedRDD[3] at save at BasicReadWriteExamples.scala:80) (first 15 tasks are for partitions Vector(0))
22/12/13 17:45:31 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
22/12/13 17:45:31 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (172.19.0.2, executor 0, partition 0, PROCESS_LOCAL, 5241 bytes) taskResourceAssignments Map()
22/12/13 17:45:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.19.0.2:39883 (size: 10.2 KiB, free: 434.4 MiB)
22/12/13 17:45:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3419 ms on 172.19.0.2 (executor 0) (1/1)
22/12/13 17:45:34 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
22/12/13 17:45:34 INFO DAGScheduler: ResultStage 0 (save at BasicReadWriteExamples.scala:80) finished in 3.631 s
22/12/13 17:45:34 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
22/12/13 17:45:34 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
22/12/13 17:45:34 INFO DAGScheduler: Job 0 finished: save at BasicReadWriteExamples.scala:80, took 3.664706 s
22/12/13 17:45:34 INFO OverwriteByExpressionExec: Data source write support com.vertica.spark.datasource.v2.VerticaBatchWrite@57fe6f2d is committing.
22/12/13 17:45:34 INFO VerticaJdbcLayer: Kerberos is not enabled in the hadoop config.
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set AWSAuth
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set AWSRegion
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set AWSSessionToken
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set AWSEndpoint
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set AWSEnableHttps
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not set S3EnableVirtualAddressing
22/12/13 17:45:34 INFO VerticaJdbcLayer: Did not setup GCS authentications
22/12/13 17:45:34 INFO VerticaDistributedFilesystemWritePipe: Building default copy column list
22/12/13 17:45:34 INFO SchemaTools: Load by name. Column list: ("col1")
22/12/13 17:45:34 INFO VerticaDistributedFilesystemWritePipe: The copy statement is: 
COPY "dftest" ("col1") FROM 'webhdfs://hdfs:50070/data/bb2e6fe9_c72a_4c10_af81_7d7a00fbadad/*.parquet' ON ANY NODE parquet REJECTED DATA AS TABLE "dftest_bb2e6fe9_c72a_4c10_af81_7d7a00fbadad_COMMITS" NO COMMIT
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Performing copy from file store to Vertica
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Checking number of rejected rows via statement: SELECT COUNT(*) as count FROM "dftest_bb2e6fe9_c72a_4c10_af81_7d7a00fbadad_COMMITS"
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Verifying rows saved to Vertica is within user tolerance...
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Number of rows_rejected=0. rows_copied=20. failedRowsPercent=0.0. user's failed_rows_percent_tolerance=0.0. passedFaultToleranceTest=true...PASSED.  OK to commit to database.
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Dropping Vertica rejects table now: DROP TABLE IF EXISTS "dftest_bb2e6fe9_c72a_4c10_af81_7d7a00fbadad_COMMITS" CASCADE
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Committing data into Vertica.
22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Timed operation: Copy and commit data into Vertica -- took 326 ms.
22/12/13 17:45:35 INFO OverwriteByExpressionExec: Data source write support com.vertica.spark.datasource.v2.VerticaBatchWrite@57fe6f2d committed.
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:35 INFO VerticaJdbcLayer: Connecting to Vertica with URI: jdbc:vertica://vertica:5433/docker
22/12/13 17:45:35 INFO VerticaJdbcLayer: main: Successfully connected to Vertica.
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:35 INFO VerticaScanBuilder: Vertica 12.0.1-0 does not support writing the following complex types columns: . Export will be written to JSON instead.
22/12/13 17:45:35 INFO VerticaScanBuilder: Vertica 12.0.1-0 does not support writing the following complex types columns: . Export will be written to JSON instead.
22/12/13 17:45:35 INFO V2ScanRelationPushDown: 
Output: col1#4L

22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:35 INFO VerticaJdbcLayer: Kerberos is not enabled in the hadoop config.
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSAuth
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSRegion
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSSessionToken
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSEndpoint
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSEnableHttps
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set S3EnableVirtualAddressing
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not setup GCS authentications
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Creating unique directory: webhdfs://hdfs:50070/data/d4791632_3c9a_45bd_87ff_14f8841c1ea2 with permissions: 700
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Select clause requested: "col1"
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Pushdown filters: 
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Export Source: "dftest"
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Exporting using statement: 
EXPORT TO PARQUET(directory = 'webhdfs://hdfs:50070/data/d4791632_3c9a_45bd_87ff_14f8841c1ea2/dftest', fileSizeMB = 4096, rowGroupSizeMB = 16, fileMode = '700', dirMode = '700') AS SELECT "col1" FROM "dftest";
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Export To Parquet From Vertica -- took 122 ms.
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Requested partition count: 1
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Parquet file list size: 1
22/12/13 17:45:35 INFO BlockManagerInfo: Removed broadcast_0_piece0 on fcd239af6c6b:41747 in memory (size: 10.2 KiB, free: 1048.8 MiB)
22/12/13 17:45:35 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.19.0.2:39883 in memory (size: 10.2 KiB, free: 434.4 MiB)
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Total row groups: 1
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Creating partitions.
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Reading Parquet Files Metadata and creating partitions -- took 343 ms.
22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Reading data from Parquet file.
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS credentials provider for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS auth for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not set AWS session token for Hadoop config
22/12/13 17:45:35 INFO HadoopFileStoreLayer: Did not load Google Cloud Storage service account authentications
22/12/13 17:45:35 INFO VerticaJdbcLayer: Connecting to Vertica with URI: jdbc:vertica://vertica:5433/docker
22/12/13 17:45:35 INFO VerticaJdbcLayer: main: Successfully connected to Vertica.
22/12/13 17:45:35 INFO VerticaJdbcLayer: Kerberos is not enabled in the hadoop config.
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSAuth
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSRegion
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSSessionToken
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSEndpoint
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set AWSEnableHttps
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not set S3EnableVirtualAddressing
22/12/13 17:45:35 INFO VerticaJdbcLayer: Did not setup GCS authentications
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Creating unique directory: webhdfs://hdfs:50070/data/d4791632_3c9a_45bd_87ff_14f8841c1ea2 with permissions: 700
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Directory already existed: webhdfs://hdfs:50070/data/d4791632_3c9a_45bd_87ff_14f8841c1ea2
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Select clause requested: "col1"
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Pushdown filters: 
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Export Source: "dftest"
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Export already done, skipping export step.
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Requested partition count: 1
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Parquet file list size: 1
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Total row groups: 1
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Creating partitions.
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Reading Parquet Files Metadata and creating partitions -- took 26 ms.
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Reading data from Parquet file.
22/12/13 17:45:36 INFO CodeGenerator: Code generated in 14.124034 ms
22/12/13 17:45:36 INFO SparkContext: Starting job: show at BasicReadWriteExamples.scala:88
22/12/13 17:45:36 INFO DAGScheduler: Got job 1 (show at BasicReadWriteExamples.scala:88) with 1 output partitions
22/12/13 17:45:36 INFO DAGScheduler: Final stage: ResultStage 1 (show at BasicReadWriteExamples.scala:88)
22/12/13 17:45:36 INFO DAGScheduler: Parents of final stage: List()
22/12/13 17:45:36 INFO DAGScheduler: Missing parents: List()
22/12/13 17:45:36 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at show at BasicReadWriteExamples.scala:88), which has no missing parents
22/12/13 17:45:36 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 11.3 KiB, free 1048.8 MiB)
22/12/13 17:45:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.7 KiB, free 1048.8 MiB)
22/12/13 17:45:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on fcd239af6c6b:41747 (size: 5.7 KiB, free: 1048.8 MiB)
22/12/13 17:45:36 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1513
22/12/13 17:45:36 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at show at BasicReadWriteExamples.scala:88) (first 15 tasks are for partitions Vector(0))
22/12/13 17:45:36 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
22/12/13 17:45:36 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (172.19.0.2, executor 0, partition 0, PROCESS_LOCAL, 5073 bytes) taskResourceAssignments Map()
22/12/13 17:45:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.19.0.2:39883 (size: 5.7 KiB, free: 434.4 MiB)
22/12/13 17:45:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 371 ms on 172.19.0.2 (executor 0) (1/1)
22/12/13 17:45:36 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
22/12/13 17:45:36 INFO DAGScheduler: ResultStage 1 (show at BasicReadWriteExamples.scala:88) finished in 0.386 s
22/12/13 17:45:36 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
22/12/13 17:45:36 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
22/12/13 17:45:36 INFO DAGScheduler: Job 1 finished: show at BasicReadWriteExamples.scala:88, took 0.391499 s
22/12/13 17:45:36 INFO CodeGenerator: Code generated in 11.792328 ms
+----+
|col1|
+----+
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
|  77|
+----+

22/12/13 17:45:36 INFO ApplicationParquetCleaner: Removed webhdfs://hdfs:50070/data/d4791632_3c9a_45bd_87ff_14f8841c1ea2
22/12/13 17:45:36 INFO SparkUI: Stopped Spark web UI at http://fcd239af6c6b:4040
22/12/13 17:45:36 INFO StandaloneSchedulerBackend: Shutting down all executors
22/12/13 17:45:36 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
22/12/13 17:45:36 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/12/13 17:45:36 INFO MemoryStore: MemoryStore cleared
22/12/13 17:45:36 INFO BlockManager: BlockManager stopped
22/12/13 17:45:36 INFO BlockManagerMaster: BlockManagerMaster stopped
22/12/13 17:45:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/12/13 17:45:36 INFO SparkContext: Successfully stopped SparkContext
------------------------------------
-
- EXAMPLE: Data written to Vertica 
-
------------------------------------
22/12/13 17:45:36 INFO SparkContext: SparkContext already stopped.
22/12/13 17:45:36 INFO ShutdownHookManager: Shutdown hook called
22/12/13 17:45:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-2c2ac0eb-938d-48ed-8d44-b3567164fb39
22/12/13 17:45:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-17ebde17-89bf-447f-a30a-be750c8b9d52

We should have the following checkpoints timed:

22/12/13 17:45:35 INFO VerticaDistributedFilesystemWritePipe: Timed operation: Copy and commit data into Vertica -- took 326 ms.

This is where the connector writes data into Vertica through the write pipe.

22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Export To Parquet From Vertica -- took 122 ms.

This is where the connector exports data from Vertica to the intermediary storage in Parquet format.

22/12/13 17:45:35 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Reading Parquet Files Metadata and creating partitions -- took 343 ms.
22/12/13 17:45:36 INFO VerticaDistributedFilesystemReadPipe: Timed operation: Reading Parquet Files Metadata and creating partitions -- took 26 ms.

The last two messages look similar, but the former is logged while reading the Parquet file metadata and creating partitions, while the latter is logged when the actual read is performed.
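
For reference, each checkpoint could be produced by a small wrapper that measures an operation and logs it in the same "Timed operation: <label> -- took <n> ms." format shown above. The following is only a minimal sketch, not the connector's actual implementation; the object and method names, and the slf4j logger wiring, are assumptions:

    import org.slf4j.LoggerFactory

    // Illustrative sketch of a timing wrapper; names are hypothetical.
    object OperationTimer {
      private val logger = LoggerFactory.getLogger(getClass)

      // Runs `operation`; when timing is enabled, logs the elapsed time in the
      // "Timed operation: <label> -- took <n> ms." format seen in the output above.
      def timed[T](label: String, timeOperations: Boolean)(operation: => T): T = {
        if (!timeOperations) {
          operation
        } else {
          val start = System.nanoTime()
          val result = operation
          val elapsedMs = (System.nanoTime() - start) / 1000000
          logger.info(s"Timed operation: $label -- took $elapsedMs ms.")
          result
        }
      }
    }

    // Hypothetical usage: wrapping the copy-and-commit step in the write pipe.
    // OperationTimer.timed("Copy and commit data into Vertica", timeOperations) {
    //   performCopy()
    // }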