DATAVIEW
DATAVIEW (www.dataview.org) is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. It also provides a workflow_LocalExecutor so that users can run workflows on their local machine, off the cloud. Current research focuses on 1) performance and cost optimization for running workflows in clouds and 2) infrastructure-level support for GPU-enabled deep learning workflows. For deep learning workflows, it currently supports GPU infrastructures including 1) the local NVIDIA GPU of a PC, 2) GPU Xavier and Nano SoMs (System-on-Module), and 3) the heterogeneous GPU cluster.
DATAVIEW supports two programming interfaces for developing and running workflows:
- JAVA API: A programmer can develop various workflow tasks and workflows based on the DATAVIEW libraries. /DATAVIEW/src/test.java shows the six steps to create a customized workflow and execute it on Amazon EC2 or on a local PC.
  - The external dependency libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib.
  - To use Amazon EC2, update the accessKey and secretKey in config.properties under /DATAVIEW/WebContent/workflowLibDir/.
  - If you ran the workflow on Amazon EC2, manually terminate all the EC2 instances from your AWS account after the workflow finishes.
- Visual Programming: DATAVIEW is deployed as a Web site in Tomcat, and a user can drag and drop tasks and link them into a workflow in a visual workflow design and execution environment called Webbench.
  - A Dropbox account is required to store the input data, the workflow tasks, and the final output files produced by the workflow execution. The user needs to create three default folders:
    - Dropbox/DATAVIEW/Tasks, which stores the task files (class files or jar files);
    - Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow;
    - Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow.
  - Four relational algebra tasks (jar files) and input files are already stored under the DATAVIEW/WebContent/workflowTaskDir folder.
  - A local account needs to be registered to show a visualized workflow.
  - A Dropbox token should be provided in the main interface when you log in. It can be generated by following this tutorial: https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/
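Before launching a workflow on Amazon EC2, it can help to confirm that both credentials are actually set. The following is a minimal sketch, not part of DATAVIEW. It assumes that config.properties is a standard Java properties file with keys named accessKey and secretKey, as the text above suggests. The class name Ec2ConfigCheck is hypothetical, and the path to the file is passed in by the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of the DATAVIEW libraries).
class Ec2ConfigCheck {
    // Key names taken from the README; the file format is an assumption.
    static final String[] REQUIRED_KEYS = {"accessKey", "secretKey"};

    // Returns the required keys that are absent or blank in the given properties source.
    static List<String> missingKeys(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ec2ConfigCheck <path to config.properties>");
            return;
        }
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            List<String> missing = missingKeys(reader);
            System.out.println(missing.isEmpty() ? "Both keys are set." : "Missing or blank: " + missing);
        }
    }
}
```

The check only verifies that the two values are present; it does not contact AWS or validate the credentials.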
Download and configure DATAVIEW as JAVA API
Check out the tutorial at https://youtu.be/xJikeWptYSw or follow the instructions below:
- Download the DATAVIEW package from https://github.com/shiyonglu/DATAVIEW by clicking the "Clone or Download" button.
- Unzip the DATAVIEW-master.zip file and import the DATAVIEW project into Eclipse as "Existing Projects into Workspace" by selecting "Projects from Folder or Archive".
- The external dependency libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib.
- /DATAVIEW/src/test.java shows the six steps to create a new workflow and execute it with the local executor.
Download, configure, and deploy DATAVIEW as a Website
Check out the tutorial at https://youtu.be/7Sz4PSD_6Cs or follow the instructions below:
- Follow the first three steps from "Download and configure DATAVIEW as JAVA API".
- Create three default folders in your Dropbox: Dropbox/DATAVIEW/Tasks, which stores the task files (class files or jar files); Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow; and Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow.
- Get a Dropbox token.
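If the Dropbox desktop client syncs a local folder on your machine, the three default folders can be created on disk and left to sync. The sketch below works under that assumption only; the class name DropboxLayout is hypothetical, and the folders can equally be created by hand in the Dropbox web interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the DATAVIEW libraries).
class DropboxLayout {
    // Folder names from the README, relative to the Dropbox root folder.
    static final String[] DEFAULT_FOLDERS = {
        "DATAVIEW/Tasks", "DATAVIEW/Workflows", "DATAVIEW-INPUT"
    };

    // Creates the three folders (and any missing parents); existing folders are left untouched.
    static List<Path> createDefaultFolders(Path dropboxRoot) {
        List<Path> folders = new ArrayList<>();
        for (String name : DEFAULT_FOLDERS) {
            try {
                folders.add(Files.createDirectories(dropboxRoot.resolve(name)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return folders;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DropboxLayout <local Dropbox folder>");
            return;
        }
        for (Path folder : createDefaultFolders(Paths.get(args[0]))) {
            System.out.println("Ready: " + folder);
        }
    }
}
```

Running it a second time is harmless, because Files.createDirectories does nothing for folders that already exist.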
Run Deep Learning workflow (NNWorkflow) in DATAVIEW on Local NVIDIA GPU
Check out the introduction to DlaaW (Deep-learning-as-a-workflow) in DATAVIEW: https://www.youtube.com/watch?v=3KDq5CTcrGE
Below are some extra tips in addition to the instructions in "Download and configure DATAVIEW as JAVA API" and "Download, configure, and deploy DATAVIEW as a Website":
- JAVA API: A programmer can utilize various workflow NNTasks and NNWorkflows based on the DATAVIEW libraries. /DATAVIEW/src/NNTest.java shows the four steps to create a customized NNWorkflow and execute it in one of the NNTrainers (each corresponding to one specific execution plan and GPU infrastructure).
  - There is no need to install extra libraries or drivers (e.g. the CUDA toolkit) as long as you have a local NVIDIA GPU on your PC.
  - To run the Java API version of NNWorkflow, you need Tomcat 9 or lower (we recommend Tomcat 9).
- Visual Programming: DATAVIEW is deployed as a Web site in Tomcat, and a user can drag and drop tasks and link them into an NNWorkflow in a visual workflow design and execution environment called Webbench.
  - To run the Website version of NNWorkflow on your local PC, you need Java JDK 15 or lower (we recommend JDK 15).
  - To run NNWorkflow in the web GUI, copy the required files from the TrainerDLLs and ExecutorDLLs folders under /DATAVIEW/WebContent/workflowTaskDir in your local DATAVIEW repository to the DATAVIEW-INPUT folder in your Dropbox. The files include jsoncpp.dll, maintest.dll, and nnExecutor.dll.
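The DLL copy step can be scripted when Dropbox is synced to a local folder. The sketch below is an illustration under that assumption, not a DATAVIEW utility. The class name DllCopier is hypothetical, and because the text does not say which DLL lives in which of the two folders, the code simply looks for each file in both.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the DATAVIEW libraries).
class DllCopier {
    // Folder and file names from the README; which folder holds which file is not stated.
    static final String[] SOURCE_DIRS = {"TrainerDLLs", "ExecutorDLLs"};
    static final String[] DLLS = {"jsoncpp.dll", "maintest.dll", "nnExecutor.dll"};

    // Copies each DLL found under workflowTaskDir into inputDir; returns the names that were copied.
    static List<String> copyDlls(Path workflowTaskDir, Path inputDir) {
        List<String> copied = new ArrayList<>();
        try {
            Files.createDirectories(inputDir);
            for (String dll : DLLS) {
                for (String dir : SOURCE_DIRS) {
                    Path source = workflowTaskDir.resolve(dir).resolve(dll);
                    if (Files.isRegularFile(source)) {
                        Files.copy(source, inputDir.resolve(dll), StandardCopyOption.REPLACE_EXISTING);
                        copied.add(dll);
                        break; // first match wins; do not copy the same name twice
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java DllCopier <workflowTaskDir> <local DATAVIEW-INPUT folder>");
            return;
        }
        System.out.println("Copied: " + copyDlls(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Any name missing from the printed list was not found in either source folder and still needs to be copied by hand.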
DATAVIEW Tutorials
- Chapter 1: A gentle introduction to DATAVIEW (https://youtu.be/7S4iGKXpaAc)
- How to download, import DATAVIEW into Eclipse as Java API and run a workflow with local executor (https://youtu.be/xJikeWptYSw)
- How to create a relational algebra workflow in DATAVIEW through the interface (https://youtu.be/AQw0S_QO8zg)