shiyonglu / DATAVIEW

DATAVIEW is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. Current research focuses on the security and privacy aspects of DATAVIEW as well as performance and cost optimization for running workflows in clouds.

Paper submission: IEEE Trans. on Cloud Computing #8

Open ishtiaqcs opened 5 years ago

ishtiaqcs commented 5 years ago

Associate Editor Comments to the Author: This article aims to address workflow scheduling and securing workflow execution. The authors present a workflow scheduling algorithm called SGX-E2C2D, and the execution framework using distributed Virtual Machines in the public cloud. Experimental results are reported.

The reviewers consider the topics important, and consider the proposed methods feasible.

However, the reviewers have concerns about several aspects. The first is the novelty of the proposed methods: the reviewers believe the authors should rigorously compare the proposed methods with state-of-the-art approaches. The second is the comprehensiveness of the experimental study. The third is the readability and presentation of the paper.

Therefore, the reviewer team suggests a Resubmission as New. No further reviews will be conducted.

We hope that you will find the comments from the reviewers to be useful in your future work. If you have any questions, feel free to contact me.

Sincerely,

James Joshi
Transactions on Services Computing, Editor-in-Chief
IEEE Computer Society
jjoshi@pitt.edu

=======================================

Reviewers' Comments

Please note that some reviewers may have included additional comments in a separate file. If a review contains the note "see the attached file" under Section III A - Public Comments, you will need to log on to ScholarOne Manuscripts to view the file. After logging in to ScholarOne Manuscripts, enter the Author Center. Then click on Submitted Manuscripts, find the correct paper, and click on "View Letter". Scroll down to the bottom of the decision letter and click on the file attachment link. This will open the file that the reviewer included for you along with their review.

Reviewer: 1

Recommendation: Reject

Comments: The major technical contributions of this paper are workflow scheduling and securing workflow execution. Both are quite marginal. For workflow scheduling, the paper mainly uses the dependencies among tasks and their constraints (such as earliest start time and critical path) and forms the schedule from that information. This is rather straightforward, and there is no guarantee of optimality. Many existing research efforts targeting similar optimization problems use linear programming or integer programming, which are more advanced than the solution proposed in this paper. The paper does not compare itself with that work, either theoretically or experimentally.
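For readers unfamiliar with the quantities mentioned above, the following is a minimal sketch of how earliest start time (EST), earliest finish time (EFT), latest finish time (LFT), and the critical tasks can be derived from a task dependency graph. It assumes integer runtimes and zero data-transfer time, and the function names (`topological_order`, `schedule_bounds`, `critical_tasks`) are hypothetical. It is not the SGX-E2C2D algorithm, only the standard forward/backward pass that such heuristics typically build on.

```python
from collections import defaultdict, deque


def topological_order(tasks, edges):
    """Kahn's algorithm over a DAG given as a list of (parent, child) edges."""
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order, children


def schedule_bounds(runtime, edges, deadline):
    """Return (EST, EFT, LFT) dictionaries; data transfer times are ignored."""
    order, children = topological_order(list(runtime), edges)
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    est, eft, lft = {}, {}, {}
    # Forward pass: a task can start once all of its parents have finished.
    for t in order:
        est[t] = max((eft[p] for p in parents[t]), default=0)
        eft[t] = est[t] + runtime[t]
    # Backward pass: a task must finish before the latest start of each child.
    for t in reversed(order):
        lft[t] = min((lft[c] - runtime[c] for c in children[t]), default=deadline)
    return est, eft, lft


def critical_tasks(runtime, edges):
    """Tasks with zero slack when the deadline equals the makespan."""
    est, eft, _ = schedule_bounds(runtime, edges, deadline=0)
    makespan = max(eft.values())
    _, _, lft = schedule_bounds(runtime, edges, deadline=makespan)
    return [t for t in est if lft[t] - runtime[t] == est[t]]


# Hypothetical four-task diamond workflow: A -> {B, C} -> D.
example_runtime = {"A": 2, "B": 3, "C": 1, "D": 2}
example_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(critical_tasks(example_runtime, example_edges))  # ['A', 'B', 'D']
```

A heuristic built on these bounds runs in time linear in the number of tasks and edges, but, as the reviewer notes, it carries no optimality guarantee of the kind an integer programming formulation can provide.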

The security part of the paper is just a simple use of an existing package, so it lacks novelty. Moreover, SGX only allows the encryption/decryption of processed data but not of the execution flow, which also reflects the design of the algorithm and should be protected (as claimed in the abstract).

The experimental study is not comprehensive enough. It tests a sorting algorithm, which is not a typical example of the big data workflows mentioned in the title. It only compares performance against IC-PCP, not against more recent efforts.

Additional Questions:

  1. Which category describes this manuscript?: Research

How relevant is this manuscript to the readers? Explain under Public Comments: Relevant

  1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: This paper proposes an approach to achieve secured execution of workflows and optimize the scheduling through minimizing monetary cost. The experimental study was based on simulating workflows and comparing the performance with IC-PCP.

  2. Is the manuscript technically sound? Please explain your answer under Public Comments below.: Partially

  3. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: No

  4. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: Important references are missing; more references are needed

  5. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes

  6. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory

  7. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Easy to read

  8. Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included

  9. If yes to 6, should it be accepted:

Please rate the manuscript. Explain your choice.: Fair

Reviewer: 2

Recommendation: Revise and Resubmit as "New"

Comments: The paper presents a workflow scheduling algorithm called SGX-E2C2D and an execution framework that uses distributed virtual machines (VMs) in the public cloud. It uses the critical path and a topological ordering of the directed graph to identify partial paths (sets of tasks) to be executed on one virtual machine so as to meet the deadline and minimize the overall cost. When a task is confidential, the critical path containing it is split across different VMs, and the confidential task is handled by a separate VM that uses encrypted data and enclaved libraries to ensure confidentiality. The deadline constraint saves overall execution time by using the available execution time.
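The path-splitting behavior described above can be illustrated with a minimal sketch. It assumes a path is an ordered list of task names and that confidential tasks are given as a set; the function name and the labels `"sgx"` and `"regular"` are hypothetical and are not taken from the manuscript.

```python
from itertools import groupby


def split_path_by_confidentiality(path, confidential):
    """Split an ordered path into maximal runs of tasks that share a confidentiality flag.

    Returns a list of (vm_kind, tasks) pairs. vm_kind is the hypothetical label
    "sgx" for runs of confidential tasks and "regular" for all other runs.
    """
    segments = []
    for is_confidential, run in groupby(path, key=lambda task: task in confidential):
        segments.append(("sgx" if is_confidential else "regular", list(run)))
    return segments


# Hypothetical path of five tasks in which only T3 is confidential.
print(split_path_by_confidentiality(["T1", "T2", "T3", "T4", "T5"], {"T3"}))
# [('regular', ['T1', 'T2']), ('sgx', ['T3']), ('regular', ['T4', 'T5'])]
```

Each additional confidential task in the middle of a path can add up to two extra segments, which is consistent with the observation that time and cost grow with the number of confidential tasks.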

This approach is tested with synthetic workflows to show that the deadline constraint indeed saves overall time and cost compared with the baseline algorithm that does not consider the critical path when prioritizing tasks in a cost- and time-sensitive manner.

The experiments also show that the execution time and cost increase slowly as the number of confidential tasks in a workflow grows, because of the encryption/decryption and the additional task split for the subgraph containing the confidential task.

Although the approach may be interesting and feasible, the paper's comprehensibility needs a lot of improvement. The following suggestions may help improve the presentation and comprehensibility:

  1. The presentation of the content MUST be improved. The readability is very poor, with no concrete real-life examples to motivate the issue of confidentiality and integrity. The abstract workflow in Fig. 1 does not convince the reader that this is indeed a feasible workflow.

  2. The directed graph representing the workflow needs more rigorous definitions. The edge labels are not explained. The only things that count here are the execution start and end times and the virtual machine costs. Real-life workflow examples should be presented to motivate the issues of execution on virtual machines and the confidentiality of some tasks in the workflow. A motivating example serving as the problem statement should be presented in the Introduction before the formal approach and algorithms are introduced. In other words, show that this problem is real, not one concocted for the sake of solving a problem.

  3. In Section 3.1, the formal definition of a workflow can be improved. Workflows usually have preconditions among tasks, as well as join/split relations. These are not modeled. Where are the EST and LFT specified?

  4. Similarly, the resource model could be defined with costs and speed in mind. The formal models of both are sketchy.

  5. The critical path in Section 3.3 is a common concept in project management.

  6. Sections 4.1-4.4 are unreadable. It would be better to present an example workflow at the beginning and then walk through the execution of Algorithms 1-3 step by step. Tables 1-2 in particular are difficult to follow, and it is hard to see the key points and the changes at each step. No steps are introduced earlier. I strongly recommend reworking this section completely to make it readable. The algorithms can be given as pseudocode.

  7. The authors refer to the DATAVIEW framework in another paper. I think the authors should summarize and better explain what the workflow execution framework is. I wonder how this framework communicates with the VMs and how task execution results are communicated and coordinated. Some important assumptions are made in this paper, i.e., that each task is executed by a VM, but we do not yet know how results that may influence another path are communicated or coordinated. Only time and cost are at issue here. Perhaps you could spell out the simplifying assumptions.

  8. There is no baseline comparison for the confidential task execution. Time and cost increase as there are more confidential tasks, for obvious reasons. What does this mean, other than that the job gets done? Does this guarantee confidentiality? I am a bit lost as to what the experimental results are proving.

  9. Please refer to the following relevant papers:

    Vijayalakshmi Atluri, Soon Ae Chun, and Pietro Mazzoleni (2004). Chinese Wall Security for Decentralized Workflow Management Systems. Journal of Computer Security, Volume 12, Number 6, November 2004, pp. 799-840.

    Vijayalakshmi Atluri, Soon Ae Chun, Ravi Mukkamala, and Pietro Mazzoleni (2007). A Decentralized Workflow Execution Model for Inter-organizational Workflows. Journal of Distributed and Parallel Databases, Volume 22, Number 1, August 2007, pp. 55-83.

Additional Questions:

  1. Which category describes this manuscript?: Technology

How relevant is this manuscript to the readers? Explain under Public Comments: Relevant

  1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The workflow execution on the public cloud can be distributed on different virtual machines, and the proposed framework allows task executions by the VM's in a timely manner and also in a confidential manner.

It claims that this is the first approach to workflow execution in the public cloud that considers the confidentiality of some tasks in a workflow.

  2. Is the manuscript technically sound? Please explain your answer under Public Comments below.: Appears to be - but didn't check completely

  3. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: No

  4. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: Important references are missing; more references are needed

  5. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Could be improved

  6. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Poor

  7. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Difficult to read and understand

  8. Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included

  9. If yes to 6, should it be accepted:

Please rate the manuscript. Explain your choice.: Poor

Reviewer: 3

Recommendation: Author should prepare a major revision

Comments: The authors target an important research question. To improve the paper's quality, the following aspects can be addressed.

  1. Section 2 could be organized into different subsections: cloud workflow scheduling and scheduling with confidential requirements.

  2. A table summarizing meanings of acronyms/variables like EST, EFT, LFT will help.

  3. What are the unique challenges of scheduling tasks on an SGX virtual machine? How general is the proposed algorithm? Can the algorithm also be used in other scenarios where some workflow tasks have special VM requirements, such as software package dependencies? If so, can the problem be generalized by adding one condition: Ti has VM requirement VMj?

  4. The paper targets Big Data Workflow. Which unique Big Data requirements are considered? How does the proposed algorithm deal with the requirements? What are the data sizes and data related costs in the algorithm and experiment?

  5. The paper compares extensively with the IC-PCP algorithm. A (short) description of the IC-PCP algorithm and a comparison with the proposed algorithm at the algorithm level will help readers.

  6. There are some grammar/format errors in the paper. For instance: 5$, 8$ and 13$ should be $5, $8 and $13.
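The generalization raised in item 3 above (Ti has VM requirement VMj) can be sketched as a simple eligibility filter applied before scheduling. This is only an illustration under stated assumptions: requirements and capabilities are modeled as sets of labels, and the names `eligible_vms`, `vm_plain`, and `vm_sgx` are hypothetical and do not come from the manuscript.

```python
def eligible_vms(task_requirements, vm_capabilities):
    """Map each task to the VM types whose capabilities cover all of its requirements."""
    return {
        task: sorted(
            vm for vm, capabilities in vm_capabilities.items()
            if required <= capabilities  # subset test: every requirement is offered
        )
        for task, required in task_requirements.items()
    }


# Hypothetical tasks: T2 needs an SGX enclave, T3 needs a software package.
task_requirements = {"T1": set(), "T2": {"sgx"}, "T3": {"numpy"}}
vm_capabilities = {"vm_plain": {"numpy"}, "vm_sgx": {"sgx", "numpy"}}
print(eligible_vms(task_requirements, vm_capabilities))
# {'T1': ['vm_plain', 'vm_sgx'], 'T2': ['vm_sgx'], 'T3': ['vm_plain', 'vm_sgx']}
```

Under this view, SGX confidentiality is one label among many, and a task that maps to an empty list cannot be scheduled on any available VM type.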

Additional Questions:

  1. Which category describes this manuscript?: Research

How relevant is this manuscript to the readers? Explain under Public Comments: Relevant

  1. Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The paper addresses the challenge of protecting the integrity and privacy of tasks. The authors use the SGX as a Trusted Execution Environment to support the integrity and confidentiality of individual workflow tasks. Based on this, we propose a deadline-constrained and SGX-aware workflow scheduling algorithm, called SGX-E2C2D. SGX-E2C2D features several heuristics including exploiting longest critical paths and reuse of extra times in existing virtual machine instances. Our experiments show that SGX-E2C2D outperforms the representative algorithm, IC-PCP, in most cases in monetary cost while satisfying the given user-defined deadline.

  2. Is the manuscript technically sound? Please explain your answer under Public Comments below.: Appears to be - but didn't check completely

  3. Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes

  4. Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate

  5. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Could be improved

  6. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Could be improved

  7. Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Easy to read

  8. Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included

  9. If yes to 6, should it be accepted: As is

Please rate the manuscript. Explain your choice.: Good