timm opened 7 years ago
_AUTHORS: Important. Do NOT reply till all three reviews are here (until then, we will delete your comments)_.
Insert reviewer github id here ==> swan17-reviewer4
The paper discusses the preliminary steps towards building a dataset that will be used for identifying development processes in Travis CI. The main objective of the paper is selecting an appropriate subset of data from the TravisTorrent dataset. To do this, the authors investigate different ways of partitioning the dataset, based on various features such as by project or branch, and by whether there are commits during weekends. The authors also discuss the potential to extend the dataset in order to compose new features.
I was very confused by this paper. First, it lacks a clear problem definition: what is it exactly that the authors want to discover in the TravisTorrent dataset, and how will that let them reverse engineer a process? A concrete example would be very helpful to illustrate the use case. The authors often use the word "can" to refer to potential extensions of the work presented in the paper, but it is unclear whether that is based on any real patterns already detected in the data or if it is conjectural. The writing is often vague, with many grammatical errors, and many concepts are left undefined or are assumed to be widely known. The authors make the extraordinary claim that they describe how social science methods can be applied to big data, but there is nothing in the paper to support it. The structure of the paper is also very confusing: the last three paragraphs of Section 2 clearly belong in Section 1, as they discuss the problem domain, not Background; the second paragraph of Section 3.3 actually belongs in Background; the Results section does not discuss results but rather a form of Threats to Validity; the Related Work section mostly rehashes text from earlier in the paper. The authors also seem to think that [2] is all that needs to be mentioned in terms of empirical research outside Archive Research/Big Data. It is, however, unclear what they mean by "observational methods" (Ethnographies? Case study research? Their discussion of control groups suggests that they are also considering experiments or quasi-experiments.). I highly recommend [a] as an introduction. Finally, in Figures 1-3, the vertical axes are not labelled.
[a] Easterbrook, Steve, et al. "Selecting empirical methods for software engineering research." Guide to Advanced Empirical Software Engineering. Springer London, 2008. 285-311.
Insert reviewer github id here ==>
The paper seeks to recover information about development process from Travis CI data. The focus of the paper is on selecting subsets of the TravisTorrent data set to support further investigation.
I found this paper to be very confusing. I am not clear on what it is trying to accomplish. The discussion of applying social science methods to data sets is not explained clearly or in sufficient depth. Because Sections 1 and 2 do not frame the remaining sections, the paper is difficult to follow. For example, the goal of Section 3 is unclear to me. What is being grouped (into test groups): projects, branches, developers, or something else? Section 4 is also unclear, though what I believe to be its point (that we can link multiple data sets) is well known. Overall, the paper lacks a clear goal or message.
Insert reviewer github id here ==> gray-swan
The paper focuses on the extraction and analysis of development process data from TravisCI, as an example of application of observational research methods in software engineering. It first describes different ways of partitioning the data into test and control groups, by considering different characteristics, such as repository structure and day of the week when builds happen. It then discusses how the TravisCI data can be extended by using other sources of information.
Although some interesting insights can be derived from it, this paper is puzzling. At first it seems the paper provides essential steps for analyzing software engineering data from the social sciences perspective (which, by the way, is totally unrelated to the title), but in the end there is nothing different from a regular empirical software engineering methodology. Even the discussion on how to form test and control groups is based on arbitrary characteristics without any justification, which might be related to the paper's lack of an explicit purpose. Furthermore, the presented results are observations about the data set and the procedure followed, rather than an analysis of the development process traces. Finally, the paper is screaming for better descriptions of concepts and justification of design decisions.
Just edited the "Reviewer 1" comment.
For this paper, the authors would need to offer a very spirited defense of the current format in order for this to be accepted. In fact, the required changes might be so large that it would almost be another paper.
If the authors feel the same way, they might care to withdraw it at this stage.
For future papers, I would say that while this paper demonstrates good technical competency, it does not demonstrate a clear research motivation, context, and direction. I was looking around my papers for an example of one that shows, early on, such a clear research motivation, direction, and context. The paper https://arxiv.org/pdf/1702.07735.pdf "might" be such an example (I say "might" since this one is still under review, so no one knows if it will pass muster with the international community). But assuming it does, please note how the introduction:
Insert reviewer github id here ==> Kangaroo01
The paper describes how to select, extract, enrich and analyse data gained from Travis CI. The main objective of the paper is selecting an appropriate subset of data to be analysed.
The paper itself describes the entire process, from extracting the data up to analyzing and interpreting it. It is hard to understand the real focus of the research work, and thus of the paper. The paper mentions a lot of possibilities for research that could be done and that the authors intend to do, but it seems that none of them was investigated further. The paper should clearly describe and focus on the research work the authors have actually done. This did not become clear to me. In general, I got the impression that it is more a description of the processes and steps involved in doing data analytics.
Thank you very much for all your very valuable comments.
The paper was produced after we realized that our primary approach of applying social case study methods was not possible due to the limited data available in the selected dataset. The paper was reframed to identify the basic characteristics a dataset would need to fulfil to allow the application of methods like "Grounded Theory". However, time was too short to properly introduce the new ideas and add better-grounded related research. In fact, writing a whole new paper would have been the better idea.
We withdraw the paper for your convenience. Again, thank you for all the helpful comments. We are considering working on another, better-grounded version of the paper.
Finding Traces of the Development Process in Travis CI Data
https://github.com/researchart/swan17/blob/master/pdf/swan2017-submission.pdf
"Next to humans interacting with machines, Software Engineering also relies on humans interacting with other humans. These social interactions are relevant for process analysis and are best evaluated by means of social sciences."
"During software development a huge amount of data is collected. This data can be analyzed by methods from observational research to draw conclusions about the processes that produced the resulting software. The collected data needs to be clustered into test groups before if it can be analyzed using these methods."