Swanaccount: Find, Understand, and Extend Development Screencasts on YouTube #2

Open timm opened 7 years ago

timm commented 7 years ago

https://github.com/researchart/swan17/blob/master/pdf/SWANAccount.pdf


timm commented 7 years ago

_AUTHORS: Important. Do NOT reply till all three reviews are here (until then, we will delete your comments)_.


Reviewer1

Insert reviewer github id here ==> gray-swan

Recommendation (select one)

Summary (1 para)

The paper presents three case studies on software development screencasts as an information source for developers' knowledge seeking. The results of the case studies show that (i) developer screencasts exhibit higher similarity between their frames than other video types, which helps identify them; (ii) it is possible to identify the main screencast topics by using their transcripts; and (iii) developer screencasts can be tied to relevant APIs by leveraging the textual similarity between API documents and screencast transcripts.
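For context on finding (i), here is a minimal sketch of consecutive-frame cosine similarity. It assumes frames are reduced to grayscale intensity histograms; the paper's exact feature representation is not given in this thread, so the `frame_histogram` helper is illustrative only.

```python
import numpy as np

def frame_histogram(frame: np.ndarray, bins: int = 64) -> np.ndarray:
    """Reduce a grayscale frame to a normalized intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def mean_consecutive_similarity(frames: list[np.ndarray]) -> float:
    """Average similarity of consecutive frames. Development screencasts,
    being mostly static IDE views, should score higher than other video
    types; that is the signal finding (i) exploits."""
    hists = [frame_histogram(f) for f in frames]
    return float(np.mean([cosine_similarity(h1, h2)
                          for h1, h2 in zip(hists, hists[1:])]))
```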

Advocacy (accept since, reject since, 1 para)

The paper presents appealing ideas for handling the large amount of information embedded in developer screencasts, which have been seldom exploited in software engineering research. The ideas are developed in three case studies addressing different questions about the visual and spoken content of developer screencasts. Given the promising topic and results, I advocate for the acceptance of the paper once the major issues pointed out below are addressed.

List of "Pros"

List of "Cons"

Changes needed before I can recommend accept (if any)

s# = section #

Major

s3. To answer RQ2, two analyses are performed, one using screencast titles and the other using their transcripts. However, there is no comparison between the two analyses, nor any discussion of how they complement or overlap each other.

s3-s4. The conclusions (the boxed summaries) of each part of the study are not very clear and do not seem to answer the original RQs. In particular, RQ2 and RQ3 seem to be answered with specific statements that are detached from their respective discussions. Also, the conclusions sometimes focus on certain specific results without any clear justification, and sometimes they go beyond the evidence of the study, as the points below illustrate.

s4. Recall is used to assess the identification of relevant APIs for screencasts. However, unless a thorough analysis of all 9,455 documents was done to build the gold set for each of the 32 screencasts in Section 4, recall does not seem to be an appropriate quality measure in this case (a small sketch follows this list).

s7. The final conclusion is misleading, as it currently mixes results from the three case studies even though they use different study subjects. Specifically, one of the conclusions of the paper is that six main topics exist in Java screencasts, and that one of them is how-to screencasts (which doesn't seem to be the case). Where is how-to listed as a screencast topic?
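To make the recall objection concrete: recall@k divides by the size of the gold set, so an incomplete gold set silently inflates the score. A minimal sketch (the function name and signature are mine, not the paper's):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of ALL relevant documents that appear in the top-k results.

    The denominator is the full gold set. Unless every candidate document
    (all 9,455 of them) was judged for each screencast, `relevant` is
    incomplete and the reported recall is not trustworthy."""
    if not relevant:
        raise ValueError("recall is undefined without a gold set")
    hits = sum(doc in relevant for doc in retrieved[:k])
    return hits / len(relevant)
```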

Need clarification/justification/modification

s2. "The Cosine algorithm is the best algorithm to identify a development screencast from other video types (highest concentration of similarity values)." => the best one within the studied algorithms s3. "Method and system operations are also frequently occurring tasks during development" => in development screencasts s3. "Interestingly, UI operations are also shown to be one of the main activities performed when comprehending software [16]." => This is indeed interesting, but also out-of-topic as program comprehension is not the main issue here. It needs to be pointed out that UI operations are one of the topics within the screencasts. s3. Table 1 can be more informative by adding the frequency of each topic. It is unclear if the topics presented in this table are all the ones found or if other topics were identified. If there are more topics, what is so interesting about these ones? Also, it is unclear what "repeatable tasks" are. s3. If only nouns from transcripts were considered when applying LDA, why Table 1 shows verbs as part of the terms describing a topic? s3. The conclusions of Section 3 are focused on certain topics (database operations and testing) without any particular reason. s4. Why was TaskNav chosen for the study? s4. While the first sentence of the section states that 35 screencasts were used in the study, Figure 5 states that 32 screencasts where used. Which one is it? s4. "This result is more accurate than random, which has a precision below 1%." => Unclear. Can you elaborate on this? s6. The last paragraph is a summary of the work presented in the paper, more than a a discussion of results or limitations of the study. s7. "In this stage, a couple of relevant API documents are provided within a list of 10 items." => which stage?

Typos, grammar errors

s1. "human interactions [9] e.g. " => add comma before e.g. s1. "A task in a development screencast can be assigned to an topic of so ware development" => a topic s2. "We calculated the frame similarity of every video of every video type" => similarity of every video type s2. "Development screencasts seems to be more static" => seem to be s3. "We performed two different analysis about the topics" => analyses s3. "We found that such need is present in development screencasts" => such a need s4. "for every development task that were performed" => that was s4. "the relevant documentation pages were found in the top-10 retrieved position." => positions s4. "the text that might appear in a scene e.g. an IDE" => add comma before e.g. s5. "MacLeoad et al.[17]" => MacLeod s5. "We extend their work by linking screencasts with API documents and show how similar they actually are." => and by showing s6. "we found that frames in a development screencast seems to be very much alike" => seem to be s6. "Leveraging our results, a simple tool..." => By leveraging

timm commented 7 years ago

_AUTHORS: Important. Do NOT reply till all three reviews are here (until then, we will delete your comments)_.


Reviewer2

Insert reviewer github id here ==>

Recommendation (select one)

Summary (1 para)

Advocacy (accept since, reject since, 1 para)

I think there are flaws in this paper, mainly related to justification. However, I also think it could spark very useful discussion on screencasts, their uses, and even techniques for making good screencasts. So despite these flaws, I recommend acceptance, but I ask for a look at the findings boxes at the end of each section. Some of these should be removed.

List of "Pros"

List of "Cons"

Section 3 - Figure 4 is too small to read.

Section 4 - I like the idea of this section, that is, finding relevant API documents to "attach" to screencasts. However, given the very low precision of your search technique (below 10% at all top-N cutoffs), I do not see the usefulness of your current approach. If you attach 10 documents to a screencast and only one is relevant, that is not helpful to the viewer. While this is still better than random, I would like the authors to be more direct in admitting this limitation. Again, the "findings box" at the end of the section seems poorly justified. Specifically, "Extracting the content of the development screencast—e.g., the code showed on the screen when using an IDE—might lead to a higher precision/recall to identify a development screencast." would be great in a future work or conclusion section. It is, in this context, presented as some kind of finding from the research, when it is more accurately a hypothesis for a future study.
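To spell out the arithmetic behind this objection, a small illustrative computation with hypothetical document names (not taken from the paper): one relevant document among ten attached gives precision@10 of exactly 10%.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top = retrieved[:k]
    return sum(doc in relevant for doc in top) / len(top)

# Hypothetical example mirroring the reviewer's scenario:
# ten documents attached to a screencast, only one of them relevant.
retrieved = [f"api_doc_{i}" for i in range(10)]
relevant = {"api_doc_3"}
print(precision_at_k(retrieved, relevant))  # 0.1, i.e. 10%
```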

Typos: I didn't put a large effort into tracking typos, but I tried to note the glaring ones that stand out.

- There is no author section on this paper
- p1: explicitely => explicitly
- "However, attaining such tasks" - should be "completing such tasks"
- "A task in a development screencast can be assigned to an topic of software development" - I cannot

Section 4 - first sentence "miss-spelt" - misspelled

Changes needed before I can recommend accept (if any)

Overall, I think the "findings boxes" at the end of each section need to be pruned to what the studies found in concrete, statistical terms. Further, I think Section 2 needs to be better justified. Finally, I strongly urge a professional proofread of some kind.

timm commented 7 years ago

_AUTHORS: Important. Do NOT reply till all three reviews are here (until then, we will delete your comments)_.


Reviewer3

anonproton

Recommendation (select one)

Summary (1 para)

The paper explores the idea of using developer screencasts as an additional source of information in developers' knowledge-seeking activity, to answer questions related to development. The authors found that developer screencasts have a high level of similarity between their frames, which can be exploited to distinguish screencasts from other videos. The topics of developer screencasts can be extracted by mining their transcripts, which can further be mapped to relevant APIs by computing textual similarity.

Advocacy (accept since, reject since, 1 para)

I am on the fence about advocating for this paper. On one hand, I find the topic has some technical merit and the paper proposes a simple, intuitive solution; on the other hand, I feel the problem lacks clear motivation. Is it really a problem to search for screencasts on YouTube right now? If not, does the proposed solution improve the user experience enough to justify implementing it? (I don't think so, as the precision/recall numbers don't seem very high.)

A quick search on YouTube with "screencast" prefixed to the search text gave me the results I was expecting. Again, I might have overlooked certain scenarios, but please justify the cases where you think a simple, common-sense search approach would not produce the expected outcome.

For me, the actual merit comes from identifying and overcoming challenges in developers' current knowledge-seeking practice. The authors do make some attempt at this in Section 3; more of that would help position this work in the right context. Overall, I would be in a position to appreciate the work much more if the authors focused on the end-to-end scenario of how one could potentially improve developers' knowledge-seeking experience.

List of "Pros"

List of "Cons"

Changes needed before I can recommend accept (if any)

Please see "advocacy" section.

obaysal commented 7 years ago

@timm I'd match issue IDs according to our master file. The template otherwise looks good.

timm commented 7 years ago

Authors? Comments? Good responses to the above could lead to acceptance.

SWANAccount commented 7 years ago

Hi Tim,

we will respond on Wednesday.

For sure, we will try to resolve the mentioned issues.

In the meantime, thanks for the comments!

Best regards, Authors


SWANAccount commented 7 years ago

Dear reviewers,

Again, thank you for your comments. Attached are our responses.

gray-swan.txt swan-reviewer-18.txt anonproton.txt

SWANAccount commented 7 years ago

Dear reviewers,

please find below the latest version of our answers, which we have slightly improved.

Based on the call for papers and the notification messages from the chairs, we assume that we do not need to provide a revision of the paper at this stage. Otherwise, please advise us on how to proceed.

Thank you.

@gray-swan gray-swan.txt

@swan-reviewer-18 swan-reviewer-18.txt

@anonproton anonproton.txt