Open vazexqi opened 13 years ago
Nick and I discussed this issue. We think that excluding some projects from monitoring is problematic because our data will no longer be comparable to the data collected from other sources such as UDC. To address the privacy issue, we'll make it clear that the user should not add private projects to the watched workspace. And, to make sure we don't reveal the identities of the participants or any sensitive data, we won't make the code snippets or the data collected by CodingTracker public. If we decide to publish such data in our reports, we'll get the participants' approval whenever we think it could reveal private information.
We should confirm whatever we are going to do with IRB as well. They will provide us with the best advice on how to protect people's data.
@Wanderer777, @vazexqi: Our recruitment experience has shown us that a lot of open source developers work on open source and closed source projects at the same time. And, apparently, splitting the workspace isn't convenient, as the developers would have to constantly switch between the two instances of Eclipse. Therefore, I've reopened this issue to find alternative solutions. I also talked to Ralph about this issue and will summarize our discussion below.
It would be good to ask for the user's confirmation before uploading CodingSpectator data. Currently, CodingSpectator just asks for the user's authentication information to upload the data. It would be nice to show a list of the projects that CodingSpectator has collected data from and ask the user to select the projects whose data he/she would like to submit to the CodingSpectator servers. It would also be nice to allow the user to permanently exclude some projects from being monitored by CodingSpectator.
There are two ways to stop submitting data from certain projects.
@reprogrammer, @Wanderer777
In my opinion, filtering might not be a viable solution. You need to show the user everything. This creates a user interface problem. How can you present this overwhelming amount of data in a comprehensible manner? UDC itself has a filter view that fails from a UI perspective -- too much information without any way to really make sense of it.
Moreover, when would you present this data? I thought you wanted to not bias their behavior. So would you present this at the end of the study when all data has been collected?
The hardest part is that the user has to go through all of it to filter. For sequential data like CodingTracker, you might end up breaking the sequence. For non-sequential data, you might end up losing non-sensitive data because the user was overwhelmed with all the information and just decided to filter everything.
@Wanderer777, @vazexqi:
Sorry for not describing well what I meant by filtering. By filtering, I meant filtering the projects and not the detailed data. That is, CodingSpectator would present a list of projects to the user to choose from. Then, it would submit data just from the projects that the user has selected.
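To make the project-level filtering concrete, here is a minimal sketch in Java. It is only an illustration under assumptions: `RecordedEvent`, `ProjectFilter`, `projectsSeen`, and `selectForUpload` are hypothetical names, and the real CodingSpectator data format and upload code are not shown in this thread. The idea is to derive the list of projects from the collected data, let the user pick from that list, and then keep only the events that belong to the picked projects.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one collected event; the actual data format is an assumption.
final class RecordedEvent {
    final String projectName; // name of the workspace project the event came from
    final String payload;     // opaque event data (e.g. a recorded edit)

    RecordedEvent(String projectName, String payload) {
        this.projectName = projectName;
        this.payload = payload;
    }
}

// Hypothetical helper; not part of the CodingSpectator code base.
final class ProjectFilter {

    // Distinct project names in order of first appearance.
    // This is the list that would be shown to the user before uploading.
    static Set<String> projectsSeen(List<RecordedEvent> events) {
        Set<String> names = new LinkedHashSet<>();
        for (RecordedEvent event : events) {
            names.add(event.projectName);
        }
        return names;
    }

    // Keeps only the events whose project the user approved.
    // The relative order of the retained events is unchanged.
    static List<RecordedEvent> selectForUpload(List<RecordedEvent> events,
                                               Set<String> approvedProjects) {
        List<RecordedEvent> selected = new ArrayList<>();
        for (RecordedEvent event : events) {
            if (approvedProjects.contains(event.projectName)) {
                selected.add(event);
            }
        }
        return selected;
    }
}
```

With this shape, the user only has to review a short list of project names rather than the detailed data, and the same set of approved names could be stored to support a permanent exclusion list.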
During an interview with one of our participants, the issue of selectively turning off monitoring for projects came up. I think this is a limitation that we have to address -- we can't just monitor all projects in the current workspace. Developers might work on different projects (or a combination of open source and proprietary code, e.g. test/sample, in the same workspace).
These are the steps that we can follow:
I think this is important to resolve before we ask developers to use our project, especially since we are capturing fine-grained code snippets and edits. The other user studies (UDC, Mylyn) only captured information that is not as detailed or revealing; thus they cannot compromise the project that is being worked on.