Many-to-many relationship for classes and campaigns

jeroen commented 12 years ago

For the Snack dashboard I exported the snack deployment data to Mongo. One of the reasons is that is hard to use live data from ohmage is because the data is split over 20 campaigns/classes. So you need to make like 20 calls to survey_response/read to get the data.

At the moment there is no notion of deploying a campaign/survey to multiple classes, is that correct? Or is this just something we did for the deployment? It is a bit strange that exactly the same campaign (snack.xml) has been deployed 20 times and I have to manually combine the data.

jojenki commented 12 years ago

A campaign can be linked to any number of classes. The campaign/update has a "class_list_add" parameter that can add any number of classes to a campaign; likewise, there is a "class_list_remove" parameter.

However, the classes should be irrelevant when reading data from a campaign. The only way classes come into play is to determine your privileges. If you are a regular user in a class, then you are most likely an analyst in the campaign. This means that you can view all of the public data in the campaign. If you are a privileged user in a class, then you are most likely a supervisor in the campaign. This means that you can view all of the data, public and private, in the campaign. The only reason you should have to make more than one call is for paging.

However, the snack campaign was a special case because the XML was renamed, but otherwise unchanged, and used to create a new campaign that collects the exact same data. In a real-world scenario, this should probably never happen and, if it did, would be the product of two independent people that just so happened to create the same XML; in this case, the division of information is entirely desired.

We only made about 3 copies of the snack campaign, so why do you need to make more than 3 calls?

jeroen commented 12 years ago

So when I list the active campaigns, I see:

campaign                                                     running_state
urn:campaign:lausd:SouthGate:SP2012:ECS_P3:Advertisement           running
urn:campaign:lausd:SouthGate:SP2012:ECS_P2:Advertisement           running
urn:campaign:lausd:Hollywood:SP2012:ECS_P4:Advertisement           running
urn:campaign:lausd:Jefferson:SP2012:ECS_P6:Snack                   running
urn:campaign:lausd:LAHigh:SP5012:ECS_P5:Snack                      running
urn:campaign:lausd:LAHigh:SP5012:ECS_P5:Advertisement              running
urn:campaign:lausd:SouthGate:SP2012:ECS_P3:Snack                   running
urn:campaign:lausd:Jefferson:SP2012:ECS_P5:Snack                   running
urn:campaign:lausd:Roosevelt:SP2012:ECS_P8:Advertisement           running
urn:campaign:lausd:SouthGate:SP2012:ECS_P4:Advertisement           running
urn:campaign:lausd:SouthGate:SP2012:ECS_P1:Snack                   running
urn:campaign:lausd:Roosevelt:SP2012:ECS_P3:Advertisement           running
urn:campaign:lausd:Jefferson:SP2012:ECS_P6:Advertisement           stopped
urn:campaign:lausd:Roosevelt:SP2012:ECS_P7:Advertisement           running
urn:campaign:lausd:UCLACommunity:SP2012:ECS_P6:Advertisement       running
urn:campaign:lausd:Roybal:SP2012:ECS_P1:Advertisement              running
urn:campaign:lausd:Jefferson:SP2012:ECS_P5:Advertisement           stopped
urn:campaign:lausd:SouthGate:SP2012:ECS_P4:Snack                   running
urn:campaign:lausd:SouthGate:SP2012:ECS_P2:Snack                   running
urn:campaign:lausd:Roybal:SP2012:ECS_P1:Snack                      running
urn:campaign:lausd:Roosevelt:SP2012:ECS_P7:Snack                   running
urn:campaign:lausd:Hollywood:SP2012:ECS_P4:Snack                   running
urn:campaign:lausd:Roosevelt:SP2012:ECS_P3:Snack                   running
urn:campaign:lausd:SouthGate:SP2012:ECS_P1:Advertisement           running
urn:campaign:lausd:UCLACommunity:SP2012:ECS_P6:Snack               running
urn:campaign:lausd:Roosevelt:SP2012:ECS_P8:Snack                   running

As far as I know; the only way to get the full dataset is doing a seperate oh.survey_response.read for all of these campaigns, and then manually merging the data, and hope that no changes were actually made in the campaigns.

jojenki commented 12 years ago

Got it. I hadn't realized that it had been duplicated so many times. You are correct about needing to make so many calls.

This is a feature of an upcoming release (not sure exactly which one), and it will have a much better separation of users and data. So, at that point, this will be completely unnecessary. For now, however, you are stuck with making multiple calls.

But, the many-to-many relationship of campaigns to classes does exist. So, is this issue more of a process issue than a technical one?

jeroen commented 12 years ago

I guess. I assumed that the feature wasn't present because why else would we re-upload the same campaign 25 times? But if the feature is already there you can close the issue.

I just wanted to make sure we realize that the current way of deploying makes it hard to create a live dashboard because the data is fragmented over all these campaigns, and there is no way of guaranteeing that different campaigns are in fact, identical.

@hongsudt

jojenki commented 12 years ago

The problem is that the permissions on the data aren't linked to classes the way you would expect. If you upload data for one campaign while being a member of one class, then someone in a different class but belonging to the same campaign can see that data as well. For the Mobilize use-case, this was a problem because, while all of the students were participating in the same campaign, they didn't want their data to bleed together. That's why multiple copies of the same campaign were created, one for each class in each school.

The solutions exist, but only on the drawing board. And they cannot be implemented while we are working on the showcase. We can keep this issue open, but it already exists as a series of more-specific issues in future milestones.

hongsudt commented 12 years ago

John is right about the requirement for Mobilize e.g. while the data gathering is happening, each class only cares about the data from their own classroom. Ideally, we would like each teacher to make the modification and create their own campaigns. We are trying to go for this approach this up-coming year.. So i would say what happen in Mobilize deployment 2012 should be an exception.

If the goal is to have all students share their data across classrooms, we can do that by creating one campaign and associate it with multiple classes. However, this wasn't the use case..

jeroen commented 12 years ago

I know why we did what we did, but I think that we are not being true to our own use case. Uploading the same campaign.xml 25 times because otherwise the ACL is not functioning correctly feels like a hack to me. Maybe we should learn from this, because what is in our mind 'ideally' is not what is happening. In fact I think what has really happened will be a far more common use case.

So the actual use case is that we have one campaign, which we deploy over many classes. But in a way that users within a class, users can initially only read responses from other users in the same same class. Yet have a way for privileged users to get survey responses for all classes.

jeroen commented 12 years ago

Actually once you think about it, it makes a lot of sense. We could have a 'campaign repository' of the server, so that classes can easily pick an existing one to deploy in their class (if they don't want to create a custom one). That will open a lot of opportunities for mass mobilization of a certain campaign, where different classes can subscribe to the same campaign. Eventually they can share data with each other, which could be a lot of fun.

jojenki commented 12 years ago

Agreed. There was a similar scenario where a group of clinicians worked for the same group and had a "standard" campaign for a scenario (i.e. the "Diabetes Management Campaign"), so each doctor wanted to be able to "prescribe" the campaign without sharing data with their colleagues (for HIPPA reasons, maybe).

I didn't want to reiterate what the new design is, but it will be something along the lines of:

https://github.com/cens/ohmageServer/issues/359#issuecomment-6916578

That should satisfy the requirements by renaming some concepts and sharing data with groups of users or specific people as opposed to the broader level of sharing we have now.

jeroen commented 12 years ago

Talked a bit about this with Josh yesterday. How about we simply introduce a third privacy_state "public". In such a way that:

privacy_state = "private" : visible to user, teacher. privacy_state = "shared" : visible to user, teacher and users within the same class. privacy_state = "public" : visible to everyone.

This way we also move towards a system for 'opening up' data.

jojenki commented 12 years ago

So, when talking about visibility of survey responses, only the campaign's roles matter. Class roles are only used to setup the user. When a user is added to an existing class, its class role is used to assign it campaign roles for all of the campaigns associated with a class; when a campaign is added to a class, all of the user's respective class roles are used to assign each of them campaign roles for the new campaign.

When performing reads on the survey response data, the class role is never examined; everything relies on those campaign roles. I am not sure how this fits into your solution; I just want to mention that you should not think about survey response visibility and class role, together.

jshslsky commented 12 years ago

For this issue, a simple fix may be just to identify a class as strict. If a class is strict, reads of survey response data should not cross the class boundary. The default setting for classes can be lenient, which will allow the system to behave as it currently does.

This solution would make reads of survey response data a bit more complicated, but it's nice because if a class decided to open up its data it would be a simple toggle from lenient to strict.

jojenki commented 12 years ago

So, to be clear, we are removing the requirement of a campaign ID in survey_response/read? The "problem" is that there will be no limit on the data. Sure, if you are participating in only the campaigns above then you are fine, but if you are participating in other campaigns all of that data will be returned as well.

I am ok with removing the campaign_id parameter. I propose we also add a "campaign_id_list" parameter that is a list of campaign IDs. That would fix this problem, correct?

I feel like classes are a red herring, here.

jshslsky commented 12 years ago

I think we still need campaign id, but now the requesting user's class memberships will have to be checked against the survey response owner's class memberships. For strict classes, if the requester and survey response owner do not share membership in that class, then the requester cannot see those survey responses.

I don't think classes are a red herring because they really do describe a group of users.

jojenki commented 12 years ago

As I understand the issue, it has nothing to do with classes. It is simply the case where a user wants to be able to read survey responses from multiple campaigns. If we keep the campaign ID parameter, then how is this possible?

And, I take back what I said before about having a campaign ID list parameter. The data output would be entirely too variable and complex.

I feel like this issue is being confused with a similar, but different, issue. That issue is the need to have a single campaign and multiple groups of users. When users upload, only the "leader" of that group is able to see the data. Classes don't do this. They are similar, but this requires a different level of sharing outside the current sharing states we have on campaigns. We need a way for users to say, "I am sharing this data with X group." Using classes for this purpose, seems to overload an already complex concept.

jshslsky commented 12 years ago

Yes, I think I overloaded this thread with issues that were outside of the scope of the original issue.

Jeroen, for the snack dashboard, I think we have to treat it as a special case where you have to query every class that had the identical campaign. As it exists today, our system does not prevent users from sharing the same campaign across multiple classes. For the mobilize data, the classes and campaigns were just not configured that way.

John, I think survey_response/read would have to be modified to work with a campaign id parameter where an end user only wants to see data for a specific campaign, but also without a campaign id parameter where an end user wants to see all of the responses regardless of campaign. My strict/lenient class filter would be for the latter case.

ohmage / server

Many-to-many relationship for classes and campaigns #391