murraycu / android-galaxyzoo

This Android app lets you classify Galaxy Zoo subjects. It is available in the Google Play Store: https://play.google.com/store/apps/details?id=com.murrayc.galaxyzoo.app . Try beta versions early here: https://play.google.com/apps/testing/com.murrayc.galaxyzoo.app . See also the iPhone app for Galaxy Zoo: https://github.com/murraycu/ios-galaxyzoo/
GNU General Public License v3.0

Incorrect classifications for goods_full and candels_2epoch #22

willettk closed 8 years ago

willettk commented 9 years ago

I've been looking at the data for both the candels_2epoch and goods_full subjects classified through the Android app. I'm worried that about 1 in 5 classifications seem to be using the wrong workflow; for example, there are about 500 rows that recorded candels_2epoch classifications for a goods_full galaxy, and 500 that did vice versa. We've only seen this behavior in the Android app, not in the main web interface.

Do you know what might be causing this?

murraycu commented 9 years ago

I'll investigate.

None of these are recent, right? All the recent ones should be using sloan_singleband.

willettk commented 9 years ago

Correct; last ones are from April 2015, but seem to be equally spaced throughout that time.

I haven't gotten the sloan_singleband data yet to check on those.

willettk commented 9 years ago

There were a few (13 total) sloan_singleband classifications that also used the wrong workflow. 9 used goods_full, 4 regular sloan.

murraycu commented 9 years ago

In general, is the subject_id unique across all surveys, or just within a survey? And how about the zooniverse_id?

willettk commented 9 years ago

subject_id and zooniverse_id should both be unique across all surveys.

murraycu commented 9 years ago

Could you please give me a couple of example zooniverse_ids for sloan_singleband subjects that were classified with the wrong questions?

murraycu commented 9 years ago

@willettk ?

willettk commented 9 years ago

Oops - sorry, @murraycu. Here are a couple of examples:

AGZ0007ua0 - by me (KWillett) on 2015-06-24 14:13:37 UTC

AGZ0007vxd - by an anonymous user on 2015-06-30 20:05:50 UTC

murraycu commented 9 years ago

So this sloan_singleband subject had "sloan-" question IDs in its classification? http://www.galaxyzoo.org/#/examine/AGZ0007ua0

murraycu commented 9 years ago

The most recent version (1.49) had some small fixes that just might help with this, though so far I can only guess at possible reasons. Maybe you can keep an eye on the more recent data. Any sloan classifications coming in now would almost certainly have to be using the new version, because only the new version (1.49) can get new subjects from the server now. Approximately 70% of users are on version 1.49 now.

There are small numbers of users with various old versions of the app installed. I guess they could be doing odd things. It would be nice if we could add a field to the classification to show the app version. At the moment we just add the same "user_agent" for all app versions.

willettk commented 9 years ago

Yes - the one above had a single annotation for a smooth galaxy (but none of the subsequent questions in the decision tree, such as what the shape of the galaxy is, anything odd, or whether they wanted to discuss it). The lack of any other annotations is another indication that there's a bug somewhere - we shouldn't record incomplete classifications like that.

    "annotations" : [
        {
            "sloan-0" : "a-1"
        },
        {
            "user_agent" : "murrayc.com-android-galaxyzoo"
        }
    ],

willettk commented 9 years ago

I think recording the app version in the "user_agent" would be a good idea. Is that something you can change on the app side, or is that in the database?

murraycu commented 9 years ago

I can change the user_agent for each app version, adding a suffix, but then you'd have to search on the first part of the user_agent rather than the exact string. I can also easily add a user_agent_version string. Maybe @brian-c or @camallen can say what they would prefer. They were involved when we added the user_agent in issue #11 .

murraycu commented 9 years ago

> The lack of any other annotations

Yes, that is odd. Is this a general pattern with these bad classifications?

I've made a change that might prevent the strangeness that is happening there: https://github.com/murraycu/android-galaxyzoo/commit/1ec4d87489b1a632baa361cf54a95be6e85a72b5

willettk commented 9 years ago

Yes; pretty much all of them have just one annotation, so they aren't complete classifications.
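
A minimal sketch of how such incomplete classifications might be flagged, assuming each classification is a dict shaped like the record quoted earlier, with an `annotations` list of single-key dicts. The function names and the `min_answers` threshold of 2 are hypothetical; the real decision tree may call for a different threshold.

```python
def count_answers(classification):
    """Count annotations that answer a question, ignoring metadata entries."""
    # Assumption: "user_agent" is the only metadata key mixed into the
    # annotations list; every other key is a question ID such as "sloan-0".
    metadata_keys = {"user_agent"}
    return sum(
        1
        for annotation in classification.get("annotations", [])
        for key in annotation
        if key not in metadata_keys
    )


def is_incomplete(classification, min_answers=2):
    """Flag classifications with fewer answers than a full walk of the tree."""
    return count_answers(classification) < min_answers


example = {
    "annotations": [
        {"sloan-0": "a-1"},
        {"user_agent": "murrayc.com-android-galaxyzoo"},
    ]
}
print(is_incomplete(example))  # True
```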

camallen commented 9 years ago

@murraycu I think you should modify the existing user agent with a formatted suffix (and a nice delimiter to split on), e.g. Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36

murraycu commented 9 years ago

I'm still not quite sure what caused this, and I can't reproduce it so far, though I have some theories which half make sense. So I've made various changes and added various checks to prevent this from happening and to prevent similarly invalid classifications from being uploaded. That's in version 1.50, which I've just released.

The user_agent now ends with a version suffix, for instance /1.50, so we can learn more about which app versions are doing this.

@willettk , please keep an eye on this and let me know what arrives on the server side.

willettk commented 9 years ago

Will do. Thanks, @murraycu.

I'll also be in touch shortly to discuss new groups being added to the system.

murraycu commented 9 years ago

@willettk Do you still see these problems with more recent classifications, with the recent app versions (looking at the user agent)?

willettk commented 9 years ago

Super rare, but yes - happening occasionally. Example:

db.galaxy_zoo_classifications.findOne({_id:ObjectId("559049932a19957491000072")})

{
    "_id" : ObjectId("559049932a19957491000072"),
    "annotations" : [
        {
            "goods_full-0" : "a-1"
        },
        {
            "user_agent" : "murrayc.com-android-galaxyzoo"
        }
    ],
    "created_at" : ISODate("2015-06-28T19:22:59Z"),
    "project_id" : ObjectId("502a90cd516bcb060c000001"),
    "subject_ids" : [
        ObjectId("5500598269736d5162d80c00")
    ],
    "subjects" : [
        {
            "id" : ObjectId("5500598269736d5162d80c00"),
            "zooniverse_id" : "AGZ0007wbm",
            "location" : {
                "standard" : "http://www.galaxyzoo.org.s3.amazonaws.com/subjects/standard/ci1237659161205145845_standard.jpg",
                "inverted" : "http://www.galaxyzoo.org.s3.amazonaws.com/subjects/inverted/ci1237659161205145845_inverted.jpg",
                "thumbnail" : "http://www.galaxyzoo.org.s3.amazonaws.com/subjects/thumbnail/ci1237659161205145845_thumbnail.jpg",
                "fits" : "http://www.galaxyzoo.org.s3.amazonaws.com/subjects/fits/ci1237659161205145845.fits.fz"
            },
            "coords" : [
                248.455592162979,
                33.3278520379735
            ],
            "metadata" : {
                "counters" : {
                    "feature" : 30,
                    "smooth" : 4,
                    "star" : 0
                }
            }
        }
    ],
    "tutorial" : false,
    "updated_at" : ISODate("2015-06-28T19:22:57.594Z"),
    "user" : {
        "classification" : "feature"
    },
    "user_id" : ObjectId("5528d4d440ead53e450002c8"),
    "workflow_id" : ObjectId("5514521e2f0eef2012000001")
}

for a subject that should only be sloan_singleband.
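
A check like this could be sketched as follows, assuming question keys have the form `<group>-<number>` (as in `goods_full-0` above) and that the expected group for each subject is known from elsewhere. The function names and the `expected_group` parameter are hypothetical, and the prefix `sloan_singleband` for that survey's questions is an assumption.

```python
def question_groups(classification):
    """Return the set of group prefixes found in the question keys."""
    groups = set()
    for annotation in classification.get("annotations", []):
        for key in annotation:
            # "goods_full-0" -> ("goods_full", "-", "0"); keys without a
            # numeric suffix, such as "user_agent", are skipped.
            prefix, sep, number = key.rpartition("-")
            if sep and number.isdigit():
                groups.add(prefix)
    return groups


def uses_wrong_workflow(classification, expected_group):
    """True if any question key belongs to a group other than the expected one."""
    groups = question_groups(classification)
    return bool(groups) and groups != {expected_group}


record = {
    "annotations": [
        {"goods_full-0": "a-1"},
        {"user_agent": "murrayc.com-android-galaxyzoo"},
    ]
}
print(uses_wrong_workflow(record, "sloan_singleband"))  # True
```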

What should the date cutoff for the fix be? This is from 1 month ago.

murraycu commented 9 years ago

That one has no app version suffix on the user_agent, so it must be from an older version of the app from before I added the extra checks. There can be no date cutoff because we can't force people to upgrade their app. But the user_agent tells you the app version for newer versions.

willettk commented 9 years ago

OK - so should I try again but search for an app suffix on user_agent? Can you give me an example of what that should look like?

murraycu commented 9 years ago

Yes, please. The user agent for the latest version would be murrayc.com-android-galaxyzoo/1.52 , for instance.
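
For filtering on the server side, a small sketch of splitting that string into its base and version parts. It assumes the suffix is always `/` followed by dot-separated integers, as in the example above; the helper names are hypothetical, and the default minimum of 1.50 reflects the release that added the extra checks.

```python
def split_user_agent(user_agent):
    """Split 'base/version' into (base, version); version is None if absent."""
    base, sep, version = user_agent.partition("/")
    return base, (version if sep else None)


def version_tuple(version):
    """Turn '1.52' into (1, 52) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_extra_checks(user_agent, minimum=(1, 50)):
    """True if the user agent carries a version suffix of at least `minimum`."""
    _, version = split_user_agent(user_agent)
    if version is None:
        return False  # older app versions sent no suffix at all
    try:
        return version_tuple(version) >= minimum
    except ValueError:
        return False  # suffix present but not in the assumed numeric form


print(split_user_agent("murrayc.com-android-galaxyzoo/1.52"))
print(has_extra_checks("murrayc.com-android-galaxyzoo"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.9" sorting after "1.50".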

willettk commented 9 years ago

I can confirm that there are no workflow remappings for the recent classifications that have a version number on the user agent, but that only measures agreement with other users of the app. I haven't been able to run a full check of whether every recent classification with the app uses the same workflow as everyone else who has classified the galaxy.
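
The fuller check could look roughly like this: group all classifications (web and app) by subject, take the most common `workflow_id` as the reference, and list the ones that differ. This is only a sketch; it assumes records are dicts with the `subject_ids` and `workflow_id` fields seen in the record above, uses plain strings in place of ObjectIds, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def find_workflow_outliers(classifications):
    """Return classifications whose workflow differs from the subject's majority."""
    by_subject = defaultdict(list)
    for classification in classifications:
        for subject_id in classification["subject_ids"]:
            by_subject[subject_id].append(classification)

    outliers = []
    for group in by_subject.values():
        counts = Counter(c["workflow_id"] for c in group)
        # On a tie, most_common keeps the workflow that was seen first.
        majority, _ = counts.most_common(1)[0]
        outliers.extend(c for c in group if c["workflow_id"] != majority)
    return outliers


sample = [
    {"subject_ids": ["s1"], "workflow_id": "wf_a"},
    {"subject_ids": ["s1"], "workflow_id": "wf_a"},
    {"subject_ids": ["s1"], "workflow_id": "wf_b"},
]
print(find_workflow_outliers(sample))
```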

murraycu commented 8 years ago

I assume that this is no longer a problem, but it would be nice if you checked once more and reopened this issue if necessary.