zooniverse / Galaxy-Zoo

http://www.galaxyzoo.org
Apache License 2.0

Exact timestamp for same user with different subjects #201

Open avis1234 opened 9 years ago

avis1234 commented 9 years ago

Integrating with the galaxy_zoo stream. Some events arrive with the same user_id and the same created_at but different subjects. Logging this issue per our discussion with your team.

chrissnyder commented 9 years ago

Are the annotations within the classifications the same?

willettk commented 9 years ago

According to @parrish, no. Here are the annotations for three classifications with the same timestamp and user, but different subjects and annotations.

[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-1"}, {"sloan_singleband-1"=>"a-1"}, {"sloan_singleband-2"=>"a-1"}, {"sloan_singleband-3"=>"a-1"}, {"sloan_singleband-4"=>"a-3"}, {"sloan_singleband-5"=>"a-1"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]

parrish commented 9 years ago

Unfortunately, this is unavoidable. The API timestamps a classification as soon as it receives it, so the created_at you're seeing records when the server created the classification, not when the user made it. Classifications that reach the server together can therefore share a timestamp.

Some common scenarios that cause this:

A mobile user, or a user on a flaky network connection (very common)

Times of unusually high traffic (less common)

The only way to approach this is to have the client timestamp the classifications before they are sent. The caveat here is that there are no guarantees on what the client system clock is set to.

I suppose you could try to calculate a client local time offset by comparing it to a response from the server and adjusting for network latency, but that's pretty far from reliable.
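The offset idea above can be sketched roughly as follows. This is only an illustration, not something the Galaxy Zoo API provides: the function names and parameters are assumptions, and it presumes the client can read a server time from some response. It also assumes the request and response take equally long, which is exactly the part that breaks down on mobile or flaky connections.

```python
def estimate_clock_offset(client_sent, server_time, client_received):
    """Estimate (server clock - client clock) in seconds.

    Hypothetical helper. Assumes symmetric latency: the request and the
    response each took half of the measured round trip.
    """
    round_trip = client_received - client_sent
    if round_trip < 0:
        raise ValueError("client_received must not be earlier than client_sent")
    # Client clock reading at the moment the server stamped its response,
    # under the symmetric-latency assumption.
    client_midpoint = client_sent + round_trip / 2
    return server_time - client_midpoint


def to_server_time(client_timestamp, offset):
    """Shift a client-side timestamp onto the server's clock."""
    return client_timestamp + offset


# Made-up numbers: a 1 s round trip, with the server reporting 1120.5
# while the client clock read roughly 1000.5.
offset = estimate_clock_offset(
    client_sent=1000.0, server_time=1120.5, client_received=1001.0
)
adjusted = to_server_time(1000.0, offset)
```

The error of the estimate is bounded by half the round trip, so a slow or uneven connection makes the adjusted timestamps correspondingly less trustworthy.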

In a nutshell, you could figure out the order in which requests were sent, but not the actual time each request was sent.
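Recovering the send order, as opposed to the send time, might look like the sketch below. It assumes the client attaches a per-user counter to each classification before queuing it. The `client_seq` field is hypothetical and is not part of the galaxy_zoo stream discussed here; `user_id` and `created_at` are the field names from the report above, and all values are made up.

```python
def order_events(events):
    """Order events per user by the client-side counter.

    `client_seq` is a hypothetical field: a counter the client would
    increment for each classification before sending it.
    """
    return sorted(events, key=lambda e: (e["user_id"], e["client_seq"]))


# Three events with the same user and the same server timestamp,
# received out of order.
events = [
    {"user_id": "u1", "created_at": "2014-01-01T00:00:05Z", "subject": "s_b", "client_seq": 2},
    {"user_id": "u1", "created_at": "2014-01-01T00:00:05Z", "subject": "s_c", "client_seq": 3},
    {"user_id": "u1", "created_at": "2014-01-01T00:00:05Z", "subject": "s_a", "client_seq": 1},
]

ordered = order_events(events)
subjects_in_send_order = [e["subject"] for e in ordered]
```

A counter like this says nothing about when each classification was made, only which one came first for a given user.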