sftcd / tek_transparency

Some measurements of deployments of apps using the Google/Apple Exposure Notification system
MIT License

CZ TEK counts: 7 instead of one on 14-10-2020 #19

Open damaarten opened 3 years ago

damaarten commented 3 years ago

In the CZ CSV file, there are 7 entries for October 14th. There should only be one entry, of course. https://down.dsg.cs.tcd.ie/tact/tek-counts/cz-tek-times.csv

I tried to get your code running on my Ubuntu laptop to see if I could fix it, but I ran into a protobuf compiler error and gave up after a while.

The reason I ran into this: I am using your data (the CSV files) to get a total TEK count for all participating EU countries, to get a feel for the total number of TEKs published daily by EU countries. I combine your CSV files into one big one, and there I noticed this duplicate-entry bug, which messes up the totals for that particular day.
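A minimal sketch of the combining step described above. The thread doesn't show the CSV layout, so this assumes (hypothetically) that each per-country file has no header and one row per day, with the date in the first column and the TEK count in the second. The helper names are also hypothetical. It refuses to sum a file that repeats a date, which is how a bug like the seven October 14 rows would surface instead of silently inflating that day's total.

```python
from __future__ import annotations

import csv
from collections import Counter, defaultdict
from typing import Iterable


def duplicate_dates(rows: Iterable[list[str]]) -> list[str]:
    """Return the dates (column 0) that occur on more than one row."""
    seen = Counter(row[0] for row in rows)
    return sorted(day for day, n in seen.items() if n > 1)


def daily_totals(files: dict[str, Iterable[str]]) -> dict[str, int]:
    """Sum per-day TEK counts across countries.

    Assumes each source yields 'date,count' lines with no header row;
    an open file handle or a list of strings both work with csv.reader.
    """
    totals: defaultdict[str, int] = defaultdict(int)
    for country, lines in files.items():
        rows = [row for row in csv.reader(lines) if row]  # skip blank lines
        dups = duplicate_dates(rows)
        if dups:
            raise ValueError(f"{country}: more than one row for {', '.join(dups)}")
        for row in rows:
            totals[row[0]] += int(row[1])
    return dict(totals)
```

Raising rather than de-duplicating is a deliberate choice here: if a source repeats a date, it is unclear which row (if any) is right, so the sketch flags it for a human.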

sftcd commented 3 years ago

On 05/11/2020 22:22, damaarten wrote:

In the CZ CSV file, there are 7 entries for October 14th. There should only be one entry, of course.

That does appear pretty buggy, yep :-) Will check it out.

S


sftcd commented 3 years ago


Looks like that's caused by zip files that contain a TEK with an odd epoch value, i.e. one that doesn't fall on a UTC midnight boundary. For example, there's one .cz TEK with the epoch value 2671153, which maps to Oct 14 16:10 UTC.

End result is that a call to uniq does the wrong thing for the set of TEKs with odd epoch values for that day.
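The thread doesn't show the script or the actual fix, so this is only a minimal sketch of the arithmetic, assuming the "epoch" is a GAEN `rolling_start_interval_number` counted in 10-minute intervals since the Unix epoch (the `rolling_period` default of 144 visible in the protobuf descriptor later in this thread corresponds to one day). 2671153 × 600 s = 1602691800 s, which is 2020-10-14 16:10 UTC, and 2671153 mod 144 = 97, so the key starts 97 intervals after midnight. One way to avoid this kind of duplicate is to key the daily count on the UTC date alone; the helper names below are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

INTERVAL_SECONDS = 600   # one interval number = 10 minutes
INTERVALS_PER_DAY = 144  # 24 hours * 6 ten-minute intervals


def interval_to_utc(interval: int) -> datetime:
    """Convert a rolling_start_interval_number ("epoch" value) to a UTC datetime."""
    return datetime.fromtimestamp(interval * INTERVAL_SECONDS, tz=timezone.utc)


def is_midnight_aligned(interval: int) -> bool:
    """Keys normally start at UTC midnight, i.e. on a multiple of 144 intervals."""
    return interval % INTERVALS_PER_DAY == 0


def teks_per_day(intervals: Iterable[int]) -> Counter[str]:
    """Count TEKs per UTC calendar day, keyed on the date only (not the time of day)."""
    return Counter(interval_to_utc(i).date().isoformat() for i in intervals)
```

With this keying, two midnight-aligned keys (2671056) plus the odd 2671153 key all land in a single `2020-10-14` bucket with a count of 3.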

Shouldn't be hard to fix; figuring that out and testing now.

S.

sftcd commented 3 years ago

I've pushed what I think is a fix for that now. We'll see in the morning if that's correct or breaks something else ;-)

I'd be interested in fixing the protobuf problem you had, though, so that you could run things locally as well. I'd be happy if you'd mail me about that so we can figure it out too.

S.

damaarten commented 3 years ago

That odd epoch value could be caused by all sorts of scenarios: a bug in the CZ system, or someone with a manipulated phone or other device who sent in this key, and so on. I don't have much time to help out, unfortunately. Can you have your students work on this project?

damaarten commented 3 years ago

When running, I get:

(tek_transparency) myprompt:~/tek_transparency$ ./tek_survey.sh 
======================
======================
======================
Running ./tek_survey.sh at 20201106-065420
======================
.ie TEKs
Skipping .ie because refreshToken access failed at 20201106-065420.</p>
======================
Northern Ireland TEKs
Skipping Northern Ireland because refreshToken access failed at 20201106-065420.</p>
Bottom: 231, Top: 337
======================
.it TEKs
Traceback (most recent call last):
  File "/home/maarten/tek_transparency/tek_file_decode.py", line 4, in <module>
    import TemporaryExposureKeyExport_pb2
  File "/home/maarten/tek_transparency/TemporaryExposureKeyExport_pb2.py", line 21, in <module>
    serialized_pb=b'\n TemporaryExposureKeyExport.proto\"\xd1\x01\n\x1aTemporaryExposureKeyExport\x12\x17\n\x0fstart_timestamp\x18\x01 \x01(\x06\x12\x15\n\rend_timestamp\x18\x02 \x01(\x06\x12\x0e\n\x06region\x18\x03 \x01(\t\x12\x11\n\tbatch_num\x18\x04 \x01(\x05\x12\x12\n\nbatch_size\x18\x05 \x01(\x05\x12\'\n\x0fsignature_infos\x18\x06 \x03(\x0b\x32\x0e.SignatureInfo\x12#\n\x04keys\x18\x07 \x03(\x0b\x32\x15.TemporaryExposureKey\"\x9b\x01\n\rSignatureInfo\x12\x15\n\rapp_bundle_id\x18\x01 \x01(\t\x12\x17\n\x0f\x61ndroid_package\x18\x02 \x01(\t\x12 \n\x18verification_key_version\x18\x03 \x01(\t\x12\x1b\n\x13verification_key_id\x18\x04 \x01(\t\x12\x1b\n\x13signature_algorithm\x18\x05 \x01(\t\"\x8d\x01\n\x14TemporaryExposureKey\x12\x10\n\x08key_data\x18\x01 \x01(\x0c\x12\x1f\n\x17transmission_risk_level\x18\x02 \x01(\x05\x12%\n\x1drolling_start_interval_number\x18\x03 \x01(\x05\x12\x1b\n\x0erolling_period\x18\x04 \x01(\x05:\x03\x31\x34\x34'
TypeError: __new__() got an unexpected keyword argument 'serialized_options'
etc.

protoc --version gives: libprotoc 3.7.1

I googled and tried some suggestions, but I could not resolve the error.
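A frequently reported cause of `TypeError: __new__() got an unexpected keyword argument 'serialized_options'` is a `_pb2.py` module generated by a newer `protoc` being imported with an older Python `protobuf` runtime package; the usual remedies are upgrading the runtime (e.g. `pip install --upgrade protobuf`) or regenerating the module with a matching `protoc`. The thread doesn't confirm that this is the cause here, so treat the following as a diagnostic sketch under that assumption. The function names are hypothetical, and the comparison assumes the 3.x-era numbering seen above (`libprotoc 3.7.1`), where `protoc` and the Python package share version numbers; newer releases number them differently.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted version out of text, e.g. 'libprotoc 3.7.1' -> (3, 7, 1)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def runtime_older_than_compiler(protoc_output: str, runtime_version: str) -> bool:
    """True if the Python protobuf runtime predates the protoc that generated the _pb2 files.

    Assumes 3.x-style numbering, where protoc and the Python package share versions.
    """
    return parse_version(runtime_version) < parse_version(protoc_output)


def check_local_setup() -> None:
    """Report the installed protoc and Python protobuf versions (hypothetical helper)."""
    if shutil.which("protoc") is None:
        print("protoc not found on PATH")
        return
    protoc_output = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=True
    ).stdout.strip()
    try:
        runtime_version = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        print("the Python 'protobuf' package is not installed")
        return
    print(f"{protoc_output}; Python protobuf runtime {runtime_version}")
    if runtime_older_than_compiler(protoc_output, runtime_version):
        print("runtime is older than protoc: try 'pip install --upgrade protobuf'")


if __name__ == "__main__":
    check_local_setup()
```

Run it inside the same virtualenv as `tek_survey.sh` (the `(tek_transparency)` prompt above), since that is the interpreter that imports `TemporaryExposureKeyExport_pb2`.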