ngulam-ai / Sherlock


Data Discrepancies #48

Closed ngulamai closed 6 years ago

ngulamai commented 6 years ago

Hi Alexander, Danil,

We are starting to see discrepancies in the data that we are storing in BigQuery. Please see below an extract of the last two days from our ad server (which should match the data stored in BigQuery). Can you please review the data and let us know why the discrepancies occur?

The attached file is actually a CSV file whose extension we changed so that we could upload it to Upwork.

Main discrepancy: we are not seeing conversions in BigQuery, while we actually have 2.

Can you please check? Thanks

export.txt

akolchin-MM commented 6 years ago

Yes, I don't see any conversions for 25 and 26 August but I see such records again starting from 27 August.

Since we didn't change anything before or after these dates, I can only suppose that something was changed on your side.

It is also possible that there is some delay in sending the data (so all these conversions from streaming_20180827 actually belong to previous days), but again, that can't be in our part of the solution.

image

I would like to help you, but I don't know how else to help in these circumstances.

ngulamai commented 6 years ago

We are seeing a discrepancy between the ad server (the platform that actually sends the hits to BigQuery) and BigQuery. Can you please debug whether there are hits that are not processed by your app, disregarded, or deleted? Otherwise I cannot understand any discrepancy above 5% to 7%. Thanks

akolchin-MM commented 6 years ago

> Can you please debug whether there are hits that are not processed by your app, disregarded, or deleted? Otherwise I cannot understand any discrepancy above 5% to 7%.

I am sorry, but I have once again reviewed all the logs to which I have access, and I didn't find any evidence that we are somehow losing hit requests. Everything that was sent to /collect was also added to the BigQuery tables.

The only reason for the inconsistency that I can hypothesize is a difference in the time zones used by the different services and other parts of the solution (including your parts). There is no correct or incorrect time zone, but their use should be synchronized.

Otherwise, data for one day in one system can be assigned to another day in another system. But even in this case, nothing should actually be lost; it is just distributed among days differently.
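
To illustrate the time zone point, here is a minimal sketch of how the same hits can land on different days depending on the zone used for reporting. The timestamps and the UTC+2 offset are assumptions chosen for illustration; they are not taken from the ad server export or from the BigQuery tables.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical hit timestamps in UTC (illustrative values, not real data).
hits_utc = [
    datetime(2018, 8, 25, 22, 30, tzinfo=timezone.utc),
    datetime(2018, 8, 25, 23, 45, tzinfo=timezone.utc),
    datetime(2018, 8, 26, 0, 15, tzinfo=timezone.utc),
    datetime(2018, 8, 26, 12, 0, tzinfo=timezone.utc),
]


def hits_per_day(hits, tz):
    """Count hits per calendar day as seen in the given time zone."""
    return Counter(hit.astimezone(tz).date().isoformat() for hit in hits)


# One system reports in UTC.
utc_counts = hits_per_day(hits_utc, timezone.utc)
# Assumption: the other system reports in a fixed UTC+2 offset.
local_counts = hits_per_day(hits_utc, timezone(timedelta(hours=2)))

print(dict(utc_counts))    # {'2018-08-25': 2, '2018-08-26': 2}
print(dict(local_counts))  # {'2018-08-26': 4}
```

The per-day counts differ, but the total is 4 in both cases. A time zone mismatch alone therefore moves hits between days without removing any.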

Unfortunately, the information you provided is not enough for me to identify the reason for the discrepancies. To be honest, it is not even enough for me to be sure that I can see such discrepancies. I suppose the most efficient way forward would be to give me access to "the server (the platform that actually sends the hits to bigquery)" so that I can create my own reports and run other tests myself to properly reproduce the problem.
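
A report of that kind could be as simple as a per-day comparison of the two sources. The sketch below assumes daily hit totals are already available from both systems; the function name `relative_discrepancy` and all numbers are hypothetical and are not taken from export.txt.

```python
def relative_discrepancy(adserver_count, bigquery_count):
    """Fraction of ad server hits not found in BigQuery.

    Negative if BigQuery holds more hits than the ad server reports.
    """
    if adserver_count == 0:
        return 0.0
    return (adserver_count - bigquery_count) / adserver_count


# Hypothetical daily totals (illustrative only).
adserver = {"2018-08-25": 1000, "2018-08-26": 1200}
bigquery = {"2018-08-25": 940, "2018-08-26": 1200}

report = {
    day: round(relative_discrepancy(adserver[day], bigquery.get(day, 0)), 4)
    for day in adserver
}
print(report)  # {'2018-08-25': 0.06, '2018-08-26': 0.0}
```

Comparing totals over the whole period as well as per day helps separate hits that are really missing from hits that were only assigned to a different day.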