Closed GoogleCodeExporter closed 9 years ago
I get the symptoms in the past few days. I am using table decorators since
January with no problem. It is a core ingredient of my incremental load ETL. In
the past few days the queries use table decorators return partial results. I
couldn't identify a pattern.
Sample job ID would be: job__ipDIF_VllHu-ppueB6P2g0FrII
I run similar jobs every hour.
Original comment by BISystem...@gmail.com
on 7 Oct 2014 at 6:04
[deleted comment]
Is it better if I post this problem on StackOverflow?
This problem has caused us big problems and we want to know in the future if we
can rely on the table decorators or not. I would also like to know if the
problem that we had can be corrected so that we have all the results (including
the data is missing from October 4th to October 5th).
Original comment by collette...@gmail.com
on 8 Oct 2014 at 1:34
I am disappointed that we didn't have any news yet from the BigQuery team
regarding this issue. It's been over over a week and we still have no news :(
It would be good to have at least someone acknowledge that there was a problem.
Original comment by collette...@gmail.com
on 14 Oct 2014 at 1:19
Thanks for reporting this bug and sorry that we dropped the ball.
We've started looking into the problem and will update this issue when we know
more.
Original comment by cdgarr...@google.com
on 16 Oct 2014 at 12:32
I've had the problem again for data inserted yesterday.
Some data inserted on October 16th, between around 16:00 and 20:00 UTC time is
missing when using the table range decorators.
Here are 2 jobs ids where some data is missing (using table decorators):
job_-xtL4PlIYhNjQ5weMnssvqDmd6U , job_9ASNxqq_swjCd1eMmiQ6SmPpxlQ
and 1 job id where data is correct(without decorators):
job_QbcRwYGbQv0BZdHreQEvRlYh-mM
Original comment by collette...@gmail.com
on 17 Oct 2014 at 6:42
We've found the underlying bug and have a fix in the works. It should go out
in our next release in the next week or two.
Thanks again for bringing this issue to our attention and sorry for any
inconvenience this bug may have caused.
Original comment by dk...@google.com
on 17 Oct 2014 at 8:24
Can you please elaborate more? I would like to understand when and what this
bug affected, so i can fix the derived problems it caused on my side.
Original comment by NadavN.P...@gmail.com
on 19 Oct 2014 at 6:43
I would please like to know if the fix has been delivered? Thanks
Original comment by collette...@gmail.com
on 29 Oct 2014 at 3:35
This bug should now be fixed in production.
The flaw was in some code that interpreted timestamps on internally created
files. Since the files themselves are correct, the fix should be immediate and
cover all existing data.
Original comment by cdgarr...@google.com
on 29 Oct 2014 at 5:39
I am still getting the same sympthoms on data that was created with close
proximity to the queries.
this happened on 30.10, 31.10 1.11...
a sample job i ran just now:
job_kqJ_yZh674kTag4I-ibCPnAMXOw
Original comment by NadavN.P...@gmail.com
on 2 Nov 2014 at 7:56
That job (in #11) appears to be running correctly. It just has a timestamp
range that eliminates some of the data in the table.
When using timestamp range decorators, the timestamps are compared to a commit
time, when data that you loaded is made available as part of the table. In
order to make sure you include the commit time, use a range decorator with a
lower bound that's definitely before the commit time, such as the time you
started your load job, and an upper bound that's definitely after the commit
time, such as a time when you observed the loaded data in the table or >= the
last_modified_time of the table.
Original comment by cdgarr...@google.com
on 3 Nov 2014 at 11:10
the problem kept existing on October 30 - Nov 03. on Nov 04 i didn't experience
the problem again. But i wouldn't relay on 1 day to announce "Fixed" as It
comes and goes since October 4th.
I have a process that runs perfectly since January 2014. The process runs every
hour on HH:30, and queries, using table decorators, another table for the data
that was inserted to it within the past "round" hour. the source table is being
loaded with data using batch uploads every few minutes throughout the day. the
source table contains data of 1 day, and so the target table. Since this
problem started, when i examine the results of the process, i find the target
table to contain only part of the data (varies between 40%-70%).
your suggestion (#12) doesn't help, as i am not interested in querying full
table. i want only one hour each time, and i want 24 iterations of 1 hour to
contain the full 1 day data.
the way to reproduce the problem is as follows:
1. Pick a table with on going batch uploads taht have started within the past
24 hours. make sure that data is still being written to table
2. run the following query:
Select * from
(Select count(*) from [MYDATASET.MYTABLE@<timestamp before started writing to
table>-<timestamp of now>]),
(Select count(*) FROM [MYDATASET.MYTABLE])
3.Notice the different results.
I don't know if it is relevant, but during October many countries had changed
their clocks for day light savings, could it be that the BigQuery servers
clocks went off synch?
Original comment by NadavN.P...@gmail.com
on 5 Nov 2014 at 7:51
Original issue reported on code.google.com by
collette...@gmail.com
on 6 Oct 2014 at 7:30