jony-lee closed this 4 years ago.
@weiguoz could you please take a look at this?
@jony-lee You are right, gohive should handle the error returned by batchFetch().
> Furthermore, the default frame size parameter here, const DEFAULT_MAX_LENGTH = 16384000, should be configurable by users.
The DEFAULT_MAX_LENGTH is set by another project, beltran/gohive, and its restriction on response size looks reasonable.
To solve the "Incorrect frame size" error, I think allowing users to set the BatchSize is a solution.
https://github.com/sql-machine-learning/gohive/blob/63a00321b51afea0f117f3fc90303e5b97eaa7f9/driver.go#L67
What do you think? @jony-lee @Yancey1989
Emmm, BatchSize may not solve my problem. I selected about 40000 rows of data from my Hive, and their total size is bigger than 16384000. The beltran/gohive project is not very big; we could make an implementation in this project, or I could open an issue on beltran/gohive asking for a bigger size. By the way, are there many scenarios that query large amounts of data from Hive over Thrift? If not, I can implement a concurrent query just for that case.
> Emmm, BatchSize may not solve my problem. I selected about 40000 rows of data from my Hive, and their total size is bigger than 16384000.
16384000 / 40000 ≈ 410, so if the average row size is larger than about 410 bytes, the error will be thrown. Hence, I think it will work fine if the batch size is set to a value that satisfies batch_size * average_row_size < 16384000.
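The condition above can be turned into a small helper. This is a hedged Go sketch: maxBatchSize is a hypothetical name, it treats 16384000 as a byte limit on a single fetch response, and it assumes you can estimate the average row size in bytes. Because the real response also carries protocol and schema overhead, a practical setting should stay somewhat below the computed bound.

```go
package main

import "fmt"

// maxFrameLength is the 16384000-byte limit discussed in this thread.
const maxFrameLength int64 = 16384000

// maxBatchSize returns the largest batch size b such that
// b*avgRowSize < limit, i.e. floor((limit-1)/avgRowSize).
// It returns 0 for a non-positive avgRowSize (invalid input).
// Per-response overhead (Thrift headers, schema metadata) is ignored,
// so a real setting should stay somewhat below this bound.
func maxBatchSize(limit, avgRowSize int64) int64 {
	if avgRowSize <= 0 {
		return 0
	}
	return (limit - 1) / avgRowSize
}

func main() {
	for _, avg := range []int64{100, 410, 1024} {
		b := maxBatchSize(maxFrameLength, avg)
		fmt.Printf("avg row %d bytes: batch size <= %d (%d bytes per batch)\n", avg, b, b*avg)
	}
}
```

For an average row size of 410 bytes the bound is 39960 rows (16383600 bytes), just under the limit, whereas 40000 rows of that size would need 16400000 bytes and exceed it.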
> By the way, are there many scenarios that query large amounts of data from Hive over Thrift? If not, I can implement a concurrent query just for that case.
In fact, we haven't tested such a case. We would be glad to solve this problem : )
fine, thanks:)
@jony-lee would you please verify this PR? Or could you share the data so that I can reproduce the problem?
Do you mean reproducing the bug? Well, I cannot give you my data. You can select * from all of your Hive data if it is big enough, or build a dataset specifically for reproducing the bug. It shouldn't be necessary, though, because the error is raised at the location below, which should make it clear what happens when the problem occurs.
In https://github.com/sql-machine-learning/gohive/blob/develop/rows.go, line 54, I got the error "Incorrect frame size (18583244)" when I sent a request that returns a large amount of data. I think the function r.batchFetch() should return that error, but it doesn't. Furthermore, the default frame size parameter here, const DEFAULT_MAX_LENGTH = 16384000, should be configurable by users.