sql-machine-learning / gohive

A Go driver for Hive

Incorrect frame size (18583244) #57

Closed jony-lee closed 4 years ago

jony-lee commented 4 years ago

Regarding line 54 of https://github.com/sql-machine-learning/gohive/blob/develop/rows.go: I got the error Incorrect frame size (18583244) when I sent a request that returns a large amount of data. I think r.batchFetch() should surface that error, but it does not. Furthermore, the default frame size limit, const DEFAULT_MAX_LENGTH = 16384000, should be configurable by users.
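For readers unfamiliar with where this message comes from, here is a minimal sketch of the kind of check a framed transport performs: it reads a 4-byte big-endian length header and rejects any frame larger than a configured maximum. The names readFrameSize, frameHeader, and maxFrameBytes are hypothetical and chosen for illustration; this is not the code from gohive or beltran/gohive, and the error text simply mirrors the message reported above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrameBytes mirrors the DEFAULT_MAX_LENGTH value quoted in this issue.
const maxFrameBytes = 16384000

// readFrameSize (hypothetical helper) reads a 4-byte big-endian frame length
// and returns an error when the announced size exceeds limit.
func readFrameSize(r io.Reader, limit uint32) (uint32, error) {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return 0, err
	}
	size := binary.BigEndian.Uint32(header[:])
	if size > limit {
		return 0, fmt.Errorf("Incorrect frame size (%d)", size)
	}
	return size, nil
}

// frameHeader (hypothetical helper) builds a reader holding only a length header.
func frameHeader(size uint32) io.Reader {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], size)
	return bytes.NewReader(b[:])
}

func main() {
	// One frame under the limit, and one with the size reported in this issue.
	for _, n := range []uint32{1024, 18583244} {
		size, err := readFrameSize(frameHeader(n), maxFrameBytes)
		fmt.Println(size, err)
	}
}
```

Making the limit a parameter rather than a constant, as in this sketch, is one way the maximum could be exposed to users.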

typhoonzero commented 4 years ago

@weiguoz could you please take a look at this?

weiguoz commented 4 years ago

@jony-lee You are right, gohive should handle the error returned by batchFetch().
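As a rough illustration of what handling the error could look like, the sketch below shows a row iterator that remembers a failed fetch and returns it to the caller instead of silently stopping. The rows type, the fetch field, drain, and errFrame are hypothetical stand-ins; the actual gohive types and the signature of batchFetch() may differ.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errFrame is a hypothetical error mirroring the message from this issue.
var errFrame = errors.New("Incorrect frame size (18583244)")

// rows is a hypothetical iterator; fetch stands in for batchFetch().
type rows struct {
	fetch func() ([]string, error)
	buf   []string
	err   error
}

// Next returns the next row. A fetch failure is stored and returned on this
// and every later call, so the caller can tell an error apart from io.EOF.
func (r *rows) Next() (string, error) {
	if r.err != nil {
		return "", r.err
	}
	if len(r.buf) == 0 {
		batch, err := r.fetch()
		if err != nil {
			r.err = err
			return "", err
		}
		if len(batch) == 0 {
			r.err = io.EOF
			return "", io.EOF
		}
		r.buf = batch
	}
	row := r.buf[0]
	r.buf = r.buf[1:]
	return row, nil
}

// drain collects rows until Next reports io.EOF or a real error.
func drain(r *rows) ([]string, error) {
	var out []string
	for {
		row, err := r.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, row)
	}
}

func main() {
	calls := 0
	r := &rows{fetch: func() ([]string, error) {
		calls++
		if calls == 1 {
			return []string{"a", "b"}, nil
		}
		return nil, errFrame // the second batch is too large
	}}
	got, err := drain(r)
	fmt.Println(got, err)
}
```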

weiguoz commented 4 years ago

> Furthermore, the default frame size limit, const DEFAULT_MAX_LENGTH = 16384000, should be configurable by users.

DEFAULT_MAX_LENGTH is set in another project, beltran/gohive, and the restriction on response size looks reasonable. To resolve the Incorrect frame size error, I think allowing users to set the BatchSize is a solution: https://github.com/sql-machine-learning/gohive/blob/63a00321b51afea0f117f3fc90303e5b97eaa7f9/driver.go#L67 What do you think? @jony-lee @Yancey1989

jony-lee commented 4 years ago

Hmm, BatchSize may not solve my problem. I select about 40000 rows from Hive, but the size of the result is larger than 16384000. The beltran/gohive project is not very big, so we could implement this in this project, or I can open an issue on beltran/gohive asking for a larger limit. By the way, are there many use cases for querying large amounts of data from Hive over Thrift? If not, I can run concurrent queries just for this case.

weiguoz commented 4 years ago

> Hmm, BatchSize may not solve my problem. I select about 40000 rows from Hive, but the size of the result is larger than 16384000.

16384000 / 40000 ≈ 410, so if the average row size is larger than about 410 bytes, the error is thrown. Hence, I think it will work fine if the batch size is set to a value that satisfies batch * average_size_of_row < 16384000.
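The condition above can be turned into a small calculation. This is a sketch only: maxBatchRows and maxFrameBytes are hypothetical names, and the average row size is an estimate that ignores any Thrift encoding overhead, so leaving some headroom below the computed value is probably wise.

```go
package main

import "fmt"

// maxFrameBytes mirrors the DEFAULT_MAX_LENGTH value quoted in this issue.
const maxFrameBytes = 16384000

// maxBatchRows (hypothetical helper) returns the largest batch size such that
// batch * avgRowBytes stays strictly below limit. It returns 0 for
// non-positive inputs.
func maxBatchRows(limit, avgRowBytes int) int {
	if limit <= 0 || avgRowBytes <= 0 {
		return 0
	}
	return (limit - 1) / avgRowBytes
}

func main() {
	// Average row sizes in bytes are illustrative assumptions.
	for _, avg := range []int{410, 500} {
		fmt.Printf("avg row %d bytes: batch size up to %d\n", avg, maxBatchRows(maxFrameBytes, avg))
	}
}
```

With an average row of 410 bytes this gives 39960 rows, just under the 40000 rows mentioned above, which is consistent with a 40000-row result exceeding the limit.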

> By the way, are there many use cases for querying large amounts of data from Hive over Thrift? If not, I can run concurrent queries just for this case.

In fact, we haven't tested such a case. We are glad to solve this problem : )

jony-lee commented 4 years ago

fine, thanks:)

weiguoz commented 4 years ago

@jony-lee would you please verify this PR? Or could you share the data so that I can reproduce it?

jony-lee commented 4 years ago

Do you mean reproducing the bug? Well, I cannot give you my data. You can run select * on your own Hive data if it is big enough, or build a data set for reproducing the bug. That should not be necessary, though, because the error is raised from here. You probably already know exactly what happens when the problem occurs.