Closed fangxlmr closed 1 year ago
Which option is used to control the batchSize (in terms of rpc proto)? caching or number_of_rows? Or other fields? What value does it default to?
NumberOfRows: this is how many rows to fetch in each scan request -> https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L291
MaxResultSize: this is how many bytes of data to fetch in each scan request (takes priority over NumberOfRows) -> https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L309
Default: https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L25-L30
batchSize is not used by GoHBase. Also, GoHBase doesn't have an in-memory cache for the scanner, which is why we don't use the caching field from the proto definition. To be honest, I'm not sure why this is even in the HBase proto, since it doesn't seem to change any behaviour on the server; only the client seems to be impacted by it.
Depending on your use case, data shape and application, you could use AllowPartialResults: https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L324
Which option is used to control the batchSize (in terms of rpc proto)? caching or number_of_rows? Or other fields? What value does it default to?
NumberOfRows: this is how many rows to fetch in each scan request -> https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L291
MaxResultSize: this is how many bytes of data to fetch in each scan request (takes priority over NumberOfRows) -> https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L309
Default: https://github.com/tsuna/gohbase/blob/master/hrpc/scan.go#L25-L30
Right, thanks for the explanation.
To be honest, I'm not sure why this is even in the HBase proto, since it doesn't seem to change any behaviour on the server; only the client seems to be impacted by it.
Agreed. I found no processing logic on the server side that handles caching. Or maybe we just missed it.
Hi all, I encountered a performance issue when range-scanning HBase using gohbase.
Say I range-scan HBase using the default options generated by hrpc.NewScanRangeStr, and say it returns 1000 results in total. Later on I found it too slow, so I decided to tune some config options (e.g. batchSize), that is, the number of results returned by each rpc call. But I'm unable to find the option for this.
And I dug up some related things:
hbase.client.scanner.caching (defaults to max int32) applies when the client sends the query to the server. This option is most likely the one I want, but it is obviously already the max number.
In gohbase, the caching field is defined in the client scan proto, but ignored when initializing the rpc request. So the scan request is sent leaving caching empty. And no filters for this are available in pkg filter.
HBase also has a config option called Caching (defaults to 1), which is used to control the number of rows returned when querying the RegionServer.
In the HBase codebase, I found the code snippet which handles the scan operation. It seems like client caching is not enforced, whereas number_of_rows controls the batchSize? Is that correct? (And it also defaults to max int32 in gohbase.)
My questions are:
Which option is used to control the batchSize (in terms of rpc proto)? caching or number_of_rows? Or other fields?
[...] filter about batchSize?
[...] batchSize will improve the performance? Or do we have a better plan to address this issue?