Closed gsgalloway closed 7 years ago
I've got the same problem here:

```java
FlatRowAdapter instance = new FlatRowAdapter();
FlatRow row = FlatRow.newBuilder().withRowKey(ByteString.copyFromUtf8("key1"))
    .addCell("family1", ByteString.copyFrom("qualifier1".getBytes()), 54321L, ByteString.copyFrom("value1".getBytes()))
    .build();
Result result = instance.adaptResponse(row);
result.toString(); // throws java.lang.ArrayIndexOutOfBoundsException
```
Here is the stack trace:

```
java.lang.ArrayIndexOutOfBoundsException: 27495
	at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1236)
	at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1195)
	at com.google.cloud.bigtable.hbase.adapters.read.RowCell.toString(RowCell.java:234)
	at org.apache.hadoop.hbase.client.Result.toString(Result.java:824)
```
Thanks for finding the problems. We'll take a look at this. FWIW, Pull Requests are heartily welcome.
`RowCell.toString()` turns out to be easy to fix. I'll get a new -SNAPSHOT out today with the fix.

`HBaseResultCoder` has 2 problems:

1) The conversion of a `KeyValue` to a `FlatRow.Cell` in `HBaseResultCoder.encode()` is broken.
2) Even once a `KeyValue` value is translated to a `FlatRow.Cell`, `HBaseResultCoder.decode()` will create a `RowCell` instead of a `KeyValue`. That will still result in "// Input and output Result instances are unequal".

I use a Bigtable `FlatRow` to encode/decode HBase `Result`s, which might be the wrong approach. We'll fix `FlatRowAdapter` for case 1), but that won't fix 2). I'm going to have to mull over these two issues.
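Problem 2) shows up as a round-trip asymmetry: even once encoding handles `KeyValue` correctly, the decoded `Result` is rebuilt from `RowCell`s, so a cell-implementation-sensitive equality check still fails. A minimal sketch of that round trip, assuming the Dataflow SDK's `CoderUtils` helpers and placeholder row/family/qualifier values (imports omitted for brevity):

```java
// A Result whose single cell is an hbase KeyValue
KeyValue kv = new KeyValue(
    "key1".getBytes(), "family1".getBytes(), "qualifier1".getBytes(),
    54321L, "value1".getBytes());
Result input = Result.create(new Cell[]{kv});

// Round-trip through HBaseResultCoder
HBaseResultCoder coder = HBaseResultCoder.getInstance();
byte[] encoded = CoderUtils.encodeToByteArray(coder, input);
Result output = CoderUtils.decodeFromByteArray(coder, encoded);

// output's cell is now a RowCell, not a KeyValue, hence
// "Input and output Result instances are unequal"
```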
Here is a fix for 1: https://github.com/GoogleCloudPlatform/cloud-bigtable-client/pull/1232
@gsgalloway and @plelevier, I'm curious about why you need this. The Cloud Bigtable version of `HBaseResultCoder` was intended to be used with CloudBigtableIO, which will create `RowCell`s. Can you please elaborate on your use case?

FWIW, there's a new HBase Apache Beam connector that pretty much copied what we did. Here's their version of `HBaseResultCoder`.
Sure thing. In short, we were attempting to unit test a `DoFn` that operates on `Result` instances and found that our tooling for generating `Result`s for testing uses `KeyValue` and not `RowCell`.

In more detail: we ran into this issue while unit testing a `DoFn` that computes on `Result` instances fetched from CloudBigtableIO and uses the library hbase-object-mapper to map from HBase `Result`s to POJOs. This library readily converts `Result`s using both `RowCell`s and `KeyValue`s to POJOs, but when converting from POJOs to `Result`s it only uses the `KeyValue` implementation.
The test that highlighted this issue:

```java
@Test
public void testDoFn() throws Exception {
  // Given
  // -----
  HBaseObjectMapper objectMapper = new HBaseObjectMapper();
  OurPOJO givenPojo = new OurPOJO(. . .);

  // This Result is created using `KeyValue` and not `RowCell`
  Result inputResult = objectMapper.writeValueAsResult(givenPojo);

  Pipeline p = TestPipeline.create();
  PCollection<Result> inputPCollection = p.apply(Create.of(inputResult).withCoder(HBaseResultCoder.getInstance()));

  // When
  // -----
  PCollection<OurPOJO> output = inputPCollection.apply(ParDo.of(new ConvertFromHBaseResultToOurPOJO()));

  // Then
  // -----
  DataflowAssert.thatSingleton(output).isEqualTo(givenPojo);
  p.run();
}
```
We were able to work around this issue by coercing the `KeyValue`s into `RowCell`s:
```java
@Test
public void testDoFn() throws Exception {
  // Given
  // -----
  HBaseObjectMapper objectMapper = new HBaseObjectMapper();
  OurPOJO givenPojo = new OurPOJO(. . .);

  // Coerce KeyValues into RowCells for compatibility with HBaseResultCoder
  Result inputResult = objectMapper.writeValueAsResult(givenPojo);
  Cell inputKeyValue = inputResult.listCells().get(0);
  Cell convertedToRowCell = new RowCell(
      CellUtil.cloneRow(inputKeyValue),
      CellUtil.cloneFamily(inputKeyValue),
      CellUtil.cloneQualifier(inputKeyValue),
      inputKeyValue.getTimestamp(),
      CellUtil.cloneValue(inputKeyValue));
  Result inputResultUsingRowCells = Result.create(new Cell[]{convertedToRowCell});

  Pipeline p = TestPipeline.create();
  PCollection<Result> inputPCollection = p.apply(Create.of(inputResultUsingRowCells).withCoder(HBaseResultCoder.getInstance()));

  // When
  // -----
  PCollection<OurPOJO> output = inputPCollection.apply(ParDo.of(new ConvertFromHBaseResultToOurPOJO()));

  // Then
  // -----
  DataflowAssert.thatSingleton(output).isEqualTo(givenPojo);
  p.run();
}
```
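The single-cell coercion in the test above can be generalized. A hypothetical helper (the name `toRowCellResult` and its placement are ours, not part of any library) that rebuilds an arbitrary `Result` with `RowCell`s, using the same HBase and Bigtable classes as the workaround:

```java
// Hypothetical helper: convert every Cell of a Result (e.g. one backed by
// KeyValue) into a Bigtable RowCell so the Result survives
// HBaseResultCoder's encode/decode round trip.
static Result toRowCellResult(Result input) {
  List<Cell> converted = new ArrayList<>();
  for (Cell cell : input.listCells()) {
    converted.add(new RowCell(
        CellUtil.cloneRow(cell),
        CellUtil.cloneFamily(cell),
        CellUtil.cloneQualifier(cell),
        cell.getTimestamp(),
        CellUtil.cloneValue(cell)));
  }
  return Result.create(converted.toArray(new Cell[0]));
}
```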
To clarify, `HBaseResultCoder` works just fine in production when it is encoding `Result`s that come from CloudBigtableIO, but fails when we attempt to run unit tests with `Result`s created otherwise.

While it makes sense that this issue should never affect individuals using the CloudBigtableIO connector, its name and the fact that it only works for certain `Result` implementations are a bit surprising. At the least, it might warrant mention in the `HBaseResultCoder` documentation that it is only compatible with `Result`s generated by the CloudBigtableIO connector.
Thanks for the tip about Beam's `HBaseResultCoder`; that certainly removes the need for that kludgy `Cell` conversion code.
The next release will have this fix.
`HBaseResultCoder` (in bigtable-hbase-dataflow) works well for `Result` instances that use the `RowCell` implementation of HBase's `Cell` interface, but when given a `Result` with HBase's `KeyValue` instead of Bigtable's `RowCell`, it corrupts the output `Result`. I couldn't find where exactly the breakdown occurs, but it seems related to the fact that a call to `RowCell.toString()` causes an ArrayIndexOutOfBoundsException.

A simple test for `RowCell.toString()`:

Stack trace from test failure:

Another test illustrating `HBaseResultCoder`'s incompatibility with `KeyValue`:

The relevant dependencies in our pom.xml: