vesoft-inc / nebula-flink-connector

Flink Connector for Nebula Graph

fix deserialize bug for nebula NULL data #74

Closed liuxiaocs7 closed 2 years ago

liuxiaocs7 commented 2 years ago

try to fix #73

codecov-commenter commented 2 years ago

Codecov Report

Base: 61.54% // Head: 61.54% // No change to project coverage :thumbsup:

Coverage data is based on head (c757541) compared to base (6bb4a2e). Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             master      #74   +/-   ##
=========================================
  Coverage     61.54%   61.54%
  Complexity      291      291
=========================================
  Files            52       52
  Lines          1784     1784
  Branches        166      166
=========================================
  Hits           1098     1098
  Misses          596      596
  Partials         90       90
```

| [Impacted Files](https://codecov.io/gh/vesoft-inc/nebula-flink-connector/pull/74?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vesoft-inc) | Coverage Δ | |
|---|---|---|
| [...connector/nebula/table/NebulaRowDataConverter.java](https://codecov.io/gh/vesoft-inc/nebula-flink-connector/pull/74/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vesoft-inc#diff-Y29ubmVjdG9yL3NyYy9tYWluL2phdmEvb3JnLmFwYWNoZS5mbGluay9jb25uZWN0b3IvbmVidWxhL3RhYmxlL05lYnVsYVJvd0RhdGFDb252ZXJ0ZXIuamF2YQ==) | `59.52% <0.00%> (ø)` | |

liuxiaocs7 commented 2 years ago

This link shows that the code simply converts the Nebula data value to Long. When the Nebula data value is NULL, that is indeed a problem, but it should not be hit here, because every field in the inserted edge is non-null.

for (int pos = 0; pos < rowType.getFieldCount(); pos++) {
    ValueWrapper valueWrapper = values.get(pos);
    if (valueWrapper != null) {
        try {
            genericRowData.setField(pos,
                    toInternalConverters[pos].deserialize(valueWrapper));
        } catch (SQLException e) {
            e.printStackTrace();
        }
    } else {
        genericRowData.setField(pos, null);
    }
}

We should use valueWrapper.isNull() to check whether the value is null, rather than != null.
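To illustrate the difference between the two checks, here is a minimal, self-contained sketch. `FakeValue` is a hypothetical stand-in for the client's `ValueWrapper` and models only `isNull()` and a long conversion; it is not the connector's actual converter code. The point is that a wrapper reference can be non-null while the value it wraps is a Nebula NULL, so the check has to look inside the wrapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ValueWrapper; only isNull() and a long conversion are modelled.
class FakeValue {
    private final Long value; // a null here models a Nebula NULL held by a non-null wrapper

    FakeValue(Long value) {
        this.value = value;
    }

    boolean isNull() {
        return value == null;
    }

    long asLong() {
        if (value == null) {
            throw new IllegalStateException("value is NULL, cannot convert to long");
        }
        return value;
    }
}

class NullCheckSketch {
    // Same shape as the loop above, but the check inspects the wrapped value
    // instead of relying only on the wrapper reference being non-null.
    static List<Long> convert(List<FakeValue> values) {
        List<Long> row = new ArrayList<>();
        for (FakeValue v : values) {
            if (v == null || v.isNull()) {
                row.add(null);
            } else {
                row.add(v.asLong());
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<FakeValue> values = Arrays.asList(new FakeValue(61L), new FakeValue(null));
        System.out.println(convert(values));
    }
}
```

With only a `!= null` check, the second element would reach `asLong()` and throw, which is the kind of failure this PR is trying to avoid.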

After fixing the error, the job result shows:

2> +I[61, 62, 1, aba, abcdefgh, null, 1111, 22222, 6412233, 2019-01-01, 2019-01-01T04:12:12, 435463424, false, 1.2, 1.0, 03:12:12, POINT(1.0 3.0)]

But the insert statement is as follows:

INSERT EDGE `friend`(`col1`,`col2`,`col3`,`col4`,`col5`,`col6`,`col7`,`col8`,`col9`,`col10`,`col11`,`col12`,`col13`,`col14`) VALUES 61->62@0: ("aba","abcdefgh",22,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("POINT(1 3)")),62->63@0: ("aba","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("POINT(1 3)")),63->64@0: ("aba","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("POINT(1 3)")),64->65@0: ("aba","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("LINESTRING(1 3,2 4)")),65->66@0: ("aba","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("LINESTRING(1 3,2 4)")),66->67@0: ("aba","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("LINESTRING(1 3,2 4)")),67->68@0: ("李四","abcdefgh",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,true,1.2,1.0,time("11:12:12"),ST_GeogFromText("polygon((0 1,1 2,2 3,0 1))")),68->61@0: ("aba","张三",1,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,true,1.2,1.0,time("11:12:12"),ST_GeogFromText("POLYGON((0 1,1 2,2 3,0 1))"))

The relevant edge:

61->62@0: ("aba","abcdefgh",22,1111,22222,6412233,date("2019-01-01"),datetime("2019-01-01T12:12:12"),435463424,false,1.2,1.0,time("11:12:12"),ST_GeogFromText("POINT(1 3)"))

So why is only the int8 field null?

Although the result now looks right after switching to a different graph space name (flinkSinkInput), it is still confusing.

The most recent job result, with no extra null:

image

liuxiaocs7 commented 2 years ago

In AbstractNebulaOutPutFormatITTest, the logic now skips the rank column.

image

for (int i = 2; i < columns.size(); i++) {
    if (config.get(RANK_ID_INDEX) != i) {
        positions.add(i);
        fields.add(columns.get(i).getName());
    }
}
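
As a rough illustration of what this loop does, the following self-contained sketch collects property positions while skipping a rank index. The six-column row and the rank index of 4 are assumptions chosen for illustration (they match the shape of the `insert into friend` values in this thread, but the real test reads the index from its config), and `RankSkipSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RankSkipSketch {
    // Starts after src (index 0) and dst (index 1) and skips the rank index,
    // so the rank value is never written as an edge property.
    static List<Integer> propertyPositions(int columnCount, int rankIndex) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 2; i < columnCount; i++) {
            if (i != rankIndex) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical shortened row: src, dst, then four values.
        List<String> row = Arrays.asList("61", "62", "aba", "abcdefgh", "1", "1111");
        int rankIndex = 4; // assumed for illustration
        System.out.println("rank value: " + row.get(rankIndex));
        System.out.println("property positions: " + propertyPositions(row.size(), rankIndex));
    }
}
```

Under these assumptions the value `1` is consumed as the rank and the column it would otherwise fill is left unset, which would explain a row showing rank 1 and a null property.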

Nicole00 commented 2 years ago

Hi, I have one question: what is the meaning of the 1 after 61 and 62 in

2> +I[61, 62, 1, aba, abcdefgh, null, 1111, 22222, 6412233, 2019-01-01, 2019-01-01T04:12:12, 435463424, false, 1.2, 1.0, 03:12:12, POINT(1.0 3.0)]

It looks like the rank value, but according to the relevant edge data, the rank is 0 and no property has the value 1.

liuxiaocs7 commented 2 years ago

> Hi, I have one question: what is the meaning of the 1 after 61 and 62 in
>
> 2> +I[61, 62, 1, aba, abcdefgh, null, 1111, 22222, 6412233, 2019-01-01, 2019-01-01T04:12:12, 435463424, false, 1.2, 1.0, 03:12:12, POINT(1.0 3.0)]
>
> It looks like the rank value, but according to the relevant edge data, the rank is 0 and no property has the value 1.

Sorry, I may not have been clear.

In the newest implementation, it shows:

image

The rank id is 0, which is consistent with what is written in the code:

for (List<String> friend : friends) {
    edges.add(new NebulaEdge(
            friend.get(0), friend.get(1), 0L, friend.subList(2, friend.size())));
}

The previous problem occurred because AbstractNebulaInputFormatITTest and AbstractNebulaOutputFormatITTest used the same graph space name, flinkSink. The logic in AbstractNebulaOutputFormatITTest ignores col3 and treats it as the rank id.

image

But why is it possible to insert an edge whose vertex does not exist? In this test file:

insert into person values ('89', 'aba', 'abcdefgh', '1', '1111', '22222', '6412233', '2019-01-01', '2019-01-01T12:12:12', '435463424', 'false', '1.2', '1.0', '11:12:12', 'POINT(1 3)')

insert into friend values ('61', '62', 'aba', 'abcdefgh', '1', '1111', '22222', '6412233', '2019-01-01', '2019-01-01T12:12:12', '435463424', 'false', '1.2', '1.0', '11:12:12', 'POINT(1 3)')

insert into friend values ('61', '89', 'aba', 'abcdefgh', '1', '1111', '22222', '6412233', '2019-01-01', '2019-01-01T12:12:12', '435463424', 'false', '1.2', '1.0', '11:12:12', 'POINT(1 3)')

Nicole00 commented 2 years ago

I see, thanks for your explanation.

> why is it possible to insert an edge where the vertex does not exist?

It's because NebulaGraph allows hanging (dangling) edges: even when a vertex does not exist, the edge can still be inserted successfully.