vesoft-inc / nebula-importer

Nebula Graph Importer with Go
Apache License 2.0

[feat]support comment in csv #293

Closed: linview closed this 9 months ago

linview commented 1 year ago

Introduction

Support comments in the data files read by the importer, so that a file can describe more about its data.

Contents

Scenario #1

When importing a customer's data into a test environment, each data file (*.csv) for the importer usually contains many columns (more than 20) used by the ETL workflow, and the header cannot describe the usage of every column. For example, if a header column is named 'id' and holds the value '10086', we need to know whether its type is 'STRING' or 'INT'. A comment could shed some light on this.

# id data shall be imported as 'string'
no,id,name
1,101,David
2,102,John

Scenario #2

Sometimes data files depend on each other and must be imported into Nebula as a whole. Comments in the data files can state these relations explicitly. For example:

# nodes x and y: property 'id' shall use the same type 'int'
no,id,name,addr
1,101,David,SZ
2,102,John,HZ

# nodes x and y: property 'id' shall use the same type 'int'
no,id,company,prod
1,101,dji,drone
2,102,hikviz,monitor

Related work

veezhang commented 1 year ago

@linview Hi, headers are already supported.

The `Comment` field is already supported by Go's official `encoding/csv` package, but the importer does not yet provide a configuration item for it.

    // Comment, if not 0, is the comment character. Lines beginning with the
    // Comment character without preceding whitespace are ignored.
    // With leading whitespace the Comment character becomes part of the
    // field, even if TrimLeadingSpace is true.
    // Comment must be a valid rune and must not be \r, \n,
    // or the Unicode replacement character (0xFFFD).
    // It must also not be equal to Comma.
    Comment rune

Ref: https://github.com/golang/go/blob/release-branch.go1.21/src/encoding/csv/reader.go#L117-L124

Please note that this only supports a single character such as `#`; multi-character markers such as `//` are not supported.

@MuYiYong PTAL

MuYiYong commented 11 months ago


LGTM. I think supporting `#` meets the requirement. It may require some work to convert existing `//` comments to `#`, but probably not much.
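The `//`-to-`#` conversion could be a simple preprocessing pass. A minimal sketch, assuming comment lines start with `//` at column 0; `convertSlashComments` is a hypothetical helper, not part of nebula-importer.

```go
package main

import (
	"fmt"
	"strings"
)

// convertSlashComments (hypothetical helper) rewrites lines that start with
// "//" into lines that start with "#", so a csv.Reader configured with
// Comment = '#' will skip them. Indented lines are left untouched, matching
// encoding/csv's rule that only unindented comment lines are ignored.
func convertSlashComments(data string) string {
	lines := strings.Split(data, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "//") {
			lines[i] = "#" + strings.TrimPrefix(line, "//")
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "// id is an int\nno,id,name\n1,101,David\n"
	fmt.Print(convertSlashComments(in))
	// # id is an int
	// no,id,name
	// 1,101,David
}
```

One caveat: any data row that legitimately begins with `//` would also be converted, and any row that already begins with `#` would become a comment, so the files should be checked for such rows first.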

veezhang commented 9 months ago

fixed