vesoft-inc / nebula-importer

Nebula Graph Importer with Go
Apache License 2.0
90 stars 60 forks source link

CSV Data import error with timestamp data type #140

Closed JArma19 closed 3 years ago

JArma19 commented 3 years ago

I'm trying to import data in csv format to nebula graph using nebula-importer tool. This is the schema for tag user:

CREATE TAG user(id int, screen_name string, followers_count int, friends_count int, created_at timestamp);

And here are two rows of my csv file to make understand:

"0",":User","2011-09-13T15:13:20","372861228","danieleverdi",,, "2",":User","2020-06-02T14:52:27","1267831404525690880","mariorsossi",,,

The problem is related to the timestamp string format: as a matter of fact I get the following error:

2021/05/11 21:54:43 --- START OF NEBULA IMPORTER --- 2021/05/11 21:54:43 [WARN] config.go:217: Invalid retry option in clientSettings.retry, reset to 1 2021/05/11 21:54:43 [WARN] config.go:168: You have not configured whether to remove generated temporary files, reset to default value. removeTempFiles: false 2021/05/11 21:54:43 [INFO] connection_pool.go:74: [nebula-clients] connection pool is initialized successfully 2021/05/11 21:54:43 [INFO] clientmgr.go:28: Create 10 Nebula Graph clients 2021/05/11 21:54:43 [INFO] reader.go:64: Start to read file(0): /home/justin/Desktop/progetto_dm/nebula-docker-compose/users.csv, schema: < :IGNORE,:IGNORE,user.created_at:timestamp,:VID(int)/user.id:int,user.screen_name:string,user.followers_count:int,user.friends_count:int > 2021/05/11 21:54:43 [INFO] reader.go:180: Total lines of file(/home/justin/Desktop/progetto_dm/nebula-docker-compose/users.csv) is: 2, error lines: 0 2021/05/11 21:54:44 [ERROR] handler.go:63: Client 2 fail to execute: INSERT VERTEX user(created_at,id,screen_name,followers_count,friends_count) VALUES 1267831404525690880: (2020-06-02T14:52:27,1267831404525690880,"MarcelloLyotard",,);, ErrMsg: SyntaxError: syntax error near T14', ErrCode: -7 2021/05/11 21:54:44 [ERROR] handler.go:63: Client 1 fail to execute: INSERT VERTEXuser(created_at,id,screen_name,followers_count,friends_count) VALUES 372861228: (2011-09-13T15:13:20,372861228,"danielenavone1",,);, ErrMsg: SyntaxError: syntax error nearT15', ErrCode: -7 2021/05/11 21:54:44 [INFO] statsmgr.go:61: Done(/home/justin/Desktop/progetto_dm/nebula-docker-compose/users.csv): Time(1.03s), Finished(2), Failed(2), Latency AVG(0us), Batches Req AVG(0us), Rows AVG(1.95/s) 2021/05/11 21:54:44 Total 2 lines fail to insert into nebula graph database 2021/05/11 21:54:45 --- END OF NEBULA IMPORTER ---

In both cases a Syntax error occurs due to the "T" letter, however, even removing it the error still persists. As you can see, in the INSERT VERTEX statement quotes at the beginning and at the end of the string get removed. So my question is: How should I properly format the timestamp field in the csv to make the import process work?

Thanks for your help.

wey-gu commented 3 years ago

Dear @JArma19 ,

Thanks for using Nebula :). Your timestamp data looks good, it should be expected with the T already.

yield timestamp("2020-06-02T14:52:27")
+----------------------------------+
| timestamp("2020-06-02T14:52:27") |
+----------------------------------+
| 1591109547                       |
+----------------------------------+

Would you mind sharing your yaml file together with part of the CSV file(with the error lines) that we could reproduce?

Many thanks! B.R. Wey

JArma19 commented 3 years ago

Dear @wey-gu thanks for your answer. Here attached the yaml config file (that I had to convert to .txt since github doesn't allow .yml for uploading) and the CSV. config.txt users.csv Further more, for reproducing purpose, here is the space I have created CREATE SPACE twitter(partition_num=10, replica_factor=1, vid_type = INT64);

wey-gu commented 3 years ago

Dear @JArma19 Thanks, I can reproduce your issue now. Will come back soon.

wey-gu commented 3 years ago

After reaching out to @jievince, it's found it turned out that the importer didn't handle string typed timestamp properly now, that is to put string format timestamp (i.e. "2020-06-02T14:52:27") with the ngql function timestamp() ref 0. It's not automatically done by the importer.

(user@nebula) [twitter]> INSERT VERTEX user(created_at,id,screen_name,followers_count,friends_count) VALUES 1267831404525690880: (timestamp('2020-06-02T14:52:27'),1267831404525690880,"MarcelloLyotard",0,0)
Execution succeeded (time spent 1719/2326 us)

Mitigation before fixing it from the importer

For now, before we improved the importer here, you can a. to preprocess the data as below to convert it into int format; b. to add timestamp function in that clumn. We are sorry for the inconvenience

a

"0",":User",573206400,"372861228","danielenavone1",0,0,
"2",":User",573206400,"1267831404525690880","MarcelloLyotard",0,0,
"3",":User",573206400,"1092872164364177408","AmadeoGiulia",0,0,
"6",":User",573206400,"3058264727","rspinazze",0,0,
"9",":User",573206400,"2216480026","butera_linda",0,0,
"12",":User",573206400,"540066448","mbbelluco",0,0,
"14",":User",573206400,"105892539","AnesteTista",0,0,
"15",":User",573206400,"1057735056175841281","GastoneSabba",0,0,
"17",":User",573206400,"33284443","25O319",0,0,
"18",":User",573206400,"435737042","ppaul_64",0,0,
"248",":User",573206400,"1278027210138451968","gancillo1",0,0,

b

note: In this case it has to be '2011-09-13T15:13:20' instead of "2011-09-13T15:13:20"

"0",":User",timestamp('2011-09-13T15:13:20'),"372861228","danielenavone1",0,0,
"2",":User",timestamp('2020-06-02T14:52:27'),"1267831404525690880","MarcelloLyotard",0,0,
"3",":User",timestamp('2019-02-05T19:46:43'),"1092872164364177408","AmadeoGiulia",0,0,
"6",":User",timestamp('2015-02-24T06:41:43'),"3058264727","rspinazze",0,0,
"9",":User",timestamp('2013-12-09T21:42:01'),"2216480026","butera_linda",0,0,
"12",":User",timestamp('2012-03-29T15:34:53'),"540066448","mbbelluco",0,0,
"14",":User",timestamp('2010-01-17T21:08:48'),"105892539","AnesteTista",0,0,
"15",":User",timestamp('2018-10-31T20:44:23'),"1057735056175841281","GastoneSabba",0,0,
"17",":User",timestamp('2009-04-19T19:15:32'),"33284443","25O319",0,0,
"18",":User",timestamp('2011-12-13T11:41:13'),"435737042","ppaul_64",0,0,
"248",":User",timestamp('2020-06-30T18:08:15'),"1278027210138451968","gancillo1",0,0,
JArma19 commented 3 years ago

Dear @wey-gu thank you a lot for your help, I appreciate that. Just for curiosity purpose, how did you convert date into int format?

wey-gu commented 3 years ago

Dear @wey-gu thank you a lot for your help, I appreciate that. Just for curiosity purpose, how did you convert date into int format?

Dear @JArma19 ,

That could be done by some lib from almost any language, i.e. with python as below :-).

Also, we could also use option b as above to avoid this converting.

(user@nebula) [(none)]> yield timestamp('2011-09-13T15:13:20');
+----------------------------------+
| timestamp("2011-09-13T15:13:20") |
+----------------------------------+
| 1315926800                       |
+----------------------------------+
Got 1 rows (time spent 1400/2172 us)
❯ ipython
Python 3.9.4 (default, Apr  4 2021, 17:42:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dateutil.parser as dp

In [2]: timestamp = "2011-09-13T15:13:20"

In [3]: dp.parse(timestamp).strftime('%s')
Out[3]: '1315898000'