jpillora / csv-to-influxdb

Import CSV files into InfluxDB

[Question] Line not imported #17

Open okomarov opened 7 years ago

okomarov commented 7 years ago

I have the following testdata.csv:

Time,Price,Volume,Exchange,G127,Correction,Cond1,Cond2,Symbol
"2000-09-19 09:30:11",20.3,100,"T",0,0," "," ","GENE"
"2000-09-19 09:30:11",20.5,200,"T",0,0," "," ","GENE"
"2000-09-19 09:30:18",20.53125,300,"T",0,0," "," ","GENE"

I execute:

>csv-to-influxdb_windows_amd64.exe -ts "Time" -d mydb testdata.csv
2017/02/08 10:47:34 Done (wrote 3 points)

But then the CLI shows only the second and third rows:

> select * from data
name: data
time               Cond1 Cond2 Correction Exchange G127 Price    Symbol Volume
----               ----- ----- ---------- -------- ---- -----    ------ ------
969355811000000000             0          true     0    20.5     GENE   200
969355818000000000             0          true     0    20.53125 GENE   300
ghost commented 7 years ago

The problem is that you have two rows with the same timestamp, so the first row is overwritten by the second. I had the same problem and solved it by editing the script so that each successive timestamp is shifted by one more nanosecond than the previous one (1 ns, 2 ns, 3 ns, and so on). See image below.

[image: capture]
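To make the overwrite concrete, here is a toy Go model of it. It assumes, as the query output suggests, that all three rows land in the same series (measurement `data`); `store`, `pointKey` and `write` are hypothetical names for illustration, not InfluxDB's API. The merge rule mirrors how InfluxDB treats a duplicate point: fields are combined and a field present in both points takes the newer value.

```go
package main

import "fmt"

// pointKey models what identifies a point in InfluxDB: the series key
// (measurement plus tag set) together with the timestamp.
type pointKey struct {
	series string // e.g. "data" when the point has no tags
	ts     int64  // Unix time in nanoseconds
}

// store is a toy, in-memory stand-in for the database (hypothetical).
type store map[pointKey]map[string]float64

// write mimics InfluxDB's handling of a duplicate point: fields are merged
// into the existing point, and a field present in both takes the new value.
func (s store) write(series string, ts int64, fields map[string]float64) {
	k := pointKey{series, ts}
	if s[k] == nil {
		s[k] = map[string]float64{}
	}
	for name, v := range fields {
		s[k][name] = v
	}
}

func main() {
	s := store{}
	// The three rows from testdata.csv (only the two numeric fields shown).
	s.write("data", 969355811000000000, map[string]float64{"Price": 20.3, "Volume": 100})
	s.write("data", 969355811000000000, map[string]float64{"Price": 20.5, "Volume": 200})
	s.write("data", 969355818000000000, map[string]float64{"Price": 20.53125, "Volume": 300})

	fmt.Println(len(s))                                           // 2
	fmt.Println(s[pointKey{"data", 969355811000000000}]["Price"]) // 20.5
}
```

Three writes succeed, which matches "wrote 3 points", yet only two distinct keys remain, and the surviving 09:30:11 point carries the second row's values.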

ghost commented 7 years ago

Someone asked how I implemented it. Look for the //modified comments in the code below.

At the beginning:

var VERSION = "0.0.0-src"
var nanosecond = time.Nanosecond //modified

Later:

//fields require string parsing
if timestampRe.MatchString(r) {
    t, err := time.Parse(conf.TimestampFormat, r)
    if err != nil {
        fmt.Printf("#%d: %s: Invalid time: %s\n", i, h, err)
        continue
    }
    if conf.TimestampColumn == h {
        //modified: shift each row's timestamp by one more nanosecond than
        //the previous row's, so rows with equal times get distinct keys
        ts = t.Add(nanosecond)
        nanosecond += time.Nanosecond
        continue
    }
    fields[h] = t
}

If you need to skip a row, as I had to, here is code you can add to skip, for instance, the second row:

//read csv, line by line
r := csv.NewReader(f)
for i := 0; ; i++ {
    records, err := r.Read()
    if err != nil {
        if err == io.EOF {
            break
        }
        log.Fatalf("CSV error: %s", err)
    }
    if i == 0 {
        setHeaders(records)
        continue
    }
    if i == 1 { //modified, skips the second line of the file (first data row)
        continue
    }
    // ...rest of the original loop body...
}

How to install Go: http://ask.xmodulo.com/install-go-language-linux.html
How to compile Go: https://gobyexample.com/hello-world

jpillora commented 7 years ago

@edin5 is correct. Since InfluxDB uses the timestamp as the primary key, you'll have to either modify your data or patch this program. Maybe the solution is an "auto-increment equal timestamps" option (--inc-timestamps), off by default.

EEShaffer commented 7 years ago

When I downloaded the tool with the curl command, it created an executable in my directory, which I cannot edit with vi, so I can't apply the fix. I also tried downloading and editing the source code, but could not find the code mentioned above in it.

EEShaffer commented 7 years ago

I'm not sure the above solution will work for me. I have 15360 data points like the sample below, but only 11 rows were loaded.

2015-08-18 00:00:00.000, 8.12, 0.039, 'Coyote Creek', NULL
2015-08-18 00:06:00.000, 8.005, 0.039, 'Coyote Creek', NULL
2015-08-18 00:12:00.000, 7.887, 0.036, 'Coyote Creek', NULL
2015-08-18 00:18:00.000, 7.762, 0.043, 'Coyote Creek', NULL
2015-08-18 00:24:00.000, 7.635, 0.052, 'Coyote Creek', NULL
2015-08-18 00:30:00.000, 7.5, 0.039, 'Coyote Creek', NULL
2015-08-18 00:36:00.000, 7.372, 0.043, 'Coyote Creek', NULL
2015-08-18 00:42:00.000, 7.234, 0.046, 'Coyote Creek', NULL
2015-08-18 00:48:00.000, 7.11, 0.043, 'Coyote Creek', NULL
2015-08-18 00:54:00.000, 6.982, 0.046, 'Coyote Creek', NULL
2015-08-18 01:00:00.000, 6.837, 0.046, 'Coyote Creek', NULL
2015-08-18 01:06:00.000, 6.713, 0.046, 'Coyote Creek', NULL
2015-08-18 01:12:00.000, 6.578, 0.046, 'Coyote Creek', NULL
2015-08-18 01:18:00.000, 6.44, 0.046, 'Coyote Creek', NULL
2015-08-18 01:24:00.000, 6.299, 0.046, 'Coyote Creek', NULL
2015-08-18 01:30:00.000, 6.168, 0.046, 'Coyote Creek', NULL
2015-08-18 01:36:00.000, 6.024, 0.046, 'Coyote Creek', NULL
2015-08-18 01:42:00.000, 5.879, 0.049, 'Coyote Creek', NULL
2015-08-18 01:48:00.000, 5.745, 0.046, 'Coyote Creek', NULL
2015-08-18 01:54:00.000, 5.617, 0.043, 'Coyote Creek', NULL
2015-08-18 02:00:00.000, 5.472, 0.046, 'Coyote Creek', NULL
2015-08-18 02:06:00.000, 5.348, 0.046, 'Coyote Creek', NULL

syfantid commented 6 years ago

@jpillora I am currently working on a project in InfluxDB, and the --inc-timestamps option would suit me perfectly. Is there any chance it will be included soon? I downloaded the Windows executable from the releases, so I cannot edit the source code.

PS: The primary key in InfluxDB is not the timestamp itself, but the combination of measurement, tag set, and timestamp.
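Because a point is identified by its series key plus its timestamp, a common alternative to shifting timestamps is to give otherwise-identical rows different tag sets. The Go sketch below renders InfluxDB 1.x line protocol by hand to show this; `lineProtocol` and the `seq` tag are hypothetical names for illustration, not something csv-to-influxdb provides. Note that every distinct `seq` value creates a new series, so this trades duplicate safety for higher series cardinality.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Line-protocol escaping (InfluxDB 1.x): measurement names escape commas
// and spaces; tag keys, tag values and field keys also escape equals signs.
var (
	measurementEscaper = strings.NewReplacer(",", `\,`, " ", `\ `)
	keyEscaper         = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)
)

// sortedKeys returns the map's keys in lexical order so output is stable.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// lineProtocol (hypothetical) renders one point. Field values must already
// be formatted for line protocol, e.g. "20.3" for a float or "100i" for an int.
func lineProtocol(measurement string, tags, fields map[string]string, ts int64) string {
	var b strings.Builder
	b.WriteString(measurementEscaper.Replace(measurement))
	for _, k := range sortedKeys(tags) {
		fmt.Fprintf(&b, ",%s=%s", keyEscaper.Replace(k), keyEscaper.Replace(tags[k]))
	}
	for i, k := range sortedKeys(fields) {
		sep := ","
		if i == 0 {
			sep = " " // a space separates the tag set from the field set
		}
		fmt.Fprintf(&b, "%s%s=%s", sep, keyEscaper.Replace(k), fields[k])
	}
	fmt.Fprintf(&b, " %d", ts)
	return b.String()
}

func main() {
	const ts = 969355811000000000 // both rows share 2000-09-19 09:30:11 UTC
	rows := []map[string]string{
		{"Price": "20.3", "Volume": "100i"},
		{"Price": "20.5", "Volume": "200i"},
	}
	for i, f := range rows {
		// "seq" is a hypothetical tag whose only job is to make each
		// point's tag set, and therefore its series key, unique.
		tags := map[string]string{"Symbol": "GENE", "seq": fmt.Sprint(i + 1)}
		fmt.Println(lineProtocol("data", tags, f, ts))
	}
	// data,Symbol=GENE,seq=1 Price=20.3,Volume=100i 969355811000000000
	// data,Symbol=GENE,seq=2 Price=20.5,Volume=200i 969355811000000000
}
```

Both lines share a timestamp but belong to different series, so neither overwrites the other.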