m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0
22 stars 7 forks source link

Add parser for scamper1 datatype #1021

Closed cristinaleonr closed 2 years ago

cristinaleonr commented 3 years ago

Scamper1 parsing output

{
  "ID": "ndt-vfhzg_1630559948_0000000000024BD6",
  "Parser": {
    "Version": "local development",
    "Time": "2021-09-15T02:02:53.944790299Z",
    "ArchiveURL": "gs://archive-mlab-staging/ndt/scamper1/2021/09/06/20210906T003000.489493Z-scamper1-mlab4-bru01-ndt.tgz",
    "Filename": "2021/09/06/20210906T000000Z_ndt-vfhzg_1630559948_0000000000024BD6.jsonl",
    "Priority": 0,
    "GitCommit": "uninitialized"
  },
  "Date": "2021-09-06",
  "Raw": {
    "Metadata": {
      "UUID": "ndt-vfhzg_1630559948_0000000000024BD6",
      "TracerouteCallerVersion": "d84ea3d",
      "CachedResult": true,
      "CachedUUID": "ndt-vfhzg_1630559948_0000000000024B27"
    },
    "CycleStart": {
      "type": "cycle-start",
      "list_name": "default",
      "id": 0,
      "hostname": "ndt-vfhzg",
      "start_time": 1630885980
    },
    "Tracelb": {
      "Type": "tracelb",
      "Version": "0.1",
      "Userid": 0,
      "Method": "icmp-echo",
      "Src": "195.89.146.178",
      "Dst": "34.139.254.27",
      "Start": {
        "sec": 1630885980,
        "usec": 569320
      },
      "probe_size": 44,
      "Firsthop": 1,
      "Attempts": 3,
      "Confidence": 95,
      "Tos": 0,
      "Gaplimit": 0,
      "wait_timeout": 5,
      "wait_probe": 150,
      "Probec": 133,
      "probec_max": 3000,
      "Nodec": 11,
      "Linkc": 11,
      "Nodes": [
        {
          "hop_id": "20210905_ndt-vfhzg_195.89.146.129",
          "Addr": "195.89.146.129",
          "Name": "xe-11-0-1-xcr1.brx.cw.net",
          "q_ttl": 1,
          "Linkc": 1,
          "Links": [
            {
              "Links": [
                {
                  "addr": "195.2.16.254",
                  "probes": [
                    {
                      "tx": {
                        "sec": 1630885980,
                        "usec": 719881
                      },
                      "replyc": 1,
                      "ttl": 2,
                      "attempt": 0,
                      "flowid": 1,
                      "replies": [
                        {
                          "rx": {
                            "sec": 1630885980,
                            "usec": 723936
                          },
                          "ttl": 253,
                          "rtt": 4.055,
                          "icmp_type": 11,
                          "icmp_code": 0,
                          "icmp_q_tos": 0,
                          "icmp_q_ttl": 1
                        }
                      ]
                    },
                  ]
                }
              ]
            }
          ]
        },
      ]
    },
    "CycleStop": {
      "type": "cycle-stop",
      "list_name": "default",
      "id": 0,
      "hostname": "ndt-vfhzg",
      "stop_time": 1630886350
    }
  }
}

This change is Reviewable

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6685


Changes Missing Coverage Covered Lines Changed/Added Lines %
parser/parser.go 0 2 0.0%
parser/scamper1.go 105 123 85.37%
<!-- Total: 105 125 84.0% -->
Totals Coverage Status
Change from base Build 6683: 0.4%
Covered Lines: 3760
Relevant Lines: 5899

💛 - Coveralls
SaiedKazemi commented 3 years ago

parser/scamper1.go, line 1 at r2 (raw file):

package parser

Although this is not directly related to this PR, I noticed that package comment lines in {disco,parser,pt}.go files need rewording. Specifically, they still refer to Paris Traceroute but not to scamper.

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 19 at r2 (raw file):


//=====================================================================================
//                       Scamper1 Parser

Should this be scamper1 (i.e., lower-case s)?

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 106 at r2 (raw file):

}

// ParseAndInsert decodes the Scamper1 data and inserts it into BQ.

Should this be scamper1?

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 150 at r2 (raw file):


// NB: These functions are also required to complete the etl.Parser interface
// For Scamper1, we just forward the calls to the Inserter.

Should this be scamper1?

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 113 at r2 (raw file):

  scamperOutput, err := parser.ParseTraceroute(rawContent)
  if err != nil {
      return fmt.Errorf("corrupted scamper1 file: %s", err)

I suggest that error messages here and elsewhere follow a consistent pattern of "failed to ..." so that they are a lot easier to be parsed in the logs. This one, for example, could say "failed to parse scamper1 file: %v". (You can look at traceroute-caller error messages to see other examples.)

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 152 at r2 (raw file):

// For Scamper1, we just forward the calls to the Inserter.

func (p *Scamper1Parser) Flush() error {

Exported method Scamper1Parser.Flush should have comment or be unexported.

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 156 at r2 (raw file):

}

func (p *Scamper1Parser) TableName() string {

Same as above.

SaiedKazemi commented 3 years ago

parser/scamper1.go, line 160 at r2 (raw file):

}

func (p *Scamper1Parser) FullTableName() string {

Same as above.

SaiedKazemi commented 3 years ago

Package testing fails for me:

(sandbox-cristinaleon-scamper1-parser)$ go test ./parser 2021/09/29 14:12:30 Prometheus metrics in tcp-info.metrics are registered. 2021/09/29 14:12:30 WARNING: failed to read toplevel.yaml 2021/09/29 14:12:30 WARNING: no file for schema field description: TCPRow.yaml 2021/09/29 14:12:30 pipe.go:715: # Script: unpacking testdata files 2021/09/29 14:12:30 pipe.go:522: tar -C testdata -xvf testdata/pt-files.tar.gz 2021/09/29 14:12:30 pipe.go:522: tar -C testdata -xvf testdata/web100-files.tar.gz 2021/09/29 14:12:30 pipe.go:522: tar -C testdata -xvf testdata/sidestream-files.tar.gz 2021/09/29 14:12:30 row.go:247: empty client annotation response 2021/09/29 14:12:30 row.go:325: annotation: Annotation error 2021/09/29 14:12:30 annotation.go:94: invalid character '/' after object key:value pair 2021/09/29 14:12:30 base.go:41: *parser_test.BadRow not Annotatable 2021/09/29 14:12:31 ndt.go:167: Warning: high row commit errors: 0 / 2 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T00:00:04Z-35.188.101.1-40784-173.205.3.38-9090.paris 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T00:00:04Z-37.220.21.130-5667-173.205.3.43-42487.paris 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T00:00:14Z-139.60.160.135-2023-173.205.3.44-1101.paris 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T00:00:14Z-76.227.226.149-37156-173.205.3.37-52156.paris 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T22:03:54Z-104.198.139.160-60574-163.22.28.37-7999.paris 2021/09/29 14:12:31 globals.go:194: Unable to extract IATA code from testdata/PT/20171208T22:03:59Z-139.60.160.135-1519-163.22.28.44-1101.paris cannot load test data --- FAIL: TestScamper1Parser_ParseAndInsert (0.00s) scamper1_test.go:48: Scamper1Parser.ParseAndInsert(): different rows - Raw.Tracelb.Nodes.slice[0].HopID: 20190824_ndt-plh7v_2001:550:1b01:1::1 != 20190825_ndt-plh7v_2001:550:1b01:1::1 Raw.Tracelb.Nodes.slice[1].HopID: 20190824_ndt-plh7v_2001:4888:3f:6092:3a2:26:0:1 != 20190825_ndt-plh7v_2001:4888:3f:6092:3a2:26:0:1 2021/09/29 14:12:31 task.go:143: WARNING empty test test:test 2021/09/29 14:12:31 task.go:179: Processed 364 files, 0 nil data, 362 rows committed, 0 failed, from gs://fake-archive/ndt/tcpinfo/2019/05/16/20190516T013026.744845Z-tcpinfo-mlab4-arn02-ndt.tgz into test_suffix 2021/09/29 14:12:31 task.go:179: Processed 364 files, 0 nil data, 362 rows committed, 0 failed, from gs://fake-archive/ndt/tcpinfo/2019/05/16/20190516T013026.744845Z-tcpinfo-mlab4-arn02-ndt.tgz into test_suffix 2021/09/29 14:12:31 task.go:179: Processed 364 files, 0 nil data, 362 rows committed, 0 failed, from gs://fake-archive/ndt/tcpinfo/2019/05/16/20190516T013026.744845Z-tcpinfo-mlab4-arn02-ndt.tgz into test_suffix FAIL FAIL github.com/m-lab/etl/parser 0.629s FAIL

SaiedKazemi commented 2 years ago

Thanks for the suggestion, but let me look into this issue today and share my thoughts with you. We need a robust solution that won't have to change going forward.