stchang / parsack

A basic Parsec-like monadic parser combinator library implementation in Racket.
MIT License

Example JSON parser is incredibly slow #38

Closed jinwei233 closed 9 years ago

jinwei233 commented 10 years ago

Parsing a sample JSON file of about 250 KB takes only 2 ms with Node, but about 62502 ms with parsack.

stchang commented 10 years ago

Thanks for the report! Parsack should be slower than a finely tuned parser implementation but you're right that that seems excessive. Would you be willing to provide the sample so I can investigate?

jinwei233 commented 10 years ago

Sorry for replying so late.

Use parsack/examples/json-parser.rkt to parse a JSON file generated online at http://www.json-generator.com/

For Node.js:

var jsonString  = require("fs").readFileSync("your_test_json.txt").toString();
console.time("bench");
JSON.parse(jsonString);
console.timeEnd("bench");
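
A single `console.time` run can be dominated by warm-up and timer noise at the millisecond scale. A minimal sketch of a repeated measurement follows; `medianParseMs` and the generated input are hypothetical stand-ins, not part of the original report, and a real test file would be read with `fs.readFileSync` as above.

```javascript
// Hypothetical helper: time JSON.parse over several runs and report the median.
function medianParseMs(jsonString, runs = 11) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();   // nanosecond-resolution clock
    JSON.parse(jsonString);
    const end = process.hrtime.bigint();
    samples.push(Number(end - start) / 1e6); // convert ns to ms
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Stand-in input (assumption): an array of small objects, not the reporter's file.
const sampleJson = JSON.stringify(
  Array.from({ length: 1000 }, (_, i) => ({ index: i, name: "user" + i }))
);
console.log("median ms:", medianParseMs(sampleJson).toFixed(3));
```

The median is used here rather than the mean so that one slow first run does not skew the reported figure.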

greghendershott commented 10 years ago

Given this:

#lang racket

(require parsack
         parsack/examples/json-parser)
(time (void (parse $p_text (file->string "json.json"))))

(require json)
(time (void (with-input-from-file "json.json" read-json)))

I get:

cpu time: 118 real time: 122 gc time: 48
cpu time: 5 real time: 5 gc time: 0

Where the json.json file is below.

That is basically the difference I'd expect, as @stchang mentioned above.

Maybe before, when you got the extremely slow time of 62502 ms, it was on different JSON that hits some weak point in the parsack/examples/json-parser grammar?


[
  {
    "_id": "5400abaafd4d1699c159fe17",
    "index": 0,
    "guid": "48cec13f-1f73-4c7a-bad0-5d9094f8c438",
    "isActive": true,
    "balance": "$3,262.41",
    "picture": "http://placehold.it/32x32",
    "age": 34,
    "eyeColor": "blue",
    "name": "Tammi Henry",
    "gender": "female",
    "company": "UTARIAN",
    "email": "tammihenry@utarian.com",
    "phone": "+1 (909) 506-2308",
    "address": "643 Beach Place, Dubois, American Samoa, 2382",
    "about": "Duis reprehenderit magna ullamco ullamco proident culpa sit nulla labore et. Lorem amet id nulla nulla elit ullamco eiusmod quis eiusmod est reprehenderit dolor fugiat dolore. Culpa cillum ut consequat esse amet non mollit enim do non nostrud quis. Cillum eu culpa nulla aliqua non minim in.\r\n",
    "registered": "2014-03-06T23:30:02 +05:00",
    "latitude": 88.002578,
    "longitude": 151.119194,
    "tags": [
      "sunt",
      "nulla",
      "enim",
      "aute",
      "tempor",
      "occaecat",
      "ullamco"
    ],
    "friends": [
      {
        "id": 0,
        "name": "Erna Alston"
      },
      {
        "id": 1,
        "name": "Karla Burks"
      },
      {
        "id": 2,
        "name": "Lily Henson"
      }
    ],
    "greeting": "Hello, Tammi Henry! You have 8 unread messages.",
    "favoriteFruit": "banana"
  },
  {
    "_id": "5400abaaa134884ce3ae61bf",
    "index": 1,
    "guid": "9887a4e5-ae3b-4846-b34e-cfb70f3649fc",
    "isActive": true,
    "balance": "$3,078.49",
    "picture": "http://placehold.it/32x32",
    "age": 29,
    "eyeColor": "blue",
    "name": "Janelle Santiago",
    "gender": "female",
    "company": "MUSIX",
    "email": "janellesantiago@musix.com",
    "phone": "+1 (952) 503-3121",
    "address": "370 Harrison Place, Sardis, Illinois, 4989",
    "about": "Dolore consectetur sit officia ad dolore qui reprehenderit pariatur labore id sunt. Aliqua veniam ea velit est excepteur non in aliqua laborum consectetur dolor sunt dolor minim. Nulla ullamco aliquip deserunt quis consequat Lorem deserunt minim deserunt. Magna sit cillum laboris exercitation reprehenderit sint id sint ex cillum veniam. Dolore adipisicing ipsum anim nisi mollit aliquip labore fugiat voluptate tempor eiusmod sunt quis incididunt. Est est aliquip adipisicing eu minim cupidatat nostrud amet irure tempor.\r\n",
    "registered": "2014-08-02T12:56:46 +04:00",
    "latitude": 42.523601,
    "longitude": -116.932987,
    "tags": [
      "incididunt",
      "laboris",
      "quis",
      "ea",
      "quis",
      "aute",
      "sint"
    ],
    "friends": [
      {
        "id": 0,
        "name": "Ashley Mccray"
      },
      {
        "id": 1,
        "name": "Murray Shields"
      },
      {
        "id": 2,
        "name": "Florine Mullins"
      }
    ],
    "greeting": "Hello, Janelle Santiago! You have 8 unread messages.",
    "favoriteFruit": "banana"
  },
  {
    "_id": "5400abaa71791fb17ac94b3d",
    "index": 2,
    "guid": "fc68dfab-3efb-4dce-933c-eb48f8edca68",
    "isActive": true,
    "balance": "$2,990.41",
    "picture": "http://placehold.it/32x32",
    "age": 22,
    "eyeColor": "brown",
    "name": "Kelley Carey",
    "gender": "female",
    "company": "GRUPOLI",
    "email": "kelleycarey@grupoli.com",
    "phone": "+1 (854) 496-2290",
    "address": "680 Schroeders Avenue, Ernstville, Guam, 3891",
    "about": "Et in dolor magna Lorem. Est velit occaecat nulla est. Est et aute dolore occaecat exercitation irure proident ex laboris quis magna deserunt reprehenderit. Enim enim cupidatat eiusmod aliqua ex laboris quis pariatur et. Ad reprehenderit sit fugiat eu laboris sunt consequat excepteur commodo do aute. Incididunt ipsum incididunt incididunt culpa non sunt et est quis elit. Nulla anim duis dolor Lorem.\r\n",
    "registered": "2014-03-14T16:18:45 +04:00",
    "latitude": -58.483448,
    "longitude": 61.780195,
    "tags": [
      "consectetur",
      "dolor",
      "quis",
      "nisi",
      "nisi",
      "proident",
      "aliqua"
    ],
    "friends": [
      {
        "id": 0,
        "name": "Peck Beasley"
      },
      {
        "id": 1,
        "name": "Lee Soto"
      },
      {
        "id": 2,
        "name": "Rocha Sampson"
      }
    ],
    "greeting": "Hello, Kelley Carey! You have 6 unread messages.",
    "favoriteFruit": "strawberry"
  },
  {
    "_id": "5400abaa172959432281a62b",
    "index": 3,
    "guid": "5cb8eb0e-7f68-418b-9cde-1ebc54a92e97",
    "isActive": true,
    "balance": "$3,988.46",
    "picture": "http://placehold.it/32x32",
    "age": 23,
    "eyeColor": "green",
    "name": "Mckay Paul",
    "gender": "male",
    "company": "APPLICA",
    "email": "mckaypaul@applica.com",
    "phone": "+1 (839) 556-2991",
    "address": "392 Hopkins Street, Chesapeake, Missouri, 2375",
    "about": "Consequat occaecat deserunt nisi reprehenderit. Aliqua dolor do qui tempor nisi fugiat. Amet consequat eiusmod cillum culpa et consectetur. Nostrud magna nostrud sint aliqua duis. Quis quis anim aliquip dolore nisi esse ullamco sunt mollit culpa velit. Deserunt laboris adipisicing ad voluptate consequat adipisicing qui sit ullamco. In officia eu pariatur sit enim eu ut aliqua voluptate.\r\n",
    "registered": "2014-06-13T04:10:43 +04:00",
    "latitude": 38.538085,
    "longitude": 72.860373,
    "tags": [
      "do",
      "ut",
      "excepteur",
      "dolor",
      "non",
      "sunt",
      "minim"
    ],
    "friends": [
      {
        "id": 0,
        "name": "Velasquez Bryan"
      },
      {
        "id": 1,
        "name": "Perry Robinson"
      },
      {
        "id": 2,
        "name": "Hunter Blackwell"
      }
    ],
    "greeting": "Hello, Mckay Paul! You have 4 unread messages.",
    "favoriteFruit": "banana"
  },
  {
    "_id": "5400abaa64120ac2553aaa2b",
    "index": 4,
    "guid": "93a36788-7a6d-46ad-80a3-b2f56f0bac84",
    "isActive": false,
    "balance": "$2,286.43",
    "picture": "http://placehold.it/32x32",
    "age": 34,
    "eyeColor": "brown",
    "name": "Fran Owen",
    "gender": "female",
    "company": "CIRCUM",
    "email": "franowen@circum.com",
    "phone": "+1 (864) 512-3326",
    "address": "515 Colin Place, Sanders, Puerto Rico, 160",
    "about": "Fugiat pariatur deserunt ipsum dolor velit. Labore id ut id tempor ut. Adipisicing exercitation fugiat nisi incididunt id anim voluptate esse.\r\n",
    "registered": "2014-06-23T01:56:16 +04:00",
    "latitude": -13.762686,
    "longitude": 152.310941,
    "tags": [
      "irure",
      "ea",
      "irure",
      "sint",
      "tempor",
      "irure",
      "officia"
    ],
    "friends": [
      {
        "id": 0,
        "name": "Hull Pratt"
      },
      {
        "id": 1,
        "name": "Dominguez Middleton"
      },
      {
        "id": 2,
        "name": "Johnson Wilder"
      }
    ],
    "greeting": "Hello, Fran Owen! You have 5 unread messages.",
    "favoriteFruit": "strawberry"
  }
]
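
The originally reported file was about 250 KB, while the five-record sample above is far smaller, so a closer reproduction would need a larger input of the same general shape. A minimal sketch that builds one is shown here; `makeRecord`, `makeJsonOfSize`, and all field values are made up for illustration, and only a subset of the sample's field names is reproduced.

```javascript
// Hypothetical record builder mirroring part of the sample's shape; values are synthetic.
function makeRecord(i) {
  return {
    index: i,
    isActive: i % 2 === 0,
    name: "Person " + i,
    latitude: (i % 180) - 90 + 0.5,
    tags: ["sunt", "nulla", "enim"],
    friends: [{ id: 0, name: "Friend A" }, { id: 1, name: "Friend B" }],
    about: "Lorem ipsum dolor sit amet. ".repeat(10),
  };
}

// Append records until the pretty-printed JSON reaches at least targetBytes.
function makeJsonOfSize(targetBytes) {
  const records = [];
  let text = "[]";
  while (Buffer.byteLength(text, "utf8") < targetBytes) {
    // Add in batches so the array is not re-serialized for every record.
    for (let k = 0; k < 50; k++) records.push(makeRecord(records.length));
    text = JSON.stringify(records, null, 2);
  }
  return text;
}

const bigJson = makeJsonOfSize(250 * 1024);
console.log("bytes:", Buffer.byteLength(bigJson, "utf8"));
// To feed it to either parser: require("fs").writeFileSync("json.json", bigJson);
```

Timing both parsers at a few target sizes would show whether parse time grows roughly in proportion to the input or much faster, which is how a weak point in the grammar would likely show up.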

greghendershott commented 10 years ago

For measuring timing, it would have been better to use racket/base and collect-garbage, for example:

#lang racket/base

(require parsack
         parsack/examples/json-parser
         racket/file)
(for ([i 3]) (collect-garbage))
(time (void (parse $p_text (file->string "json.json"))))

(require json)
(for ([i 3]) (collect-garbage))
(time (void (with-input-from-file "json.json" read-json)))

However, in this case I get roughly the same result.
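
On the Node side of the comparison, a rough analogue of calling `collect-garbage` before timing is to request a collection through `global.gc`, which is only defined when Node is started with `--expose-gc`. A minimal sketch, where the helper name `timeAfterGc` is hypothetical:

```javascript
// Hypothetical helper: request a GC (when available) before timing fn once.
function timeAfterGc(fn) {
  const gcAvailable = typeof global.gc === "function"; // true only under --expose-gc
  if (gcAvailable) {
    for (let i = 0; i < 3; i++) global.gc(); // mirrors (for ([i 3]) (collect-garbage))
  }
  const start = process.hrtime.bigint();
  fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { elapsedMs, gcAvailable };
}

const result = timeAfterGc(() => JSON.parse('{"a": [1, 2, 3]}'));
console.log(result);
```

Without the flag the helper still runs and simply reports `gcAvailable: false`, so the measurement is then taken without a preceding collection.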

stchang commented 9 years ago

Parsack's JSON parsing is now comparable to Racket's read-json. Running Greg's test:

$ racket json-perf-test.rkt 
parsack json parser:
cpu time: 4 real time: 8 gc time: 0
Racket read-json:
cpu time: 4 real time: 3 gc time: 0

jinwei233 commented 9 years ago

:+1: