stoplightio / prism

Turn any OpenAPI2/3 and Postman Collection file into an API server with mocking, transformations and validations.
https://stoplight.io/open-source/prism
Apache License 2.0
4.28k stars 343 forks source link

Loading time can take up to one hour #1479

Closed thinkingserious closed 3 years ago

thinkingserious commented 4 years ago

Describe the bug

Running this file like so: prism mock -d -p 4099 twilio_api.json takes about an hour to load.

To Reproduce

I have truncated the file to reproduce the issue quickly; however, the same behavior holds true with the full file.

  1. Given this OpenAPI document
{
  "info": {
    "title": "Twilio - Api",
    "version": "1.0.1"
  },
  "openapi": "3.0.1",
  "paths": {
    "/2010-04-01/Accounts/XXXXXX/Calls.json": {
      "get": {
        "parameters": [
          {
            "in": "query",
            "name": "ParentCallSid",
            "schema": {
              "maxLength": 34,
              "minLength": 34,
              "pattern": "^CA[0-9a-fA-F]{32}$",
              "type": "string"
            }
          },
          {
            "in": "query",
            "name": "StartTime",
            "schema": {
              "format": "date-time",
              "type": "string"
            }
          },
          {
            "in": "query",
            "name": "StartTime<",
            "schema": {
              "format": "date-time",
              "type": "string"
            }
          },
          {
            "in": "query",
            "name": "StartTime>",
            "schema": {
              "format": "date-time",
              "type": "string"
            }
          }
        ]
      }
    }
  }
}
  1. Run this CLI command prism mock -d -p 4099 twilio_api.json
  2. It will take several minutes to run (it appears to be frozen, but will eventually load -- it takes about an hour to load using the full file).

Expected behavior

The sample truncated spec above loads almost immediately if you do any of the following:

  1. Remove the following property:
{
  "in": "query",
  "name": "ParentCallSid",
  "schema": {
    "maxLength": 34,
    "minLength": 34,
    "pattern": "^CA[0-9a-fA-F]{32}$",
    "type": "string"
  }
}
  1. Change the following properties from:
{
  "in": "query",
  "name": "StartTime<",
  "schema": {
    "format": "date-time",
    "type": "string"
  }
},
{
  "in": "query",
  "name": "StartTime>",
  "schema": {
    "format": "date-time",
    "type": "string"
  }
}

to

{
  "in": "query",
  "name": "StartTimeBefore",
  "schema": {
    "format": "date-time",
    "type": "string"
  }
},
{
  "in": "query",
  "name": "StartTimeAfter",
  "schema": {
    "format": "date-time",
    "type": "string"
  }
}

Additional context

Note that the above behavior is also present when you load the entire file; however, I've removed as much from the spec as possible as to reproduce the issue without the need to deal with possible loading issues with the very long twilio_api.json file.

There is another path in the twilio_api.json file that is very similar and has no problem loading: https://github.com/twilio/twilio-oai/blob/main/spec/json/twilio_api.json#L12628-L12645 https://github.com/twilio/twilio-oai/blob/main/spec/json/twilio_api.json#L12655-L12672

Environment:

XVincentX commented 3 years ago

@thinkingserious Thanks for the message.

I've run both the reduced example AND the full file you've provided and then analysing the flame graph and interestingly speaking, the problem is in the example generation for the CLI path printing:

image

In particular, the regex matching that our dependency is using seems to be taking a significant amount of time when trying to match on your paths.

image

We'll take a further look into, although I am concerned that there's not going to be that much we can do unless the regex can be optimised.

Meanwhile, although it's not optimal, you could fork the repository and change this line to return an empty string. It will speed up the process dramatically.

JackPott commented 3 years ago

I've just encountered the same issues on the same file, congratulations @thinkingserious on narrowing it down to a single statement. I can't add much to the original statement, but I do note that removing just the schema declaration at line 10643 (for ParentCallSid query param) lets prism start up quickly. Removing min or max length, or the regex from within it makes no difference, you have to remove the whole schema tag.

This exact schema is used 17 times in the file, but doesn't pose a problem anywhere else. It is only one of two places that its used in a query param rather than a path spec. The other place is right at the bottom on line 19138, again this doesn't seem to cause an issue.

Turning it from a query param to a path param lets prism start up quickly, so it seems to be localised to it being a query param.

# Modified, starts up quickly 
GET        http://0.0.0.0:4010/2010-04-01/Accounts/ACBDDC9A5BE72868529ead38b6316Cc89c/CA767df8BDa53048186Ab9a3ebF035791f/Calls.json{?To,From,Status,StartTime,StartTime<,StartTime>,EndTime,EndTime<,EndTime>,PageSize}` 

Further experiments show that renaming the ParentCallSid field to a causes a brief stutter but allows a normal start up:

GET        http://0.0.0.0:4010/2010-04-01/Accounts/ACa4fFc039AC5Bd9C5Ca62FC1CDFa2acC8/Calls.json{?To,From,a,Status,StartTime,StartTime<,StartTime>,EndTime,EndTime<,EndTime>,PageSize}

I pursued this further and found that each character added to that name of the query string parameter formerly known as ParentCallSid doubled the start up time. These numbers weren't measured scientifically but on my setup:

Parameter name Appx pause in start up
"name": "abcdefg" 56 seconds
"name": "abcdef" 28 seconds
"name": "abcde" 14 seconds
"name": "abcd" 7 seconds
"name": "abc" ~4 seconds

Thats...weird but quite reproducible. I can't think why anything would be looping over the characters of the name field?

Can also confirm removing the < and > from L10681, 10690, 10708 & 10719 does fix the issue, but again- those characters are used in the L19138 "Recordings" section without trouble.

Hope this helps pin down the issue!

XVincentX commented 3 years ago

Hey @JackPott,

great analysis here! I'm confident this is going to be helpful.

While I haven't really been able to look into this again, I do remember the "problem" is that the regex computation is slow, and that is something I really have no control over. (Again, I am assuming my recollections here are accurate)

Currently the plan I have in mind is to try to reproduce the issue with the specified regex and see if the RegEx NodeJs object is "suffering" from this.

According to the result, I am curious to see if Node 14 or 15 suffer from the same issue (Prism officially runs on Node 12) and if so — open an issue in the NodeJS Repository and go from there.

XVincentX commented 3 years ago

Consider this reproduction code:

const template = '/2010-04-01/Accounts/XXXXXX/Calls.json{?ParentCallSid,StartTime,StartTime<,StartTime>}';
const expandRe = /\{([#&+.\/;?]?)((?:[-\w%.]+(\*|:\d+)?,?)+)\}/g;

template.replace(expandRe, console.log);

Some interesting findings:

  1. Both Node 12 and Node 14 suffer of the same bug
  2. Running it from Safari is almost instantaneous
  3. Chromium is suffering from the same bug

If you try the same regex on the same string on regex101 on various engines:

  1. PHP: returns catastrophic backtracking
  2. JavaScript: says it runs in 18ms in Safari, timeout in Chrome
  3. Python: returns catastrophic backtracking
  4. GoLang: says it runs in 25ms

This is a bug in V8: https://bugs.chromium.org/p/v8/issues/detail?id=10765

XVincentX commented 3 years ago

I am closing this. Upgrading to a newer runtime is the only way to fix the issue (node 15)

thinkingserious commented 3 years ago

@XVincentX, I can confirm that upgrading to a newer runtime (v15.8.0) does not solve the issue. Is it possible to use another engine?

XVincentX commented 3 years ago

@thinkingserious I do not work for Stoplight anymore — I am not the best person to ask the question at this point.

However, I thought that Node 15 would have the updated engine. Maybe the maintainers might have changed idea for some reason; I guess you're gonna have to wait Node 16 and try it again.

thinkingserious commented 3 years ago

Thanks for the quick response @XVincentX! I'll create a new issue here for further investigation.