oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

REST API /search returns results with HTML tags and formatting #2612

Open wy193777 opened 5 years ago

wy193777 commented 5 years ago

The /search REST API returns results with unnecessary HTML tags and formatting. Is there a way to turn off HTML formatting of search results returned by the REST API?

vladak commented 5 years ago

Could you give an example?

wy193777 commented 5 years ago

Below is an example from an instance running locally. Many <b> tags have been added around the search term. A client using the REST API already knows the search term, so bolding it isn't necessary. You can also see that & has been replaced with &amp;.

{
  "time": 68,
  "resultCount": 4,
  "startDocument": 0,
  "endDocument": 3,
  "results": {
    "/Golden-Register-2.0-Backend/package.json": [
      {
        "line": "    \"start\": \"copyfiles sql/**/*.sql build/ &amp;&amp; tsc &amp;&amp; node --<b>max-old-space-size</b>=5120 --trace-warnings -r ts-node/register build/src/index.js --pretty\",",
        "lineNumber": "13"
      }
    ],
    "/diagram-visualization/node_modules/webpack/package.json": [
      {
        "line": "    \"appveyor:test\": \"node node_modules\\\\mocha\\\\bin\\\\mocha --<b>max-old-space-size</b>=4096 --harmony test/*.test.js\",",
        "lineNumber": "129"
      },
      {
        "line": "    \"benchmark\": \"mocha --<b>max-old-space-size</b>=4096 --harmony test/*.benchmark.js -R spec\",",
        "lineNumber": "131"
      },
      {
        "line": "    \"circleci:test\": \"node node_modules/mocha/bin/mocha --<b>max-old-space-size</b>=4096 --harmony test/*.test.js\",",
        "lineNumber": "134"
      },
      {
        "line": "    \"cover\": \"node --<b>max-old-space-size</b>=4096 --harmony ./node_modules/istanbul/lib/cli.js cover -x '**/*.runtime.js' node_modules/mocha/bin/_mocha -- test/*.test.js\",",
        "lineNumber": "135"
      },
      {
        "line": "    \"cover:min\": \"node --<b>max-old-space-size</b>=4096 --harmony ./node_modules/istanbul/lib/cli.js cover -x '**/*.runtime.js' --report lcovonly node_modules/mocha/bin/_mocha -- test/*.test.js\",",
        "lineNumber": "136"
      },
      {
        "line": "    \"test\": \"mocha test/*.test.js --<b>max-old-space-size</b>=4096 --harmony --check-leaks\",",
        "lineNumber": "143"
      }
    ],
    "/Patching-Tool-Client/node_modules/webpack/package.json": [
      {
        "line": "    \"benchmark\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.benchmark.js\\\" --runInBand\",",
        "lineNumber": "200"
      },
      {
        "line": "    \"cover:all\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --coverage\",",
        "lineNumber": "204"
      },
      {
        "line": "    \"cover:integration\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\" --coverage\",",
        "lineNumber": "206"
      },
      {
        "line": "    \"cover:unit\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\" --coverage\",",
        "lineNumber": "208"
      },
      {
        "line": "    \"schema-lint\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.lint.js\\\" --no-verbose\",",
        "lineNumber": "214"
      },
      {
        "line": "    \"test\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest\",",
        "lineNumber": "218"
      },
      {
        "line": "    \"test:basic\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js\\\"\",",
        "lineNumber": "219"
      },
      {
        "line": "    \"test:integration\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\"\",",
        "lineNumber": "220"
      },
      {
        "line": "    \"test:unit\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\"\",",
        "lineNumber": "221"
      }
    ],
    "/Golden-Register-2.0-Frontend/node_modules/webpack/package.json": [
      {
        "line": "    \"benchmark\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.benchmark.js\\\" --runInBand\",",
        "lineNumber": "244"
      },
      {
        "line": "    \"cover:all\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --coverage\",",
        "lineNumber": "248"
      },
      {
        "line": "    \"cover:integration\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\" --coverage\",",
        "lineNumber": "250"
      },
      {
        "line": "    \"cover:unit\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\" --coverage\",",
        "lineNumber": "252"
      },
      {
        "line": "    \"schema-lint\": \"node --<b>max-old-space-size</b>=4096 node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.lint.js\\\" --no-verbose\",",
        "lineNumber": "258"
      },
      {
        "line": "    \"test\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest\",",
        "lineNumber": "260"
      },
      {
        "line": "    \"test:basic\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js\\\"\",",
        "lineNumber": "261"
      },
      {
        "line": "    \"test:integration\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.test.js\\\"\",",
        "lineNumber": "262"
      },
      {
        "line": "    \"test:unit\": \"node --<b>max-old-space-size</b>=4096 --trace-deprecation node_modules/jest-cli/bin/jest --testMatch \\\"&lt;rootDir&gt;/test/*.unittest.js\\\"\",",
        "lineNumber": "263"
      }
    ]
  }
}

The OpenGrok REST API documentation shows the same behavior:

{
  "time": 13,
  "resultCount": 35,
  "startDocument": 0,
  "endDocument": 0,
  "results": {
    "/opengrok/test/org/opensolaris/opengrok/history/hg-export-renamed.txt": [
      {
        "line": "# User Vladimir <b>Kotal</b> &lt;Vladimir.<b>Kotal</b>@oracle.com&gt;",
        "lineNumber": "19"
      },
      {
        "line": "# User Vladimir <b>Kotal</b> &lt;Vladimir.<b>Kotal</b>@oracle.com&gt;",
        "lineNumber": "29"
      }
    ]
  }
}

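
Until the API offers a plain-text mode, a client can undo the formatting itself. Below is a minimal sketch of a hypothetical helper. SearchLineCleaner is not part of OpenGrok. It assumes, based only on the two responses above, that each "line" value contains nothing but <b>/</b> tags and the entities &amp;, &lt;, &gt;, &quot; and &#39;. The helper returns the plain line together with the character ranges that were highlighted, so the match positions are not lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side helper (not OpenGrok code): turns one "line" value
 * from the /search response into plain text and records where <b>...</b> was.
 * Assumes only <b>, </b> and the entities &amp; &lt; &gt; &quot; &#39; occur.
 */
public final class SearchLineCleaner {

    /** Half-open [start, end) range of a highlighted match in the plain text. */
    public record Match(int start, int end) {}

    /** Plain text plus the highlighted ranges. */
    public record CleanLine(String text, List<Match> matches) {}

    // Single pass over tags and entities, so "&amp;lt;" correctly becomes "&lt;".
    private static final Pattern TOKEN = Pattern.compile("<b>|</b>|&(amp|lt|gt|quot|#39);");

    public static CleanLine clean(String html) {
        StringBuilder out = new StringBuilder();
        List<Match> matches = new ArrayList<>();
        int openAt = -1; // start of the current <b> range in `out`, or -1
        int last = 0;    // end of the previous token in `html`
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            out.append(html, last, m.start()); // copy literal text before the token
            switch (m.group()) {
                case "<b>" -> openAt = out.length();
                case "</b>" -> {
                    if (openAt >= 0) {
                        matches.add(new Match(openAt, out.length()));
                        openAt = -1;
                    }
                }
                case "&amp;" -> out.append('&');
                case "&lt;" -> out.append('<');
                case "&gt;" -> out.append('>');
                case "&quot;" -> out.append('"');
                default -> out.append('\''); // &#39;
            }
            last = m.end();
        }
        out.append(html, last, html.length()); // trailing literal text
        return new CleanLine(out.toString(), matches);
    }

    public static void main(String[] args) {
        CleanLine c = clean("# User Vladimir <b>Kotal</b> &lt;Vladimir.<b>Kotal</b>@oracle.com&gt;");
        System.out.println(c.text());    // # User Vladimir Kotal <Vladimir.Kotal@oracle.com>
        System.out.println(c.matches()); // [Match[start=16, end=21], Match[start=32, end=37]]
    }
}
```

Judging from the examples above, the server escapes the file's own < and > as &lt; and &gt;. That means any literal <b> left in the string should be a highlight marker, which is what makes stripping tags this way workable. The approach breaks if the server ever emits other markup.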
vladak commented 5 years ago

It seems that the summarizer/highlighter kicks in.

vladak commented 5 years ago

Yes, SearchController uses SearchEngine.results(), which uses Summarizer, and the Summary used there is inherently HTML-based. The API needs a special version of SearchEngine.results().
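
One way to read that suggestion is to have the search path produce raw lines plus match offsets and leave rendering to a pluggable formatter. The web UI would use an HTML formatter and the REST API a plain one. The sketch below is hypothetical. Hit, LineFormatter, HtmlFormatter and PlainFormatter are invented names for illustration, not OpenGrok classes, and real OpenGrok highlighting goes through Summarizer and Context rather than anything this simple.

```java
import java.util.List;

/** Hypothetical design sketch (not OpenGrok code): separate matching from rendering. */
public final class FormatterSketch {

    /** A match as half-open character offsets into the raw source line. */
    public record Hit(int start, int end) {}

    /** Renders a raw line plus its hits; each implementation picks the output format. */
    public interface LineFormatter<T> {
        T format(String rawLine, List<Hit> hits);
    }

    /** What the web UI needs: HTML-escaped text with <b> around each hit. */
    public static final class HtmlFormatter implements LineFormatter<String> {
        @Override
        public String format(String rawLine, List<Hit> hits) {
            StringBuilder sb = new StringBuilder();
            int pos = 0;
            for (Hit h : hits) { // hits assumed sorted and non-overlapping
                escape(rawLine.substring(pos, h.start()), sb);
                sb.append("<b>");
                escape(rawLine.substring(h.start(), h.end()), sb);
                sb.append("</b>");
                pos = h.end();
            }
            escape(rawLine.substring(pos), sb);
            return sb.toString();
        }

        private static void escape(String s, StringBuilder sb) {
            for (char c : s.toCharArray()) {
                switch (c) {
                    case '&' -> sb.append("&amp;");
                    case '<' -> sb.append("&lt;");
                    case '>' -> sb.append("&gt;");
                    default -> sb.append(c);
                }
            }
        }
    }

    /** What an API client needs: the untouched line and the hits as offsets. */
    public record PlainLine(String line, List<Hit> hits) {}

    public static final class PlainFormatter implements LineFormatter<PlainLine> {
        @Override
        public PlainLine format(String rawLine, List<Hit> hits) {
            return new PlainLine(rawLine, List.copyOf(hits));
        }
    }

    public static void main(String[] args) {
        String raw = "a && b --max-old-space-size=4096";
        List<Hit> hits = List.of(new Hit(9, 27)); // covers "max-old-space-size"
        System.out.println(new HtmlFormatter().format(raw, hits));
        System.out.println(new PlainFormatter().format(raw, hits));
    }
}
```

The HTML formatter reproduces the kind of output shown in the issue (a &amp;&amp; b --<b>max-old-space-size</b>=4096). The plain formatter gives API clients the original line and match positions with no escaping to undo. As the later comments show, the hard part in OpenGrok is that escaping and highlighting happen together deep inside the tokenizer, so producing raw offsets would require refactoring.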

wy193777 commented 5 years ago

Thanks for your quick help!

wy193777 commented 5 years ago

Any idea when this bug will be fixed?

vladak commented 5 years ago

Not a priority, at least for me. Pull requests are welcome, of course.

On Fri, 11 Jan 2019 at 23:57, Shenghan Gao notifications@github.com wrote:

Any idea when this bug will be fixed?


wy193777 commented 5 years ago

The <b> tags don't come only from Summarizer. After digging into the source code, I found that these HTML tags may also come from Context.java, which is extremely complex.

wy193777 commented 5 years ago

OK, Context.java calls code in PlainLineTokenizer, which is generated from opengrok-indexer/src/main/resources/search/context/PlainLineTokenizer.lex. This lex file also seems to do things like HTML escaping (htmlize). I guess eliminating <b> is not a simple task.

vladak commented 5 years ago

Yes, seems like some serious refactoring is needed.

On Wed, 20 Mar 2019 at 1:20, Shenghan Gao notifications@github.com wrote:

OK, Context.java calls code in PlainLineTokenizer, which is generated from opengrok-indexer/src/main/resources/search/context/PlainLineTokenizer.lex. This lex file also seems to do things like HTML escaping (htmlize). I guess eliminating <b> is not a simple task.


idodeclare commented 5 years ago

It's fairly straightforward to extend OGKUnifiedHighlighter. I raised PR #2732.