lhsazevedo opened this issue 1 day ago
Stream wrapper pages now (should) have the role="stream_wrapper" attribute on the <refentry> tag, so those should be easy to filter out.
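A minimal sketch of such a filter, assuming each search index entry exposes its element name and its `role` attribute as `tag` and `role` fields. Both field names and the sample entries are hypothetical; the actual PhD index schema may differ.

```javascript
// Hypothetical index entries; `tag` and `role` are assumed field names.
const entries = [
  { name: "file://", tag: "refentry", role: "stream_wrapper" },
  { name: "strlen", tag: "refentry", role: null },
];

// True only for <refentry role="stream_wrapper"> pages.
function isStreamWrapper(entry) {
  return entry.tag === "refentry" && entry.role === "stream_wrapper";
}

// Drop stream wrapper pages, keeping every other entry in its original order.
const remaining = entries.filter((entry) => !isStreamWrapper(entry));
console.log(remaining.map((entry) => entry.name)); // [ 'strlen' ]
```

Checking the tag as well as the role keeps the filter from matching some other element that happens to reuse the same role value.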
I don't know why chapter/section are not considered part of an extension, as this markup has existed for decades.
Same for the
Background
In https://github.com/php/phd/pull/154, we resolved the issue of missing pages in the search index. However, now that these pages are visible in search results, a long-standing bug in result grouping has become apparent.
Issue
Some search results are incorrectly categorized between the "Extensions" and "Other Matches" groups.
Example:
Query: security
In the results, an entry belonging to the win32service extension is incorrectly placed in the "Other Matches" group.
Cause
The client-side search code groups results based on types, including Function, Variable, Class, Exception, Extension, and Other Matches (general). These types are assigned according to the XML element tags in the manual's source.
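The grouping step itself can be pictured as bucketing results by an already-assigned type. This is a minimal sketch, not the web-php implementation; the `type` field, the sample results, and the fallback of unrecognized types to the general group are assumptions.

```javascript
// Group names, in the order listed above.
const GROUP_ORDER = ["Function", "Variable", "Class", "Exception", "Extension", "Other Matches"];

// Bucket result names by their `type`; unrecognized types go to "Other Matches".
function groupResults(results) {
  const groups = new Map(GROUP_ORDER.map((group) => [group, []]));
  for (const result of results) {
    const group = groups.has(result.type) ? result.type : "Other Matches";
    groups.get(group).push(result.name);
  }
  return groups;
}

// Hypothetical results, for illustration only.
const grouped = groupResults([
  { name: "strlen", type: "Function" },
  { name: "ArrayObject", type: "Class" },
  { name: "Security", type: "Unknown" },
]);
console.log(grouped.get("Other Matches")); // [ 'Security' ]
```

Because the bucket is decided entirely by `type`, any error in how the type is derived from the XML tag shows up directly as a result in the wrong group.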
Issue 1: Incorrect grouping in "Extensions"
The first issue occurs in this section of the code:
https://github.com/php/web-php/blob/27fbef13e912547b4086793a5dd2e04fc0fcf684/js/search.js#L130-L134
The code assumes that any entry with the element tag <book>, <set>, or <reference> is related to extensions, which is inaccurate. Many entries that use these elements do not belong to extensions.
Example data:
Issue 2: Incorrect grouping in "Other Matches"
The second issue is due to an assumption in the following code:
https://github.com/php/web-php/blob/27fbef13e912547b4086793a5dd2e04fc0fcf684/js/search.js#L136-L141
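The tag-only rule in the linked lines can be contrasted with a parent-aware check. The sketch below assumes each index entry records the id of its enclosing entry (`parent`) and that extension books carry an `extension` flag; the ids, both fields, and `belongsToExtension` are hypothetical, and `GENERAL_TAGS` is a simplified restatement of the rule rather than the actual search.js code.

```javascript
// Tags that the tag-only rule always places in "Other Matches".
const GENERAL_TAGS = new Set(["section", "chapter", "appendix", "article"]);

// Hypothetical index entries keyed by id.
const index = new Map([
  ["book.win32service", { tag: "book", parent: null, extension: true }],
  ["win32service.security", { tag: "section", parent: "book.win32service" }],
  ["langref", { tag: "book", parent: null, extension: false }],
  ["language.types", { tag: "chapter", parent: "langref" }],
]);

// Walk up the parent chain: an entry belongs to an extension if it or any
// ancestor is flagged as one. `seen` guards against cycles in malformed data.
function belongsToExtension(id) {
  const seen = new Set();
  while (id != null && index.has(id) && !seen.has(id)) {
    seen.add(id);
    const entry = index.get(id);
    if (entry.extension) return true;
    id = entry.parent;
  }
  return false;
}

for (const id of ["win32service.security", "language.types"]) {
  const general = GENERAL_TAGS.has(index.get(id).tag);
  console.log(`${id}: tag rule says general = ${general}, in extension = ${belongsToExtension(id)}`);
}
```

Both entries are "general" under the tag-only rule, while the ancestor walk separates the section inside an extension book from the chapter that is not.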
The code assumes that entries with the tags <section>, <chapter>, <appendix>, or <article> do not belong to an extension. While this is not as bad, there are many pages that are part of an extension but are currently placed in the "Other Matches" group:
PHP Manual index dump
For convenience, here is the dump of the PhD SQLite index for the PHP Manual: php-manual-index_2024-10-08.sql.gz
Notes