readthedocs.org "Search all docs" endpoint forwards to incorrect link (.html missing)

tweep commented 5 years ago

Details

Hi there, I got some trouble regarding the Readthedocs search on your website.

Read the Docs project URL: N/A
Build URL (if applicable): 3.13.3 ( web version )
Read the Docs username (if applicable): N/A

How to reproduce

visit https://readthedocs.org/
scroll down to "Search all docs"
enter a search phrase in the box. I use : "RNA-Seq homepage displays"
click on "let's do this"
on result page, click on "Files"
click on a result.

Expected Result

I expect that I get forward to a URL which shows the result and has the suffix ".html" :

https://microscope.readthedocs.io/en/latest/content/transcriptomic/NGSProjectRNAseq.html?highlight=getting%20started

Actual Result

I get forwarded to an Error page saying "SORRY the page does not exist yet" , because the result url is missing the ".html" suffix

Incorrect: https://microscope.readthedocs.io/en/latest/content/transcriptomic/NGSProjectRNAseq?highlight=getting%20started

Correct: NGSProjectRNAseq.html?highlight=... Incorrect forward: NGSProjectRNAseq?highlight=...

stsewd commented 5 years ago

This also happens in the internal search https://readthedocs.org/projects/docs/search/?q=docs. I'm investigating. Probably related to some new changes in #5230

The search in the project itself isn't affected.

stsewd commented 5 years ago

The problem is here

https://github.com/rtfd/readthedocs.org/blob/0917fbd925998086c958f642d6564de88b142d1b/readthedocs/core/templatetags/core_tags.py#L39-L43

So, related to the resolver, but I think that was added before :/

stsewd commented 5 years ago

Ok, so I thought that we were using the path from imported file for search, but happens that we don't do that.

There we have the real path with the html extension. https://github.com/rtfd/readthedocs.org/blob/0917fbd925998086c958f642d6564de88b142d1b/readthedocs/projects/models.py#L1159-L1159

But int the further function, we pass the .fson path only.

https://github.com/rtfd/readthedocs.org/blob/0917fbd925998086c958f642d6564de88b142d1b/readthedocs/search/parse_json.py#L62-L62

And we use the sphinx data to provide the path (without extension)

https://github.com/rtfd/readthedocs.org/blob/0917fbd925998086c958f642d6564de88b142d1b/readthedocs/search/parse_json.py#L75

@ericholscher any reason why we can't use the original path and rely on the path from sphinx?

stsewd commented 5 years ago

Also, saving the original path, would help us to reduce the magic that depends on the doctype.

ericholscher commented 5 years ago

I don't fully understand what's going on here, but capturing the real file path is the goal of https://github.com/rtfd/readthedocs-sphinx-ext/pull/62/files#diff-1b00543baae13c24c6592e94f0ea6c1dR210

stsewd commented 5 years ago

I was going to take a look at the extension too, that should fix this problem :)

stsewd commented 5 years ago

Actually if we take the solution from the extension, old projects still will point to the buggy path (we'll need to rebuild all projects to make that work, which is impossible). If we use the path from the process_json function, we'll only need to re-index all projects (thing that we already have done before).

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stsewd commented 5 years ago

hi @tweep, a release was made today, it fixes this issue. We are going to re-index the search index, so it should take time in see the correct results in all projects, but you can re-trigger a reindex for your projects just now triggering a new build for each version.

readthedocs / readthedocs.org