Closed garrettheel closed 3 years ago
LGTM. @johnthagen please take a look as well.
@johnthagen let me know if you have any questions/concerns! I was looking to submit a few additional config options after this lands too
@garrettheel Sorry for the delay. I haven't had time to look at this yet. Hopefully can this week, otherwise if @manuzhang thinks it's good please don't feel the need to have to wait for me.
@johnthagen this should be ready to go from my end
one thing I noticed was that docs/index.md
duplicates most of README.md
and will become out of date. would it make sense put something more generic in there as a test, rather than the actual docs? or move it to a directory name that indicates that it's used only for testing
@manuzhang set up that originally, but I do agree that it would make sense to have the test's be isolated from the real docs. That would allow the test project to focus specifically on the items we wish to test.
@manuzhang set up that originally, but I do agree that it would make sense to have the test's be isolated from the real docs. That would allow the test project to focus specifically on the items we wish to test.
Cool, I'm happy to change that here if you like. Otherwise we can handle that in a follow-up
@garrettheel Thank you for being willing to help out. Let's make a follow on issue for that so we can review it in pieces.
@johnthagen no worries! I don't have write access so you'll have to merge for me when you're ready :)
@garrettheel I don't have write access either, I just contributed a few PRs. @manuzhang Will need to do the final review/merge.
Whoops, my bad. @manuzhang when you get a chance please take another look :)
@garrettheel @johnthagen thanks guys. Merged.
This introduces a number of performance improvements to help make this plugin feasible for larger documentation sets.
Performance:
lxml
parser for BeautifulSoupSoupStrainer
to only consider elements that should represent linkslru_cache
to a dedicated staticmethod to fetch external URLs. This allows external URLs to be cached between pages. Note that serializing theBeautifulSoup
arg for use with the cache was very expensive before, so this also avoids that.requests.Session
for HTTP requestsHEAD
requests instead ofGET
. Note that some sites may not handle this correctly, but generally this seemed to work fineFeatures:
<a id="blah"></a>
links in markdown pages, where people may want to support custom links in addition to the slugified headings.Other:
raise_error_excludes
does not match. This reduces unnecessary noiseinvalid url - {clean_url} [{url_status}] [{page.file.src_path}]
to provide more information