Closed ThomasWaldmann closed 1 year ago
I've added a spider that checks the src, href, data-href, and data attributes on the site, and pointed it at my server, which runs moin behind Apache with wiki_root = devwiki.
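For readers without the spider at hand, the attribute check described above can be approximated with the standard library alone. This is a minimal sketch, not the Scrapy spider from the pull request: the names `LinkAttrCollector`, `collect_links`, and `outside_wiki_root` are hypothetical, and the `/devwiki` default simply mirrors the wiki_root mentioned above.

```python
from html.parser import HTMLParser

# Attributes the spider is described as checking.
LINK_ATTRS = ("src", "href", "data-href", "data")


class LinkAttrCollector(HTMLParser):
    """Collect (tag, attribute, value) triples for link-like attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.found.append((tag, name, value))


def collect_links(html):
    """Return every src/href/data-href/data value found in an HTML string."""
    collector = LinkAttrCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def outside_wiki_root(links, wiki_root="/devwiki"):
    """Return site-absolute URLs that do not start with the wiki root."""
    return [
        value
        for _tag, _name, value in links
        if value.startswith("/")
        and not value.startswith("//")  # skip protocol-relative URLs
        and value != wiki_root
        and not value.startswith(wiki_root + "/")
    ]
```

Feeding a rendered page to `collect_links` and passing the result to `outside_wiki_root` lists the candidates for "wiki root missing from the URL"; whether each one is really broken still needs an HTTP request against the server.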
three issues from the spider:
one simplification
samples:
pull request welcome.
is issue #1259, fixed markdown, currently working on rest. I thought docbook was OK, will check again.
was easiest way to get html-dump working. Better ideas welcome.
pull request welcome.
Don't be shy, new issues and pull requests are welcome. Thanks for your work to date.
Reopening because scrapy issue is still active.
The work to date using Scrapy is difficult to review because of the number of commits and because it is not related to the original issue, "167 Should wiki-root be included in href". It would have been better to open a new issue for the addition of scrapy.
It is clever to run scrapy as a part of the pytest procedure, but I wonder if that is the best place. Having to start a server running at 9080 before running tests seems an unusual requirement. Wiki Admins that install a future version of moin from pypi are unlikely to run tests frequently/ever but may want a means of checking for broken links.
It could be added as a sibling to /moin/contrib/loadtesting, but then it would not be available to future wiki admins that install moin from pypi.
Another alternative would be to add it under src/moin/scripts/sitetesting and add a new command, moin find-broken-links. (After cloning your repo, the entire src/moin/scripts directory is missing?)
The crawl.csv and crawl.log output files seem hidden under /src/moin/_tests/sitetesting/scrapy among source code. The server.log is created in /src/moin/_tests/sitetesting/server.log, also among source code. All of these should go in the instance root as siblings to wikiconfig.py.
My server.log file has several SyntaxErrors:
--snip--
File "C:\git-bylsmad\moin\.tox\py310\lib\site-packages\flask\cli.py", line 123, in call_factory
return app_factory(*args, **kwargs)
File "C:\git-bylsmad\moin\.tox\py310\lib\site-packages\moin\app.py", line 50, in create_app
return create_app_ext(flask_config_file=config,
File "C:\git-bylsmad\moin\.tox\py310\lib\site-packages\moin\app.py", line 99, in create_app_ext
app.config.from_pyfile(path.abspath(flask_config_file))
File "C:\git-bylsmad\moin\.tox\py310\lib\site-packages\flask\config.py", line 120, in from_pyfile
exec(compile(config_file.read(), filename, "exec"), d.__dict__)
File "C:\git-bylsmad\moin\src\moin\_tests\sitetesting\wikiconfig.py", line 1
../../config/wikiconfig.py
^
SyntaxError: invalid syntax
The missing links ending in Discussion should be ignored/accepted, as these are links to create a Discussion subpage should none exist. The 'Discussion' variable needs to be retrieved from wikiconfig (see supplementation_item_names in /src/moin/config/default.py) in case some wiki admin changes it to something other than English.
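The filtering suggested here could look roughly like the following. This is a sketch under stated assumptions: the function names are hypothetical, and the list of names is passed in as a parameter because how the spider would load supplementation_item_names from the wiki configuration is left open.

```python
from urllib.parse import unquote, urlsplit


def is_supplementation_link(url, supplementation_item_names):
    """Return True if the last path segment of url is a supplementation item name."""
    # Drop query string and fragment, decode %-escapes, ignore a trailing slash.
    path = unquote(urlsplit(url).path).rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment in supplementation_item_names


def drop_supplementation_links(missing_urls, supplementation_item_names):
    """Filter out 'missing' URLs that merely offer to create a supplementation subpage."""
    return [
        url
        for url in missing_urls
        if not is_supplementation_link(url, supplementation_item_names)
    ]
```

Matching on the whole last path segment, rather than using `str.endswith`, keeps an item such as `/MyDiscussion` from being ignored by accident, and passing the names in means a wiki configured with a non-English name is handled the same way.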
If you follow the practice that Git commits are cheap and you should commit whenever you have a bit of code you like, then you should use git rebase to squash the commits and clean up the commit message. This would make the work easier to review and easier to maintain. It would be nice to have one commit resolve one issue, but this is frequently not achievable.
Creating a new issue #1375, add future scrapy activity there. Closing this issue as complete.
Original report by RogerHaase (Bitbucket: RogerHaase, GitHub: RogerHaase).
See the mark_item_as_transclusion method within converter/html_out.py for an example of the problem and a workaround.
This problem arises only when the wiki is not run at server root.
Should hrefs within the emeraldtree DOM be of the form /mywiki/myitem or just /myitem? Currently, hrefs for pages take the latter form and hrefs for objects take the former.
1a. If the preferred form does not include the wiki-root, define a method for obtaining the wiki-root.
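To make the two forms concrete, here is a minimal sketch of a helper that turns the /myitem form into the /mywiki/myitem form. The name `with_wiki_root` is hypothetical and is not part of moin. In a Flask application the mount point is normally available as `request.script_root`, but whether moin should obtain the wiki root that way is an assumption, not something this report settles.

```python
def with_wiki_root(href, wiki_root):
    """Prefix a site-absolute href with wiki_root unless it is already present."""
    stripped = wiki_root.strip("/")
    root = "/" + stripped if stripped else ""
    if not href.startswith("/") or href.startswith("//"):
        # Relative, external, or protocol-relative URL: leave untouched.
        return href
    if not root or href == root or href.startswith(root + "/"):
        # Wiki served at server root, or href already carries the prefix.
        return href
    return root + href
```

For example, `with_wiki_root("/myitem", "mywiki")` yields `/mywiki/myitem`, while an empty `wiki_root` (a wiki served at server root) leaves the href unchanged. The prefix check cannot tell a wiki root apart from an item that happens to share its name, which is one argument for keeping all hrefs in the DOM in a single form and adding the root in exactly one place.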