Totktonada closed this issue 2 years ago.
It's already done in the weekly report. Check out the last one and scroll to the bottom of the letter.
You will see this:
Just checked and reopened three of the seven issues linked above. Sure, it is good to have information about 404 errors, but it does not guarantee that we won't break old URLs. This issue is about a tool that compares the set of available URLs before and after a change. If the latter set does not contain some items from the former, we're in trouble.
If the website team does not care about the website's reliability, let's close the issue as won't fix rather than as fixed.
Raw idea: we can start from a constant list of URLs and add new ones whenever the sitemap changes.
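A minimal sketch of that idea in shell; the sitemap location and the `known-urls.txt` file are my assumptions, not an agreed interface:

```shell
# known-urls.txt: a hypothetical, sorted, version-controlled list of URLs
# that must stay alive. Compare it against the current sitemap.
curl -fsSL 'https://www.tarantool.io/sitemap.xml' |
    grep -Eo '<loc>[^<]+</loc>' |
    sed -e 's#</\?loc>##g' |
    sort > sitemap-urls.txt
# URLs that were known before but are gone now: exactly the breakage we fear.
comm -23 known-urls.txt sitemap-urls.txt
# New URLs to review and append to known-urls.txt:
comm -13 known-urls.txt sitemap-urls.txt
```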
Why was that issue closed? The referenced PR doesn't solve the problem.
For now I will get the letters about the top 404s and make redirects. Or is it better to leave this issue open?
This issue is not resolved. I already wrote why, see above. It seems I even convinced Artur, wow.
I still find broken URLs in projects' README files and docs from time to time. I still think that our users should not see 404 errors while following links in our projects, on StackOverflow, or in publications. Each of those URLs may be used rarely, but something is almost always going to break.
If we don't start spotting such problems before delivery, they will never go away. In that case the website cannot be relied on for documentation links, and we would have to wrap every link in our docs and articles with archive.org or something like that.
It seems I need to prove that we are unable to use the always-broken website even within our organization. Okay.
I have a dump of all public repositories within the organization from 2020-08-18. Let's show hits and misses:
```shell
# Print a Markdown table of all tarantool.org / tarantool.io URLs found in
# the cloned repositories; an empty Result column means the URL is broken.
echo '| URL | Result |'
echo '| --- | --- |'
grep -REho 'https?://(www\.)?tarantool\.(org|io)/[^] <>"'"'"')]+' 2>/dev/null |
    grep -v '{{' |
    while read -r url; do
        url="${url%%.}"   # strip a trailing dot
        url="${url%%,}"   # strip a trailing comma
        result="$(curl -SsfL "${url}" 2>&1 >/dev/null && echo OK)"
        echo "| ${url} | ${result} |"
    done
```
Be fair and close as won't fix if you're not going to fix the real issue.
We might want to set up some kind of regular check for such misses.
Just brush up Sasha's script a bit (apply `uniq` at least) and put it into nightly testing on GitHub. However, I have no idea which repo to put the job in.
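A sketch of what that nightly job could run, reusing Sasha's one-liner together with the clone scripts from the README below; the `check-links.sh` name and the wiring are my assumptions, not an agreed design:

```shell
#!/bin/sh
# check-links.sh (hypothetical nightly job): clone everything, then check
# each tarantool.org / tarantool.io URL exactly once (sort -u deduplicates).
set -eux
./collect.sh && ./list.sh && ./clone.sh
cd repos
grep -REho 'https?://(www\.)?tarantool\.(org|io)/[^] <>"'"'"')]+' 2>/dev/null |
    grep -v '{{' | sort -u |
    while read -r url; do
        curl -SsfLo /dev/null "${url}" || echo "BROKEN: ${url}"
    done
```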
`README.md`:
# Clone all Tarantool public repositories
## How to run
```shell
./collect.sh # obtain list of repositories in repos-*.json
./list.sh # obtain common list of repositories in list.txt
./clone.sh # clone all repositories to repos/
```

`collect.sh`:
```shell
#!/bin/sh
set -eux
repos_url='https://api.github.com/orgs/tarantool/repos'
page_count=$(curl -fsSI "${repos_url}" | grep '^link:' | sed -e 's/^.*repos?page=\([0-9]\+\)>; rel="last".*$/\1/')
for i in $(seq 1 ${page_count}); do
curl -fsS "${repos_url}?page=${i}" > repos-${i}.json
done
```

`list.sh`:
```shell
#!/bin/sh
set -eux
for f in repos-*.json; do
jq --raw-output '.[] | .name' < "${f}"
done > list.txt
```

`clone.sh`:
```shell
#!/bin/sh
set -eux
mkdir repos && cd repos
while read repo; do
clone_url="https://github.com/tarantool/${repo}.git"
git clone --depth 1 "${clone_url}"
done < ../list.txt
```

To collect all repositories (including private ones), use `collect-auth.sh` instead of `collect.sh` and adapt `clone.sh` to use SSH instead of HTTPS.
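For reference, a sketch of that SSH adaptation of `clone.sh` (a one-line change; it assumes your SSH key has access to the private repositories):

```shell
#!/bin/sh
set -eux
mkdir repos && cd repos
while read repo; do
    # SSH instead of HTTPS, so that private repositories can be cloned.
    clone_url="git@github.com:tarantool/${repo}.git"
    git clone --depth 1 "${clone_url}"
done < ../list.txt
```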
`collect-auth.sh`:
```shell
#!/bin/sh
set -eux
repos_url='https://api.github.com/orgs/tarantool/repos'
page_count=$(curl -H "Authorization: token ${GITHUB_TOKEN}" -fsSI "${repos_url}" | grep '^link:' | sed -e 's/^.*repos?page=\([0-9]\+\)>; rel="last".*$/\1/')
for i in $(seq 1 ${page_count}); do
curl -H "Authorization: token ${GITHUB_TOKEN}" -fsS "${repos_url}?page=${i}" > repos-${i}.json
done
```
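Usage sketch (the token value is a placeholder; as far as I know, a classic token needs the `repo` scope to see private repositories):

```shell
GITHUB_TOKEN='<personal access token>' ./collect-auth.sh
```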
These scripts do not support incremental data updates (neither for the list of repositories nor for updating existing clones).
This will be implemented in tarantool/website-links#4.
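Until then, a rough sketch of what an incremental variant could look like (just an illustration, not the implementation planned in that issue):

```shell
#!/bin/sh
# update.sh (hypothetical): pull repositories that are already cloned and
# clone only the ones that are missing.
set -eux
mkdir -p repos && cd repos
while read repo; do
    if [ -d "${repo}" ]; then
        git -C "${repo}" pull --ff-only
    else
        git clone --depth 1 "https://github.com/tarantool/${repo}.git"
    fi
done < ../list.txt
```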
See how many times we have broken old URLs:
- https://github.com/tarantool/doc/issues/220
- https://github.com/tarantool/doc/issues/336
- https://github.com/tarantool/doc/issues/710
- https://github.com/tarantool/doc/issues/1491
- https://github.com/tarantool/doc/issues/1653
- https://github.com/tarantool/doc/issues/1677
So we are unable to rely on the current URLs when writing an article or documentation for a module: after some time they will be broken.
I hope this becomes a tool that can check the website for problems of this kind.