zackbatist / open-archaeo

A list of open source archaeological software and resources
https://open-archaeo.info
Creative Commons Zero v1.0 Universal

Dead links #14

Closed joeroe closed 3 years ago

joeroe commented 3 years ago

Looking through open-archaeo.csv, I noticed a few projects that seem to have disappeared (or at least are no longer "open"):

Should these be removed?

Also, @steko has moved most of his projects off GitHub so the links need to be updated.

zackbatist commented 3 years ago

Here are the changes I made in light of this issue:

zackbatist commented 3 years ago

I just noticed that the R script does not delete or rename items in the content and docs directories; it only adds new ones. It's not a crucial issue, but something to be aware of when pruning the list in the future.

EDIT: This just created dead links on the main page and tag pages. So I re-added those files to the content and docs directories until a more concrete solution can be devised.

joeroe commented 3 years ago

That's definitely a problem. Probably the R script should delete everything in the content/ directory before it generates new files? That should be safe as long as we're not adding pages manually, just through the CSV.
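The clear-then-regenerate idea can be sketched as follows. This is a Python illustration only, not the project's R script: the function name `regenerate_content` and the CSV column names `name` and `url` are assumptions made for the example.

```python
import csv
import shutil
from pathlib import Path


def regenerate_content(csv_path, content_dir):
    """Rebuild content_dir from scratch so rows removed from the CSV leave no stale pages."""
    content_dir = Path(content_dir)
    # Delete everything generated previously. This is only safe if no pages
    # in the directory were written by hand.
    if content_dir.exists():
        shutil.rmtree(content_dir)
    content_dir.mkdir(parents=True)

    written = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # "name" and "url" are hypothetical column names.
            slug = "".join(c if c.isalnum() else "-" for c in row["name"].lower()).strip("-")
            page = content_dir / f"{slug}.md"
            page.write_text(
                f"---\ntitle: {row['name']}\n---\n\n{row['url']}\n",
                encoding="utf-8",
            )
            written.append(page.name)
    return sorted(written)
```

Because the directory is emptied first, a project dropped from the CSV simply never gets its page written again, so nothing has to track deletions or renames.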

Dead links on the index pages don't sound right, though... Hugo should generate those based on what's in (or not in) the content directory. Could it just be a cache issue?

steko commented 3 years ago

Hey, thanks for following my travelling repos :-)

However, seeing all the dead links and abandoned repositories, I wanted to ask whether you have considered archiving them (as GitHub mirrors, or at archive.org) for posterity. They may have no practical use, but since you're doing the hard work of keeping lists, why not take it a step further and preserve those repositories before, e.g., launchpad.net shuts down or the authors delete them?

All the best -- Stefano Costa · https://steko.iosa.it/ orcid.org/0000-0003-1124-3174

(Sent by email in reply to Joe Roe's comment of Sat 30 Jan 2021, 09:45.)

zackbatist commented 3 years ago

@joeroe I'll try implementing the approach you suggested sometime this week.

@steko Good idea. I'll have to think some more about how we might streamline this (i.e. automatically archive new entries), but I'll try it out this week using this tool for the existing GitHub-hosted git repositories.

joeroe commented 3 years ago

I've realised the dead links are entirely my fault. I hadn't realised that docs/ was used for the live site, so didn't rebuild it in any of my previous PRs. Sorry about that! #15 should fix any remaining issues.

I still think it's a good idea to delete content/ before regenerating it in csv2md.R. But we could also consider:

  1. Switching to a workflow that builds the Hugo site on every commit, removing the need to keep docs/ in the repository or do any manual rebuilds.
  2. Or adding a final step to csv2md.R that runs hugo to rebuild the docs/ directory.
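
As a rough illustration of the second option, a final build step could shell out to Hugo once the content files have been written. The sketch below is in Python rather than R, and the helper names `hugo_command` and `rebuild_site` are hypothetical; `--destination` and `--cleanDestinationDir` are existing Hugo flags, but whether the project's script uses either of them is an assumption.

```python
import shutil
import subprocess


def hugo_command(destination="docs", clean=False):
    """Build the argument list for a Hugo run that writes the site to `destination`."""
    cmd = ["hugo", "--destination", destination]
    if clean:
        # Ask Hugo to remove files in the destination that it did not generate.
        cmd.append("--cleanDestinationDir")
    return cmd


def rebuild_site(site_dir=".", destination="docs", clean=False, run=subprocess.run):
    """Run Hugo inside site_dir; `run` can be swapped out for testing."""
    if run is subprocess.run and shutil.which("hugo") is None:
        raise RuntimeError("hugo was not found on PATH")
    # check=True makes a failed build raise instead of passing silently.
    return run(hugo_command(destination, clean), cwd=site_dir, check=True)
```

Calling `rebuild_site()` as the last step after regenerating content/ keeps docs/ in sync without a separate manual build.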

zackbatist commented 3 years ago

@joeroe I added a bit of code to csv2md.r that rebuilds the docs/ directory. See https://github.com/zackbatist/open-archaeo/commit/8547525f207db2f1ac9d87a75b02c20695a18619. I think that if we were to implement a GitHub Actions workflow, it would call csv2md.r and run it in its entirety. I'm pretty sure that is the strategy that @MartinHinz implemented here: https://github.com/MartinHinz/ssla_conf_list/actions/runs/531156864/workflow.

@steko I managed to archive all of the items hosted on GitHub, which amounts to around 75% of all items. All the uploads are here: https://archive.org/details/@zackbatist. I may get in touch with the admins to create a collection, which would have a more stable, dedicated URL. Next up is to get the links added to the CSV file. I've got the basics down for this in up2ia.r, but for some reason the loop isn't working as needed.

zackbatist commented 3 years ago

Just a brief update on archival work: I uploaded all git repos hosted on GitLab, Codeberg and GitHub Gist. The two items previously hosted on BitBucket have since been moved to Codeberg. 232 / 270 items (~86%) are now uploaded to the Internet Archive as git bundles. Websites and other media that are not part of an archived git repo, such as Twitter accounts and WordPress blog posts, are not yet archived.
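For readers unfamiliar with git bundles, the sketch below shows one way a repository could be packed into a single file with `git clone --mirror` followed by `git bundle create ... --all`. It is an illustration in Python, not the tool or script actually used here; the helper name `bundle_commands` and the example URL are hypothetical.

```python
import subprocess
from pathlib import Path


def bundle_commands(repo_url, workdir):
    """Return the git commands that mirror repo_url and pack it into one bundle file."""
    workdir = Path(workdir).resolve()  # absolute, because `git -C` changes directory
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    mirror = workdir / f"{name}.git"
    bundle = workdir / f"{name}.bundle"
    commands = [
        # A mirror clone fetches every branch and tag, not only the default branch.
        ["git", "clone", "--mirror", repo_url, str(mirror)],
        # --all writes every ref in the mirror into the bundle.
        ["git", "-C", str(mirror), "bundle", "create", str(bundle), "--all"],
    ]
    return commands, bundle


def make_bundle(repo_url, workdir, run=subprocess.run):
    """Run the commands in order and return the path of the resulting bundle."""
    commands, bundle = bundle_commands(repo_url, workdir)
    for cmd in commands:
        run(cmd, check=True)
    return bundle
```

The resulting `.bundle` file can later be restored with a plain `git clone` of the file, which is what makes it a convenient single-file upload format.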

A note about Launchpad: the code hosted there is tricky to archive because it uses the Bazaar version control system, which is based on Python 2.7 and which I know little about. I tried using its own bundling feature and converting the repositories to git, but no luck so far. While this work is at greater risk of being lost due to its current inaccessibility, I do not have the ability to do much with it right now. I have cloned (or branched, in bzr terminology) them onto my local machine and will continue to think about how to archive that code, or consult someone with more experience using this tool.

joeroe commented 3 years ago

A few more not-working GitHub links turned up in a recent analysis (some are dead, others just have typos):

zackbatist commented 3 years ago

@joeroe fixed