protontypes / open-sustainable-technology

A directory and analysis of the open source ecosystem in the areas of climate change, sustainable energy, biodiversity and natural resources. https://docs.getgrist.com/gSscJkc5Rb1R/OpenSustaintech
https://opensustain.tech
Creative Commons Attribution 4.0 International
2.02k stars 236 forks

Fixing redirecting links #76

Closed nmstreethran closed 3 years ago

nmstreethran commented 3 years ago

Hi,

I think it would be a good idea to periodically check all the links in this repository and fix the ones that are redirecting.

For example, I noticed that the link to PowerGenome (https://github.com/gschivley/PowerGenome) redirects to https://github.com/PowerGenome/PowerGenome. If, in the future, a new repository or fork is created that points to the old URL, the link will no longer be correct.

To prevent this from happening, you could set up a scheduled GitHub Action with a link validator (such as awesome_bot) to check links every month or so. In addition to redirecting links, this will allow you to identify links that no longer exist, or name changes to projects.
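As a rough illustration of what such a link check does (this is not how awesome_bot is implemented), the sketch below pulls inline Markdown links out of a text and sends a single HEAD request per link without following redirects. The function names `extract_links` and `head_status` are hypothetical, and the regular expression is a simplification that ignores reference-style links and bare URLs.

```python
import http.client
import re
from urllib.parse import urlsplit

# Matches inline Markdown links of the form [text](http...).
# Simplification: reference-style links and bare URLs are ignored.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown_text):
    """Return the unique link targets in order of first appearance."""
    seen = []
    for url in LINK_PATTERN.findall(markdown_text):
        if url not in seen:
            seen.append(url)
    return seen


def head_status(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A 301 or 302 status with a `Location` header is what shows up as a redirecting link; some servers answer HEAD requests differently from GET, so results may need a manual spot check.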

Thanks, and I apologise if you have already discussed this. I didn't find anything related in the issues, so I thought I'd post my ideas here!

Ly0n commented 3 years ago

Hey @nmstreethran,

that's a good suggestion. It is easily implemented and should help to remove dead projects or redirected URLs. We should definitely test this. It would also help to read the list in a scripted way like we are planning to do: https://github.com/protontypes/open-sustainable-technology/issues/70

Things that could be relevant:

  1. The GitHub servers could start blocking the many requests (>1000) that we are making.
  2. In principle, you need a log history to check which URLs were inaccessible over a longer period of time. Maybe some projects are just undergoing maintenance at the moment. However, I do not consider this to be problematic. We can also check the action logs manually on a regular basis to see whether the same project is always unavailable. Still, the action could fail even though everything is fine.
  3. We also have to consider how to deal with other error codes such as 403 and 429.
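One possible way to handle points 2 and 3 is sketched below, under assumptions that are not part of the thread: status codes are grouped into coarse outcomes, with 403 and 429 treated as "retry" rather than "broken", and a link is only flagged once it has been broken in several consecutive runs. The names `classify` and `persistently_broken`, the threshold of three runs, and the shape of `history` are all hypothetical.

```python
def classify(status):
    """Map an HTTP status code to a coarse outcome for a link check."""
    if 200 <= status < 300:
        return "ok"
    if status in (301, 302, 307, 308):
        return "redirect"
    if status in (403, 429):
        # Often rate limiting or bot blocking rather than a dead project.
        return "retry"
    return "broken"


def persistently_broken(history, min_runs=3):
    """Return URLs whose last `min_runs` outcomes were all "broken".

    `history` maps each URL to its outcomes, oldest first, one per run.
    """
    flagged = []
    for url, outcomes in history.items():
        recent = outcomes[-min_runs:]
        if len(recent) == min_runs and all(o == "broken" for o in recent):
            flagged.append(url)
    return flagged
```

With this kind of bookkeeping, a project that is briefly offline for maintenance would not be reported until it has failed repeatedly; where the history is stored between runs is left open here.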

If you like, you are very welcome to create a pull request (and also test our new Continuous Reforestation implementation :deciduous_tree:). I can also do it so that we have an implementation to discuss.

nmstreethran commented 3 years ago

Thanks @Ly0n.

Regarding the GitHub server blocking the large number of requests, and error code 429 (Too Many Requests), awesome_bot has a --request-delay option to delay each request. Setting it to a reasonable value (maybe 0.5 seconds?) will probably fix this, but it has to be tested. The action will take longer to complete, though.
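The idea behind a request delay can be sketched as follows. This is only an illustration of the technique, not awesome_bot's code: `check_with_delay`, its `check` callback and the injectable `sleep` parameter are hypothetical names chosen for the example.

```python
import time


def check_with_delay(urls, check, delay_seconds=0.5, sleep=time.sleep):
    """Call `check(url)` for each URL, pausing between consecutive requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results[url] = check(url)
    return results
```

The waiting time grows linearly with the number of links: for `n` links, the delay alone adds `(n - 1) * delay_seconds` seconds to the run, which is why the action takes longer to complete.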

I'll think about point 2 and other error codes and let you know if I come up with something. I'll check out the issue you referenced and the Continuous Reforestation repository as well in the coming week.

nmstreethran commented 3 years ago

Sorry for the delay! I tried testing the links in README.md last night using awesome_bot locally. I found a few redirecting links, which you can see below. I've included the full log as an attachment.

List of redirecting links

```txt
01. [L0074] 301 https://github.com/sibyjackgrove/SolarPV-DER-simulation-utility → https://github.com/tdcosim/SolarPV-DER-simulation-utility
02. [L0150] 301 https://github.com/izabala123/BEMRosetta → https://github.com/BEMRosetta/BEMRosetta
03. [L0157] 301 https://github.com/charxie/energy2d → https://github.com/charxie/multiphysics
04. [L0399] 301 https://github.com/gschivley/PowerGenome → https://github.com/PowerGenome/PowerGenome
05. [L0436] 301 https://gitlab.com/diw-evu/dieter_public/dieter_py → https://gitlab.com/diw-evu/dieter_public/dieterpy
06. [L0459] 301 https://github.com/rl-institut/mvs_eland → https://github.com/rl-institut/multi-vector-simulator
07. [L0464] 302 https://bitbucket.org/harald_g_svendsen/powergama/ → https://bitbucket.org/harald_g_svendsen/powergama/wiki/Home
08. [L0519] 301 https://openei.org → https://openei.org/wiki/Main_Page
09. [L0533] 301 https://github.com/tmrowco/northapp-contrib → https://github.com/tmrowco/bloom-contrib
10. [L0540] 301 https://github.com/mlco2/code-carbon → https://github.com/mlco2/codecarbon
11. [L0603] 301 https://www.appropedia.org/ → https://www.appropedia.org/Welcome_to_Appropedia
12. [L0609] https://ecostress.jpl.nasa.gov/ SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate)
13. [L0633] 301 https://github.com/atreyasha/vegMonitor → https://github.com/atreyasha/vegetation-monitoring
14. [L0654] 301 https://github.com/pyronear/PyroNear → https://github.com/pyronear/pyro-vision
15. [L0704] 301 https://github.com/mankoff/freshwater → https://github.com/GEUS-PROMICE/freshwater
16. [L0713] 404 https://forge.ipsl.jussieu.fr/nemo/chrome/site/doc/NEMO/guide/html/NEMO_guide.html
17. [L0809] 301 https://github.com/apache/climate → https://github.com/apache/attic-climate
18. [L0848] 301 https://github.com/Vizzuality/climate-watch → https://github.com/ClimateWatch-Vizzuality/climate-watch
19. [L0863] 301 https://github.com/adventuroussrv/Climate-Change-Datasets → https://github.com/OpenFloodAI/Climate-Change-Datasets
20. [L0915] 301 https://gitlab.version.fz-juelich.de/toar/mlair → https://gitlab.version.fz-juelich.de/esde/machine-learning/mlair
21. [L0916] 301 https://github.com/amaurymartiny/shoot-i-smoke → https://github.com/shootismoke/mobile-app
22. [L0929] 301 https://github.com/williamorim/Rpollution → https://github.com/openvironment/Rpollution
```

Notes and observations:

  1. I added all non-project links to the whitelist, such as the links used by the badges. I did notice that the Twitter account of the author of An Animated Map of the Earth no longer exists. This could be temporary, but you could consider linking their website instead of Twitter.
  2. I set a request delay of 1 second to prevent failures due to too many requests. The downside is that the test takes about 16 minutes to complete (as there were 931 links). This could be an issue if you have a limited amount of CI minutes. I tried setting a shorter delay and it didn't seem to make a noticeable difference, but I could be wrong.
  3. ECOSTRESS (no. 12) could be added to the whitelist to prevent the test from failing due to unverifiable certificates.
  4. NEMO (no. 16) gives a 404 but the project still exists; the link should be updated to https://forge.ipsl.jussieu.fr/nemo/wiki/Users.
  5. Overall, there were only 22 flagged links (20 redirects, one certificate error and one 404), so it should be easy enough to find and replace them manually. However, I noticed that some projects have a new name, e.g. energy2d (no. 03) is now multiphysics. So, the description of these projects may also be out of date.
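If the manual find-and-replace ever becomes tedious, the redirect entries could in principle be applied with a small script. The sketch below assumes the log layout shown in the list of redirecting links in this comment (`NN. [Lxxxx] STATUS old → new`) and that links in README.md are inline Markdown links; `parse_redirects` and `apply_redirects` are hypothetical helper names, and by default only permanent (301) redirects are applied.

```python
import re

# Log line layout assumed from the list in this comment:
#   "04. [L0399] 301 https://old → https://new"
REDIRECT_PATTERN = re.compile(r"\[L(\d+)\]\s+(\d{3})\s+(\S+)\s+→\s+(\S+)")


def parse_redirects(log_text):
    """Return (line_number, status, old_url, new_url) tuples from the log."""
    return [
        (int(line), int(status), old, new)
        for line, status, old, new in REDIRECT_PATTERN.findall(log_text)
    ]


def apply_redirects(markdown_text, redirects, statuses=(301,)):
    """Rewrite inline link targets "(old)" to "(new)" for the chosen statuses."""
    for _line, status, old, new in redirects:
        if status in statuses:
            # Match the whole parenthesised target so that a URL which is a
            # prefix of another link is not rewritten by accident.
            markdown_text = markdown_text.replace(f"({old})", f"({new})")
    return markdown_text
```

Entries without a target, such as the certificate error (no. 12) and the 404 (no. 16), are skipped by the pattern, and renamed projects would still need a manual look at their descriptions.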

Here's an example GitHub Action file which uses awesome_bot. I've set a monthly schedule and am using the Ruby gem method. Let me know what you think and if you would prefer using a different implementation.

GitHub Action

```yml
name: linkcheck

on:
  schedule:
    - cron: "0 3 20 * *"

jobs:
  test:
    runs-on: ubuntu-latest
    # container: dkhamsing/awesome_bot  # Docker method
    steps:
      - name: Check out Git repository
        uses: actions/checkout@v2
      # begin Ruby gem method
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 2.7
      - name: Install awesome_bot and dependencies
        run: |
          gem install awesome_bot
      # end Ruby gem method
      - name: Check links using awesome_bot
        run: |
          awesome_bot --allow-dupe --skip-save-results --request-delay 1 \
          --white-list \
          tabletopwhale.com,protontypes.eu,opensustain.tech,gitter.im,\
          badgen.net,github.com/protontypes/open-sustainable-technology,\
          contrib.rocks,github.com/eleanorlutz/earth_atlas_of_space,\
          twitter.com/eleanor_lutz \
          README.md
```

tjarkdoering commented 3 years ago

Hi @nmstreethran, thanks for your work! This is really good feedback. I will look into this and probably implement your suggested workflow soon. If you are interested in doing more with us, feel free to join any of our online meetings.

nmstreethran commented 3 years ago

Thank you for the feedback, @tjarkdoering! I'm happy to contribute further and will join the meetings when possible.

Ly0n commented 3 years ago

That's really amazing and very important for our future work, since we are planning to read metadata via the GitHub API to create a database out of it. That's why it is very important that the list always stays clean and machine-readable. @nmstreethran You are very welcome to join our next session. Check out the slides from the LF Energy conference yesterday to get an idea of what we are going to do with the list in the future (slide 10): https://github.com/protontypes/organization-documents/blob/master/slides/protontypes_measuring_the_open_and_sustainable_technology_world.pdf

Ly0n commented 3 years ago

@nmstreethran, I checked your GitHub Action script and the URL issues you found. Again, great work. I would like to implement it today but do not want to steal your PR. It is no problem for me to implement it, but in the end it is your work. What are your thoughts on it?

nmstreethran commented 3 years ago

Hi @Ly0n, I do not mind either way. I can create a PR tomorrow, but if you wish to implement it today itself, then please go ahead.

By the way, are your meetings every Thursday at 18:30 CET? Just a heads up, the next meeting's date is incorrect in the organization-documents repository.

Ly0n commented 3 years ago

@nmstreethran, we use the organization-documents README just for logging. Normally we use our Gitter chat to announce the next meeting: https://gitter.im/protontypes/community

Most of the time we meet at least once per week, on Thursdays at 18:30 CET. If you would like to join and this time is bad for you, we could also switch it.

nmstreethran commented 3 years ago

Sorry @tjarkdoering, I just noticed that you have made a commit regarding this issue!

tjarkdoering commented 3 years ago

Just one minute ago :smile: But it was only for the redirect links.

nmstreethran commented 3 years ago

I will try to fix the conflicts :)

tjarkdoering commented 3 years ago

Thank you!

nmstreethran commented 3 years ago

I think it has been fixed now. Let me know if there's anything else I can do!