iop-alliance / okh-search

A self-hostable, federated search for open source hardware
https://search.openknowhow.org
MIT License

Outdated appropedia & field-ready manifests #55

Open hoijui opened 2 years ago

hoijui commented 2 years ago

Both of those are hosted on Appropedia, and together they make up the bulk of the projects on this list. Appropedia now has a way to export (generate) an OKH manifest file from a project's wiki page. This yields newer files than the ones linked to on this list.

Example project and link that generates/downloads its okh.yml file:

So it is really just a fixed URL with the project title as a parameter.

PS: I found out about this because I saw that all the Field Ready project manifest files in this list use the licensor: field incorrectly. It is correct, though, when using the generated file from the URL above (https://www.appropedia.org/okh.php?title=TITLE).
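As a minimal sketch of the "fixed URL with the project title as a parameter" idea: the endpoint is the one quoted above, while the function name `okh_manifest_url` and the title normalization (underscores instead of spaces, MediaWiki-style) are assumptions made for illustration, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the comment above; everything else is illustrative.
APPROPEDIA_OKH_ENDPOINT = "https://www.appropedia.org/okh.php"


def okh_manifest_url(title: str) -> str:
    """Return the URL that generates the OKH manifest for one project page.

    Assumption: the title is passed MediaWiki-style, with underscores
    instead of spaces. urlencode() percent-encodes any remaining
    reserved characters.
    """
    normalized = title.strip().replace(" ", "_")
    return f"{APPROPEDIA_OKH_ENDPOINT}?{urlencode({'title': normalized})}"
```

Calling `okh_manifest_url("Some Project")` (a made-up title) would give a URL ending in `okh.php?title=Some_Project`.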

In my opinion, it would make more sense to have a script that generates a list of manifest file URLs (maybe also including entries fetched from a static list), and optionally downloads all of them, instead of having a static list like this one, because... see the issue above. Optionally, that list could be generated periodically (once a week?) in a CI job, then hosted on the project's pages and linked to in the README. We could then also run a check on all the URLs to see if they are available and, if we have such a tool, check whether the manifests are valid.
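The availability check could look roughly like the sketch below. It is only an illustration under assumptions: the function names `http_status` and `check_manifest_urls` are hypothetical, "available" is taken to mean an HTTP 200 response, and a HEAD request is used to keep the load low (some servers may not answer HEAD, in which case a GET would be needed).

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return 0


def check_manifest_urls(
    urls: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each manifest URL to True if it answered with HTTP 200."""
    return {url: get_status(url) == 200 for url in urls}
```

Passing the status function in as a parameter keeps the check testable without network access; a CI job would just call `check_manifest_urls(urls)` with the default.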

I could maybe help if such things were desired.

hoijui commented 2 years ago

Scraping Appropedia projects is easy (because of that nice generation URL), once we have the list of all project names. To get that list, my best bet so far is https://www.appropedia.org/Category:Projects, but that is quite ugly, because it uses a paging system that is not very nicely scriptable (it shows only 200 projects at a time, out of the 17xx total).
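One possible way around the HTML paging, sketched here under assumptions: Appropedia appears to run on MediaWiki, whose standard API can list category members and hands back a continuation token for the next page. The API path `https://www.appropedia.org/w/api.php` is an assumption and not confirmed in this thread, and `fetch_json` and `category_titles` are hypothetical names. This is not necessarily how the list was eventually obtained.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode

# Assumption: the standard MediaWiki API is exposed at this path.
API_URL = "https://www.appropedia.org/w/api.php"


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one API request and return the decoded JSON response."""
    url = f"{API_URL}?{urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def category_titles(
    category: str = "Category:Projects",
    fetch: Callable[[Dict[str, str]], dict] = fetch_json,
) -> Iterator[str]:
    """Yield the titles of all pages in a category, following API paging."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        continuation = data.get("continue")
        if not continuation:
            break  # no "continue" block means this was the last page
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params = {**params, **continuation}
```

Each yielded title could then be fed into the manifest generation URL, one request per project.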

The coolest thing, of course, would be for Appropedia to supply a URL to download the names of all projects, or even better, to generate a zip file containing all the latest generated OKH files. I noticed that appropedia.org is quite slow, so I imagine they already have problems with the load, and we should be careful with scraping.
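Being careful with scraping could be as simple as fetching sequentially with a pause between requests. The sketch below assumes a hypothetical `fetch` callable that downloads one URL; the two-second default delay is an arbitrary illustrative value, not a limit requested by Appropedia.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Fetch URLs one at a time, pausing between consecutive requests."""
    results: Dict[str, bytes] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

Running this once a week in a scheduled job, rather than on every search, would keep the total load on the wiki small.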

hoijui commented 2 years ago

I did it (with the help of Emilio Velis from appropedia.org): https://github.com/OPEN-NEXT/LOSH-Appropedia-Scraper See the second link in the README (that file uses the same format as the CSV file in this repo).

It is a generated list, which is regenerated once a week. I think it makes more sense to keep it separate from the one in this repo, as it is dynamic, but it is up to you to decide how to handle it. You are also welcome to fork my repo and host it under this organization instead.