rimu / no-qanon

A blocklist for QAnon, conspiracy, fake news, nazi websites.

Generation automation #55

Closed NotaInutilis closed 1 year ago

NotaInutilis commented 1 year ago

Automated generation with GitHub workflows! Copy-pasted from https://github.com/quenhus/uBlock-Origin-dev-filter/blob/main/.github/workflows/python-app.yml Now it should generate the blocklists every time the domains.txt file on the master branch is updated.

Next, look more into that repo https://github.com/quenhus/uBlock-Origin-dev-filter to see how it automates generation of a lot of different blocklists from different source files. Having different source files would be useful to categorize the block rules.

Also added a couple of far-right sites.

rimu commented 1 year ago

This one is going to take a bit of effort from me to understand what is going on. I'll get back to you in a few days.

NotaInutilis commented 1 year ago

No problem. Here's a summary if that helps you: the new file in .github/workflows triggers a GitHub action: every time the domains.txt file gets pushed, it runs the /update script and all the python scripts, then commits and pushes. It can also be run manually from the Actions tab of the repo if needed. The commit history is all over the place because, well, I was testing it. And it didn't work for a while.

rimu commented 1 year ago

Thanks for that.

The benefit of this is that contributors, e.g. Windows users, don't need to have python installed on their computer. Any other benefits?

One issue I have is that the system is then tightly integrated with github (a proprietary, commercial service) and can never be moved to another platform. Is that right?

NotaInutilis commented 1 year ago

You don't even need to use git per se, or download the source and push it. Contributing can be done on the GitHub website directly, as it allows editing txt files. It removes a lot of friction for contributors, especially for those not familiar with git, bash, or python. Also, as a Windows user myself, the latest python package messes up the encoding of txt files. So it's a relief even for me to not have to deal with that on my weird little linux laptop.

I'm not familiar with other platforms such as GitLab, maybe they also have automation (edit: they do. GitLab's CI/CD works in a very similar way, so it's easy to port). But it's not that closely tied to GitHub. The whole workflow system is just a yaml file with a couple of bash commands, so it can be retrieved and put in another script file should you want to move out. I don't know how to code, so it's actually super simple: it only automates triggering the original update script, which is still intact in the repo, and pushes to git. This project has a simple codebase, so it's super easy to pack up, go somewhere else, edit the readme and go back to manual generation. For a bigger project, yeah, it might be an issue, but not for a blocklist. The biggest hurdle would be to set up a proper redirection to the new host for subscribers, I guess.

It'd be a shame not to take advantage of that feature, especially for such a project. People who know how fascism works on the internet are usually not able to use git or figure out why python does whatever it does. They're from social sciences or journalism, not computer science. If we make it easier for them to contribute, we'd get better data.

rimu commented 1 year ago

Yes, well said. Ok, let's try this.

rimu commented 1 year ago

I've just added a couple of lines to domains.txt and pushed to github. It doesn't look like any other files were automatically updated. What am I doing wrong? Do I still need to run update.sh?

NotaInutilis commented 1 year ago

That's normal! I've changed the automation mechanism: https://github.com/rimu/no-qanon/commit/c385b31e534c814c8897e02fad3335380c8ed5af It now triggers when a txt file in the /sources/ folder is modified. This allows categorization and justification of what we're blocking by using folders, txt and md files, and comments. I've looked at this pull request https://github.com/rimu/no-qanon/pull/41 and thought about how to do it in the most user-friendly way. It also enables easy customization for users who want to use wikileaks, for example (fork, delete a file, auto generate). I've updated the readme to explain it, but I'm not sure what to use to properly communicate these changes, since a blocklist does not have a changelog and GitHub repos don't have things like a blog.
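To give a rough idea of what "generate from the /sources/ folder" can look like, here is a minimal Python sketch. It is not the actual script in the repo: the function names `collect_domains` and `demo`, the `#` comment syntax, and the example file names and domains are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def collect_domains(sources_dir):
    """Gather unique domains from every .txt file under sources_dir.

    Hypothetical helper, not the repo's real generator.
    """
    domains = set()
    # rglob walks subfolders too, so category folders are picked up.
    # .md files never match "*.txt", so they stay documentation-only.
    for txt_file in sorted(Path(sources_dir).rglob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            # Assumption: '#' starts a comment, full-line or trailing.
            entry = line.split("#", 1)[0].strip().lower()
            if entry:
                domains.add(entry)
    return sorted(domains)


def demo():
    """Build a throwaway sources tree and return the collected domains."""
    with tempfile.TemporaryDirectory() as tmp:
        category = Path(tmp) / "Some Category"
        category.mkdir()
        (category / "example.txt").write_text(
            "# why these are listed\n"
            "Example.org\n"
            "example.net  # trailing note\n"
            "\n"
            "example.org\n",
            encoding="utf-8",
        )
        (category / "README.md").write_text("notes.example\n", encoding="utf-8")
        return collect_domains(tmp)


if __name__ == "__main__":
    print(demo())
```

With a layout like this, deleting one txt file from a fork and regenerating simply drops its entries from the output, which is the customization case mentioned above.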

NotaInutilis commented 1 year ago

I've put your new entries in the source. The first one, saveourstores.nz, is here: https://github.com/rimu/no-qanon/blob/master/sources/New%20Zealand/Anti%20Smokefree%202025.txt Regarding the second one, webelong.co.nz, at first glance it does not seem to fall into what we're blocking here, quite the contrary even. Inclusion is quite an anti-fascist value. Still, I have absolutely no clue about the political context in NZ. What's wrong with it?

rimu commented 1 year ago

Yes, that is an interesting one! It is an astroturfing site run by the extreme right in NZ. Here is some background https://www.rnz.co.nz/news/in-depth/496933/astroturf-accusations-over-we-belong-website-run-by-anti-co-governance-group

NotaInutilis commented 1 year ago

I have one word, and that word is: what. There's so much deception going on, it's wild. Anyway, I grabbed a couple more URLs linked in the article and put them in here: https://github.com/rimu/no-qanon/tree/master/sources/New%20Zealand