agrabeli closed this issue 1 year ago
I am going to close this issue as a duplicate because:
- the part related to semi-automatically removing domains is covered by https://github.com/ooni/probe/issues/1747 and has been implemented by https://github.com/ooni/probe-cli/pull/1114
- the part related to semi-automatically removing parked domains is covered by https://github.com/ooni/probe/issues/1826
Because this issue covers both cases, it is a full duplicate of those two issues.
Given that the Citizen Lab test lists (https://github.com/citizenlab/test-lists/tree/master/lists) were originally created by OpenNet Initiative researchers between 2008 and 2012, they include many URLs with expired and parked domains.
It would therefore be great if we could create a script that automatically detects and deletes URLs with expired and parked domains.
This would significantly simplify test list review for researchers, and it would also improve OONI measurement quality.
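As a rough illustration of the detection half of such a script, here is a minimal sketch in Python. It is not the implementation from the linked issues or pull request. It assumes that a hostname which no longer resolves in DNS is a reasonable first signal of an expired domain, and it only flags URLs for review rather than deleting them. The function names (`hostname_of`, `resolves`, `flag_unresolvable`) are hypothetical, and detecting parked domains would need a different heuristic that is not covered here.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercased hostname of a URL, or None if it has none."""
    return urlsplit(url).hostname


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `lookup` is injectable so the check can be exercised without network access.
    """
    try:
        return len(lookup(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver reported a failure (e.g. no such domain).
        # UnicodeError: the hostname could not be encoded for a DNS query.
        return False


def flag_unresolvable(urls, lookup=socket.getaddrinfo):
    """Return the URLs whose hostname is missing or does not resolve.

    Assumption: resolution failure is only a proxy for an expired domain,
    so the result is a list of candidates for human review.
    """
    flagged = []
    cache = {}  # hostname -> bool, so each hostname is looked up once
    for url in urls:
        host = hostname_of(url)
        if host is None:
            flagged.append(url)  # malformed entry, also worth reviewing
            continue
        if host not in cache:
            cache[host] = resolves(host, lookup)
        if not cache[host]:
            flagged.append(url)
    return flagged
```

A caller could pass the URLs read from one of the test list files to `flag_unresolvable` and hand the result to a reviewer. Temporary resolver failures would produce false positives, which is one reason to keep the removal step semi-automatic rather than fully automatic.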
This activity has been included as an OONI challenge in Roskomsvoboda's DEMHACK hackathon (September 2022): https://demhack.ru/
If this activity is not implemented as part of the hackathon, the OONI team should pick it up.
[Update (2023-03-15): we did half of the work. Please see https://github.com/ooni/probe/issues/1826, which covers the remaining part of the work originally covered by this issue.]