ooni / probe

OONI Probe network measurement tool for detecting internet censorship
https://ooni.org/install
BSD 3-Clause "New" or "Revised" License

cli: excessive database size because of automatic runs #1950

Open kotenok2000 opened 2 years ago

kotenok2000 commented 2 years ago

Describe the bug

I have noticed that the OONI Probe autorun database, main.sqlite3, is 131395584 bytes in size. After one more probe run it grew to 131477504 bytes. Are there any limits on database size?

To Reproduce

Run OONI Probe unattended for several months without turning off the system.

kotenok2000 commented 2 years ago

Can you fix this?

hellais commented 2 years ago

Hi, thanks for reporting this issue.

You are right to point out that database growth is currently unbounded: if you run the probe unattended on a system for a long period of time, the database will just keep growing.

As a temporary workaround, what we have been suggesting to people with this sort of use case is to periodically run ooniprobe reset --force followed by ooniprobe onboard --yes to bypass the informed consent (see for example the murakami integration of OONI Probe: https://github.com/m-lab/murakami/pull/103/files).
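For anyone who wants to automate this workaround, here is a minimal sketch in Python. It only runs the two commands mentioned above once the database file passes a size threshold. The threshold, the helper names and the idea of passing the database path in by hand are my assumptions, not something OONI Probe provides; check where main.sqlite3 lives on your system before using it.

```python
import os
import subprocess

# Assumption: this threshold is illustrative, not an OONI Probe default.
MAX_DB_BYTES = 100 * 1024 * 1024


def needs_reset(db_path, max_bytes=MAX_DB_BYTES):
    """Return True when the database file exists and is larger than max_bytes."""
    try:
        return os.path.getsize(db_path) > max_bytes
    except OSError:
        # Missing or unreadable file: nothing to reset.
        return False


def reset_commands():
    """The two commands from the workaround, in the order they must run."""
    return [
        ["ooniprobe", "reset", "--force"],
        ["ooniprobe", "onboard", "--yes"],
    ]


def reset_if_needed(db_path, max_bytes=MAX_DB_BYTES, run=subprocess.run):
    """Run the workaround only when the database has grown past max_bytes.

    `run` is injectable so the logic can be exercised without ooniprobe installed.
    """
    if not needs_reset(db_path, max_bytes):
        return False
    for cmd in reset_commands():
        run(cmd, check=True)  # stop at the first failing command
    return True
```

Something like this could be called from cron or a systemd timer with the path to main.sqlite3. Note that the reset wipes all stored results, not only the old ones.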

I agree this is not ideal, and we ought instead to have a setting that lets users keep in the DB only measurements that are newer than some date.
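To make the idea concrete, a retention setting of this kind could boil down to something like the sketch below. This is not how OONI Probe implements anything today: the table name measurements, the column measurement_start_time and the timestamp format are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_older_than(conn, max_age_days, now=None):
    """Delete measurements older than max_age_days and return how many were removed.

    Assumes a hypothetical `measurements` table whose `measurement_start_time`
    column stores UTC timestamps as 'YYYY-MM-DD HH:MM:SS' text, so that string
    comparison matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM measurements WHERE measurement_start_time < ?",
            (cutoff,),
        )
    # DELETE alone does not shrink the file; VACUUM rebuilds it to reclaim space.
    # It must run outside a transaction, hence after the `with` block.
    conn.execute("VACUUM")
    return cur.rowcount
```

The VACUUM step matters for this issue in particular: without it SQLite keeps the freed pages for reuse, so the file on disk would stop growing but would not get smaller.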

Do you think having such an option would be a good solution to your use-case?

A similar issue has already been reported here: https://github.com/ooni/probe/issues/1927

kotenok2000 commented 2 years ago

Why do we even need a database for ooniprobe-unattended?

kotenok2000 commented 2 years ago

I wonder if this happens on Android devices too?

bassosimone commented 2 years ago

@kotenok2000 for Android, we're not tracking background runs using the database, which leads to an orthogonal set of issues. That is, we have no confidence that such background runs have ever happened. We discussed this issue recently with @aanorbel, where we were indeed noting that it's bad not to have any kind of feedback from automated runs. I think we should adopt a solution where for automated runs we prune the database more aggressively but we still have a way to know that some automated runs occurred recently.
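As a rough illustration of that last idea, one possible shape is to keep only the newest few automated results and drop the rest, so there is still evidence that automated runs happened. This is a sketch under assumptions: the results table and its is_automated and start_time columns are hypothetical, not the actual OONI Probe schema.

```python
import sqlite3


def prune_automated_runs(conn, keep=10):
    """Delete all automated results except the `keep` most recent ones.

    Manual results are left untouched. Returns the number of rows removed.
    The `results` table and its columns are hypothetical.
    """
    with conn:
        cur = conn.execute(
            """
            DELETE FROM results
            WHERE is_automated = 1
              AND id NOT IN (
                SELECT id FROM results
                WHERE is_automated = 1
                ORDER BY start_time DESC
                LIMIT ?
              )
            """,
            (keep,),
        )
    return cur.rowcount


def last_automated_run(conn):
    """Return the start time of the newest automated result, or None if there is none."""
    row = conn.execute(
        "SELECT MAX(start_time) FROM results WHERE is_automated = 1"
    ).fetchone()
    return row[0]
```

With a small value of keep the database stays bounded for unattended use, while last_automated_run still answers whether anything ran recently.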