paulgoio / searxng

SearXNG image with changed simple theme, settings.yml
https://paulgo.io
GNU Affero General Public License v3.0

Does this instance have a privacy policy? #18

Closed johndoe432 closed 2 years ago

johndoe432 commented 2 years ago

Does it log anything and, if it does, for how long are these logs stored?

mrpaulblack commented 2 years ago

Hi @johndoe432, thanks for your questions.

TL;DR: Yes, I am logging requests.

The long answer: Currently my stack consists of Traefik as a reverse proxy with filtron behind it, and my SearXNG instance sits behind filtron. I am not logging anything with filtron or with SearXNG itself. From Traefik, on the other hand, I am collecting an access log and storing it indefinitely with Loki to organize and display this data in Grafana dashboards... The entire stack is OSS and I am NOT logging the referrer or the IP address. I also have regex filters in place to remove the search param q= from queries, as well as the image_proxy params, from those logs. These are the regex filters currently used on my instance:

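          # Mask IPv4 addresses
          - replace:
              expression: '(?:[0-9]{1,3}\.){3}([0-9]{1,3})'
              replace: '***'
          # Drop the query string from /search requests
          - replace:
              expression: '(/search\?(q=|preferences=).*?\")'
              replace: '/search"'
          # Drop the query string from /autocompleter requests
          - replace:
              expression: '(/autocompleter\?(q=|preferences=).*?\")'
              replace: '/autocompleter"'
          # Drop the proxied URL from /image_proxy requests
          - replace:
              expression: '(/image_proxy\?url=.*?\")'
              replace: '/image_proxy"'
          # Catch any remaining q= or preferences= query string
          - replace:
              expression: '(/*\?(q=|preferences=).*?\")'
              replace: '/?q="'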
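To see what these filters do to a raw entry, here is a minimal Python sketch that applies the same expressions in order. It is only an illustration, not the actual log pipeline. It assumes that only the first capture group of each match is swapped for the replacement, which is consistent with the `172.18.0.***` address in the log line quoted in this comment. The helper names and the sample entry are hypothetical.

```python
import re

# (expression, replace) pairs copied from the filters quoted above.
FILTERS = [
    (r'(?:[0-9]{1,3}\.){3}([0-9]{1,3})', '***'),
    (r'(/search\?(q=|preferences=).*?\")', '/search"'),
    (r'(/autocompleter\?(q=|preferences=).*?\")', '/autocompleter"'),
    (r'(/image_proxy\?url=.*?\")', '/image_proxy"'),
    (r'(/*\?(q=|preferences=).*?\")', '/?q="'),
]


def replace_first_group(match: re.Match, replacement: str) -> str:
    """Swap only the first capture group of a match for the replacement."""
    whole = match.group(0)
    start = match.start(1) - match.start(0)
    end = match.end(1) - match.start(0)
    return whole[:start] + replacement + whole[end:]


def redact(line: str) -> str:
    """Apply every filter in order to a single log line."""
    for expression, replacement in FILTERS:
        line = re.sub(
            expression,
            lambda m, r=replacement: replace_first_group(m, r),
            line,
        )
    return line


# Hypothetical raw entry; the field values are made up for illustration.
raw = '{"RequestPath":"/search?q=secret+words","ServiceAddr":"172.18.0.5:8080"}'
print(redact(raw))
# prints {"RequestPath":"/search","ServiceAddr":"172.18.0.***:8080"}
```

In this sketch the query string is gone and only the last octet of the address is masked.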

This is a typical log line for making a search with my instance:

{"DownstreamContentSize":9160,"DownstreamStatus":200,"Duration":850427573,"OriginContentSize":9160,"OriginDuration":850159970,"OriginStatus":200,"Overhead":267603,"RequestAddr":"paulgo.io","RequestContentSize":26,"RequestCount":15087,"RequestHost":"paulgo.io","RequestMethod":"POST","RequestPath":"/search","RequestPort":"-","RequestProtocol":"HTTP/2.0","RequestScheme":"https","RetryAttempts":0,"RouterName":"searxng@docker","ServiceAddr":"172.18.0.***:8080","ServiceName":"searxng-searxng@docker","ServiceURL":{"Scheme":"http","Opaque":"","User":null,"Host":"172.18.0.***:8080","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"StartLocal":"2022-01-08T20:37:37.633408099Z","StartUTC":"2022-01-08T20:37:37.633408099Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"https","level":"info","msg":"","request_Sec-Fetch-Dest":"document","request_Sec-Fetch-Mode":"navigate","request_Sec-Fetch-Site":"none","request_Sec-Fetch-User":"?1","time":"2022-01-08T20:37:38Z"}
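As a rough guide to reading such an entry, the sketch below parses an abbreviated copy of the line above. It assumes the duration fields are in nanoseconds, as Traefik's access log documentation describes, and shows that `Overhead` is the difference between `Duration` and `OriginDuration`.

```python
import json

# Abbreviated copy of the log line above; only a few fields are kept.
line = (
    '{"DownstreamStatus":200,"Duration":850427573,'
    '"OriginDuration":850159970,"Overhead":267603,'
    '"RequestMethod":"POST","RequestPath":"/search",'
    '"time":"2022-01-08T20:37:38Z"}'
)
entry = json.loads(line)

# Assumption: durations are nanoseconds, so divide by 1e6 for milliseconds.
duration_ms = entry["Duration"] / 1_000_000

# Time spent in the proxy itself rather than in the backend.
proxy_overhead = entry["Duration"] - entry["OriginDuration"]

print(
    f'{entry["RequestMethod"]} {entry["RequestPath"]} -> '
    f'{entry["DownstreamStatus"]} in {duration_ms:.1f} ms '
    f"(proxy overhead {proxy_overhead} ns)"
)
```

So this particular search took roughly 850 ms end to end, almost all of it spent in the SearXNG backend.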

So here is the reasoning for doing this basic logging. I want to know if my site is actually healthy and if it works as expected for the end user. Since there are a lot of changes, I want to see if these changes actually work correctly once integrated.

The metrics I am concerned with the most are:

The alternative to this logging would be to use metrics, for example. I have been trying out alternatives for this issue and experimented with Prometheus metrics, which would mean no logging while still having data like response time and so on...

The problem I am having with this is that these are metrics: compared to logs they are inaccurate (this is a problem for 50x errors, for example, since with metrics, errors from SearXNG can either get overblown or not be reported...).

This graph shows the number of requests per 2 minutes over the last hour with logging: image

This is the same graph with prometheus metrics over the same time period: image
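For reference, the log-based graph boils down to counting one entry per request in fixed two-minute windows. Here is a minimal sketch of that counting, assuming timestamps in the format of the `time` field from the log line above. The sample values and the function name are hypothetical, and this is not how Loki or Grafana compute it internally.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 120  # two-minute buckets, as in the graphs


def bucket_start(timestamp: str) -> datetime:
    """Round a UTC timestamp down to the start of its two-minute window."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    epoch = int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


# Hypothetical "time" values taken from three access log entries.
times = [
    "2022-01-08T20:37:38Z",
    "2022-01-08T20:37:59Z",
    "2022-01-08T20:38:01Z",
]
counts = Counter(bucket_start(t) for t in times)

for start, total in sorted(counts.items()):
    print(start.strftime("%H:%M"), total)
```

Because every request produces exactly one log entry, these counts are exact, whereas a metrics-based graph depends on how often the counters are scraped.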

So in the end that means: I can see that a search was done at a specific time, but I cannot see who did it or what that person searched for.

I am open to suggestions to make my current stack better. So please leave a comment if you disagree or have any concerns with my setup.

mrpaulblack commented 2 years ago

OK, just to give a quick update: I am still going to try out Prometheus metrics as a replacement for Loki logging and will keep this ticket updated with my progress on that...

johndoe432 commented 2 years ago

Thanks for your detailed answer. Just wanted to be sure that there is no personally identifiable information being logged.

I appreciate your work sharing privacy-preserving tools with people!

mrpaulblack commented 2 years ago

OK, just to give an update: I have been working on a Prometheus dashboard and got it to a point where I no longer need the Loki dashboard (which uses logging). I decided to drop the access log from my reverse proxy completely and fully rely on metrics, which are IMO good enough for performance data in production.

Meaning:

Since I am no longer logging anything I am going to close this issue :+1:

MuntashirAkon commented 2 years ago

You should still include a privacy policy (even a single line, for example) or add a short phrase in the footer (e.g. "no logging") stating that you aren't collecting any PII. (ear in SearXNG has some free space at the top. You can include it there too, but I guess it might be too much.)

And thanks for your efforts. This is the best searX instance to my knowledge right now. (search.disroot.org used to be my daily driver, but sadly, it lost it last year.)

silverwings15 commented 2 years ago

> This is the best searX instance to my knowledge right now.

agreed, anon.sx was my go to for a good few months but paulgo edges it out slightly in speed

edit: i also tried out searx.be which was great as well, but decided to settle on paulgo

MuntashirAkon commented 2 years ago

Reliability is a bigger issue I think. I've also tried a few other instances now and then but most of them become slow after a few months (or even days). Paulgo used to be slow last year (when I was trying random instances after disroot's failure in getting any sane results), but it's been much improved. The results are very quick now.

But I believe this is an off-topic discussion. So, I will stop.

mrpaulblack commented 2 years ago

@MuntashirAkon Yeah, I think this is a good idea. What do you think of adding something like a motd underneath the search input field on the index page? Something like this maybe? (similar to https://www.qwant.com/)

desktop light theme: image

mobile dark theme: image

Otherwise, a link to a privacy policy on the new about page would be the right step IMO.

MuntashirAkon commented 2 years ago

Yes, this looks good. But I do not think this is enough. For example, I would expect it to say that it logs none of the personally identifiable information such as IP address, User Agent, queries etc. or any tracking cookies, scripts, etc. which you cannot put in one line.

sankhababu commented 1 year ago

> Yes, this looks good. But I do not think this is enough. For example, I would expect it to say that it logs none of the personally identifiable information such as IP address, User Agent, queries etc. or any tracking cookies, scripts, etc. which you cannot put in one line.

That will be the right approach IMO.