milesmcc / shynet

Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.
Apache License 2.0
2.89k stars 183 forks

A vision for the next generation of Shynet #258

Open milesmcc opened 1 year ago

milesmcc commented 1 year ago

Hey everyone—

The past 3+ years of Shynet have been inspiring. Together, we've built a really great product — and we have some great adoption to show for it. Lots of people and organizations use Shynet, from privacy-minded individuals to some of the largest companies on Earth. (I wish I could give you exact figures on Shynet's adoption, but alas, we do not currently collect any metrics. 🙂)

Today, Shynet is an extremely simple Django app. Hits and sessions are stored in a Postgres database. There's no support for custom events, and our pruning/rollup strategy for old sessions... does not exist. The dashboard slows down dramatically for high-traffic sites. And while Shynet is all-things-considered very privacy friendly, there are additional steps we could take to dramatically improve the privacy assurances we can make (e.g., by using differential privacy, on-device aggregation, and so on).
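To make the missing pruning/rollup step concrete, here is a minimal sketch of what a rollup could look like. It is not Shynet's code: the `Hit` class, the `roll_up` function, and the 90-day retention window are all hypothetical, and a real implementation would run against the Postgres tables rather than in-memory lists.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Hit:
    # Hypothetical, simplified stand-in for a stored hit; not Shynet's model.
    service: str
    location: str
    start_time: datetime


def roll_up(hits, now, keep_days=90):
    """Return (recent hits kept in full, daily counts for older hits)."""
    cutoff = now - timedelta(days=keep_days)
    # Hits newer than the cutoff keep their full detail.
    recent = [h for h in hits if h.start_time >= cutoff]
    # Older hits are reduced to one counter per (service, location, day).
    daily_counts = Counter(
        (h.service, h.location, h.start_time.date())
        for h in hits
        if h.start_time < cutoff
    )
    return recent, daily_counts


now = datetime(2023, 7, 1)
hits = [
    Hit("blog", "/", datetime(2023, 1, 5, 9, 0)),
    Hit("blog", "/", datetime(2023, 1, 5, 17, 30)),
    Hit("blog", "/about", datetime(2023, 6, 30, 12, 0)),
]
recent, daily_counts = roll_up(hits, now)
print(len(recent))                                  # 1
print(daily_counts[("blog", "/", date(2023, 1, 5))])  # 2
```

The trade-off in a scheme like this is that per-session detail is lost for old data, while the dashboard only has to scan one aggregated row per day instead of every hit.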

So with all this in mind, I want to probe the community on what you'd like to see from Shynet moving forward. My engineering capacity is pretty maxed-out right now, so no major changes are imminent. But I think we have an opportunity to leapfrog some of the other analytics tools on the market if we're willing to make some major changes.

Here's one path we could take:

Shynet technically hasn't hit 1.0, so nothing is really out of the question in terms of how we achieve these ends. For example, while of course I'd like to maintain perfect backwards compatibility, I think we should also consider making breaking changes and simply providing folks a migration tool if they'd like to use "Shynet v2".

P.S.: I have been in love with Elixir + Phoenix recently (used it to build https://atlos.org). That might be a more appropriate tech stack for a tool like Shynet (Plausible, for example, uses Elixir + Phoenix), but I also recognize that some in the community might appreciate the familiarity of Python.

haaavk commented 1 year ago

First things first. Thanks a lot for building Shynet. It helped me escape from terrible Google Analytics. Some thoughts about Shynet in no particular order:

c4lliope commented 1 year ago

Hello, I've been running Shynet for 2! days now and I'm really happy at how simple it's been to deploy using docker-compose.

So long as Clickhouse has a docker image which can be packed easily into a docker-compose.yml file, I see no reason to hold back from adoption. https://hub.docker.com/r/clickhouse/clickhouse-server/#

In my highly-localized application, I rely on IPs to see which states people are logging in from. As a mainly-USA application, I care less than many people do about GDPR, and so I'd make a proposal here: if you could make a small engine inside the application for plugins or bespoke code, then end-users could make up the logic on a per-application basis. In my case, this could be:

I like and encourage your decision on Elixir and Phoenix, this seems like a prime use case for both.
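As an illustration of the plugin idea above, here is a minimal sketch of a hook registry in Python. Everything in it is hypothetical: `register_hook`, `process_hit`, the `add_us_state` hook, and the prefix table are invented for the example (the IP ranges are reserved documentation addresses), and nothing like this exists in Shynet today.

```python
from typing import Callable, Dict, List

# A hook receives a hit's metadata and returns extra fields to attach to it.
HitHook = Callable[[Dict[str, str]], Dict[str, str]]
_hooks: List[HitHook] = []


def register_hook(hook: HitHook) -> HitHook:
    """Register a per-deployment hook; usable as a decorator."""
    _hooks.append(hook)
    return hook


def process_hit(hit: Dict[str, str]) -> Dict[str, str]:
    """Run every registered hook and merge its output into a copy of the hit."""
    enriched = dict(hit)
    for hook in _hooks:
        enriched.update(hook(hit))
    return enriched


# Hypothetical lookup table; a real deployment would use a GeoIP database.
PREFIX_TO_STATE = {"203.0.113.": "NY", "198.51.100.": "CA"}


@register_hook
def add_us_state(hit: Dict[str, str]) -> Dict[str, str]:
    """Example site-specific logic: derive a US state from the visitor IP."""
    ip = hit.get("ip", "")
    for prefix, state in PREFIX_TO_STATE.items():
        if ip.startswith(prefix):
            return {"state": state}
    return {}


print(process_hit({"ip": "203.0.113.7", "location": "/login"}))
```

With a design like this, the core application stays unaware of site-specific rules, and each operator decides which hooks (and which privacy trade-offs) to enable.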

sergioisidoro commented 1 year ago

My 2 cents on ClickHouse:

I've used Plausible, which runs with ClickHouse, and even for small projects it starts using unreasonable amounts of disk space, even at a few hundred events per day (maybe I'm doing something wrong). Also, I'm so much more familiar with Postgres backup and restore procedures that it was a bit of a pain to learn them and set them up for ClickHouse.

So I keep coming back to Shynet as the alternative for small projects. What if this is Shynet's niche?

I do miss custom events tho... I could give it another shot at https://github.com/milesmcc/shynet/pull/168 if you want.

c4lliope commented 1 year ago

Sérgio,

Your remarks on Clickhouse are making me curious to benchmark my https://session.place deploy. I measure around 6 domains.

On https://assemble.press you can see a common use case: embedded dashboards, including chosen panels or graphics. One embed link per graphic seems ideal.

That ended up being a primary reason behind our choice of Plausible.

sergioisidoro commented 1 year ago

Ok, maybe I need to contextualise "unreasonable" because it greatly depends on the use case.

I had a small project running on a very simple VM on Digital Ocean. We had <100 visitors per day, and very few events. I deployed Plausible for that project on a Docker Swarm. In such a small project (Postgres, Django, Redis, worker, Ghost and MariaDB for a blog, + Plausible), ClickHouse hogged the disk space although there were not that many events (~50 GB if I recall).

Self-hosting is sometimes small, and having small-footprint projects (in memory and disk) such as Shynet is super nice for that use case. If all projects start aiming for scale and adopting dependencies with a larger starting footprint (ClickHouse, Elastic, etc.), the requirements for self-hosting an entire small stack (e.g., Service + Blog + Analytics) start to go up.

There is nothing wrong in having a large footprint when there is scale - All I'm arguing here is that if there is no scale, the footprint to self host should be minimal :)

Caveat: bear in mind that I might have done something wrong in deploying Clickhouse, since I was using mostly the defaults from the official image.

rallisf1 commented 11 months ago

I know I'm late to the party, and I've barely used Shynet, but I'd like to share my 2 cents: