spaceshelter / orbitar

Experimental collective social/blogging platform with self-regulation.
MIT License
59 stars, 23 forks

The many benefits of full house Postgres & its extensions #216

Open tucnak opened 1 year ago

tucnak commented 1 year ago

I joined only recently, but it wasn't long before the site restored that lepra feeling, albeit with a modern twist. This is surprising to me because I assumed, for the longest time, that the lepra phenomenon really couldn't be replicated. Turns out, given the circumstances, it absolutely can, and it is my assessment that it will soon surpass lepra fair and square. That said, there is one aspect of lepra that I think requires quite a bit of forethought, and that is the circus: all kinds of fun, exciting things (think elections, funny switches, modes, auto-corrections) that we used to cherish in the lepra of the past. Obviously, the old and tired tricks won't do; the good news is that there's a time and place for new ones! (These should be developed in secrecy and merged publicly only at the end, so as not to give away the surprise.)

My line of work has to do with data analysis, forecasting & large language model research.

But first, before we get into tricks, I would like to offer something more substantial from an engineering standpoint. Postgres has a wide array of advantages when it comes to projects with a clear, siloed data flow. First, I will enumerate some of these advantages, and then I will formulate a proposition on exactly how they would play out to Orbitar's own advantage in the long run.

Now, the proposition is as follows: Orbitar is a young, dynamic website that has yet to realise its full potential, and some small tweaks now to the schema and data integration would lower the bar in complexity, integration, and maintenance to the point where complex things become trivial. Please don't get me wrong. The value proposition here is not performance but everything else: ergonomics, extensibility, simplified reasoning, time-to-market. Things like PostgresML would make it trivial to roll out all kinds of models without ever bothering to set up a machine learning pipeline: detecting multiple accounts, pushing GPT-like generations (there are now plenty of nice Russian- and Ukrainian-speaking models of modest size which can be fine-tuned on Orbitar content to provide relevant generations) and other circus tricks that people like so much, sophisticated ranking that corrects for big clusters of bidirectional likes (кармодроч, i.e. mutual karma farming), and so much more.
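To make the ranking idea concrete, here is a minimal sketch of one way to discount bidirectional likes. It is an illustration only, not Orbitar's ranking and not anything PostgresML provides: the function name `reciprocity_adjusted_karma`, the `(voter, author)` input shape, and the `reciprocal_weight` value are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocity_adjusted_karma(votes, reciprocal_weight=0.25):
    """Sum upvotes per author, counting returned upvotes at reduced weight.

    votes: iterable of (voter, author) pairs, one pair per upvote.
    reciprocal_weight: hypothetical discount applied to upvotes that the
    author has returned to the same voter.
    """
    # Count upvotes for each directed (voter, author) pair; ignore self-votes.
    pair_counts = defaultdict(int)
    for voter, author in votes:
        if voter != author:
            pair_counts[(voter, author)] += 1

    karma = defaultdict(float)
    for (voter, author), count in pair_counts.items():
        # Upvotes matched by an upvote in the opposite direction.
        returned = min(count, pair_counts.get((author, voter), 0))
        one_way = count - returned
        karma[author] += one_way + reciprocal_weight * returned
    return dict(karma)


votes = [("a", "b"), ("b", "a"), ("c", "a")]
karma = reciprocity_adjusted_karma(votes)
```

In this example the mutual upvotes between `a` and `b` each count for a quarter of a point, while the one-way upvote from `c` counts in full. Detecting larger clusters would need a graph-based approach, which this sketch does not attempt.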

MySQL holds a dear place in my heart, but these days the benefits of rolling out full house Postgres in new projects outweigh all else. Just so you know: I'm very passionate about this project and would love to help out with the data engineering & machine learning work and bring some fun tricks in the process, regardless of what your take on the contents of this issue turns out to be. I have a pre-made Docker container with the majority of the extensions outlined here, if that's something you would like to try. I also have numerous instructive examples of how to improve things like the vote aggregation logic in a realtime-correct fashion, by a factor of approximately 100x (cache notwithstanding) compared to the current implementation, using continuous aggregates.
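The idea behind continuous aggregates can be illustrated without a database: instead of rescanning every vote on each read, keep a small per-bucket total and update only the bucket a new vote falls into. The sketch below is a plain-Python analogy under that assumption, not TimescaleDB's API and not Orbitar's current vote logic; `VoteBuckets`, `record`, and `rating` are hypothetical names.

```python
from collections import defaultdict


class VoteBuckets:
    """Keeps per-post vote sums in fixed time buckets, updated incrementally."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        # (post_id, bucket_start_timestamp) -> sum of vote values in that bucket
        self.sums = defaultdict(int)

    def record(self, post_id, timestamp, value):
        """Apply one vote (+1 or -1) by touching only its own bucket."""
        bucket_start = timestamp - timestamp % self.bucket_seconds
        self.sums[(post_id, bucket_start)] += value

    def rating(self, post_id):
        """Read the rating from bucket totals rather than from raw votes."""
        return sum(total for (pid, _), total in self.sums.items() if pid == post_id)


buckets = VoteBuckets()
buckets.record(post_id=1, timestamp=10, value=1)
buckets.record(post_id=1, timestamp=3700, value=1)
buckets.record(post_id=1, timestamp=20, value=-1)
rating = buckets.rating(1)
```

A read here scans one entry per bucket rather than one entry per vote, which is where the savings would come from. How much it helps in practice depends on the real workload.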

This is not a feature request, i.e. I would be happy to consult on & implement anything from the above.

Best regards

-Ilya

Aivean commented 1 year ago

Thank you, Ilya, for the extensive writeup; it's quite an interesting proposition. I completely agree with your point about Postgres being more feature-rich compared to MySQL (which we use mostly for legacy reasons).

However, I must point out that the engineering effort required for the switch (especially to make it seamless) is non-trivial, both in terms of operations and codebase changes. It's certainly nice to have the features you listed; however, it's not like they can't be implemented with the current data storage.

For example, we already have the Elasticsearch integration mostly ready, and we already have workarounds in place for data aggregation where it's needed. At our scale there is no problem in using a MySQL table as a queue or installing RabbitMQ as a service. We have also had examples of bot integration previously, and arguably using the DB API for this is not the best approach design-wise.
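For readers unfamiliar with the table-as-queue pattern, a minimal sketch follows. It uses Python's built-in `sqlite3` as a stand-in so that it runs anywhere; it is not Orbitar's code, and the `jobs` table, its columns, and the function names are assumptions. On MySQL 8.0 the claim step would more typically use `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction.

```python
import sqlite3


def make_queue(conn):
    # One row per job; status moves from 'pending' to 'claimed'.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )


def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim_next(conn):
    """Claim the oldest pending job, or return None if the queue is empty."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # cannot select and claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'claimed' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
make_queue(conn)
enqueue(conn, "send-digest")
first = claim_next(conn)
```

The sketch leaves out retries, failed-job handling, and cleanup of finished rows, all of which a real queue table would need.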

That being said, I appreciate your enthusiasm and help, and I think we should definitely continue this discussion, weighing the options and coming to some mutual agreement for the path forward. I've sent you an invite to our dev discord, hope to see you there!