Open tucnak opened 1 year ago
Thank you, Ilya, for the extensive writeup, it's quite an interesting proposition. I completely agree with your point about Postgres being more feature rich compared to MySQL (which we use mostly for the legacy reasons).
However, I must point out, that the engineering effort required for the switch (especially to make it seamless) is non-trivial, both in terms of operations and the codebase change. It's certainly nice to have the features you listed, however, it's not like they can't be implemented with the current data storage.
For example, we already have elasticsearch integration mostly ready, we already have workarounds in place for data aggregation, where it's needed, at our scale there is no problem in using mysql table as a queue or install rabbitmq as a service, and we had examples of bots integration previously and arguably using DB API for this not not the best approach design-wise.
That being said, I appreciate your enthusiasm and help, and I think we should definitely continue this discussion, weighing the options and coming to some mutual agreement for the path forward. I've sent you an invite to our dev discord, hope to see you there!
I have only joined only but recently, however it wasn't long until it had restored in that lepra feeling albeit with a modern twist. This is surprising to me because I assumed, for the longest time, that the lepra phenomenon really can't be replicated. Turns out, given the circumstance, it absolutely can and it is my assessment that it will soon surpass lepra fair and square. That said, there is one aspect to lepra that I think requires quite a bit of forethought, and that is circus— all kinds of fun, exciting things (think elections, funny switches, modes, auto-corrections) that we used to cherish in the lepra of the past. Obviously, the old and tired tricks won't do it; good news is there's time and place for new ones! (That should be developed in secrecy and merged publicly only eventually, to not give away the surprise.)
My line of work has to do with data analysis, forecasting & large language model research.
But first, before we can go into tricks I would like to offer something more substantial from the engineering standpoint. Postgres has a wide array of advantages when it comes to projects with a clear, siloed data flow. First, I will enumerate some of these advantages, and then will formulate a proposition on exactly how these would play out to Orbitar's own advantage in the long run.
json_agg()
in the data layer, however in my experience it's the most clear & performant way to implement. This doesn't have to be done in every query, but sometimes if you can get away with it, it's very rewarding. I could later provide instructive examples of how much more flexibility it has to offer!zdb
Elasticsearch access method bringing the full extent of Elasticsearch capability to Postgres; simply turn yourtext
column into indexedzdb.text
and ZomboDB will replicate its contents to a nearby Elasticsearch instance with quality ranking & query language vastly superior to tf-idf, tsvector, tsquery approaches native to Postgres. This is incredibly useful and would otherwise allow to add state-of-the-art search functionality at a fraction of engineering cost.SELECT pgml.train()
andSELECT pgml.predict()
in it, thereby massively simplifying experimentation, development and deployment of machine learning in the least amount of effort imaginable. Normally, you would have to provision a ML service of some sort, maintain the contract and bear the communication cost, but here it's really as trivial as it seems. First, feature engineer a view, then pass it around as a training rowset, boom, Bob's your uncle. All of your back-end services can now do predictions part of some REST SELECT.CREATE FOREIGN TABLE
to any database in the world, including NoSQL things like MongoDB by specifying loosely-defined schema and the FDW policies. There's a FDW plugin for Telegram bots, can you believe it? So you can SELECT messages from chats, INSERT to reply, et cetera. Food for thought. This means no more ETL, all your data remains in the relational model, no back-end coordination whatsoever. This is not vastly interesting for small projects like this, but otherwise nice to know you have it in case you ever need it. And boy you do.Now, the proposition is as follows; Orbitar is a young, dynamic website that is yet to realise its full potential, and some small tweaks now, when it comes to schema and data integration— would allow to low the bar in complexity, integration, maintenance to the point where complex things would really become trivial. Please don't get me wrong. The value proposition here is not performance, but everything else— ergonomics, extensibility, simplified reasoning, time-to-market. Things like PostgresML would make it trivial to roll out all kinds of models, be it for detecting multiple accounts, pushing GPT-like (there are now plenty of nice Russian and Ukrainian-speaking models of modest size which can be fine-tuned on Orbitar content to provide relevant generations) and other circus tricks that people like so much without ever bothering to set up a machine learning pipeline, sophisticated ranking that would correct for big clusters of bidirectional likes (кармодроч) and so much more.
MySQL holds a dear place in my heart, but these days the benefits of rolling out full house Postges in new projects outweigh all else. Just so you know; I'm very passionate about this project and would love to help out with the data engineering & machine learning work, bring some fun tricks in the process, regardless of what your take on the contents of this issue comes to be. I have a pre-made Docker container with the majority of extensions outlined here, if that's something you should like to try. I also have numerous instructive examples on how to improve things like vote aggregation logic in realtime-correct fashion by an approximate factor of 100x (cache notwithstanding) to the current implementation using continuous aggregates.
This is not a feature request, i.e. I would be happy to consult on & implement anything from above.
Best regards
-Ilya