Closed SuperSandro2000 closed 4 years ago
Whenever you make claims like these, especially
its databases "always get huge without really storing anything", more RAM in idle than Postgres under "full load"
without citing any sources/references (or experiences relating to this project), you are on shaky ground. And having the mindset where you say
a better database like <insert some SQL database>
without taking into account the project requirements is really a poor mindset. I didn't choose MongoDB "just cuz". The requirements were:
And MongoDB fit these criteria. As of v6.2.0, concerning memory usage/space, I'll give you the rough numbers on the public hosting of the bot: >3.7 million documents optimized with indexing, ultimately using around 2.5-2.6 GB of memory with the data sized at 1 GB. Performance is admirable, and the only bottleneck is the machine of the bot itself making the feed requests.
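To put those rough numbers in perspective, here is a back-of-the-envelope sketch. It only restates the figures quoted above and assumes "1 GB" means decimal gigabytes. The variable and function names are made up for illustration, and it says nothing about how the memory is split between data, indexes, and cache.

```python
# Rough figures as quoted for the public bot at v6.2.0.
DOCUMENTS = 3_700_000         # ">3.7 million documents", so this is a lower bound
DATA_BYTES = 1 * 10**9        # "data sized at 1 GB" (assumption: decimal GB)
MEMORY_GB_RANGE = (2.5, 2.6)  # "around 2.5-2.6 GB of memory"
DATA_GB = DATA_BYTES / 10**9


def average_document_bytes(data_bytes, documents):
    """Average stored size per document, in bytes."""
    return data_bytes / documents


def memory_to_data_ratio(memory_gb, data_gb):
    """How many times larger the memory footprint is than the stored data."""
    return memory_gb / data_gb


# Because the document count is a lower bound, this average is an upper bound.
avg_bytes = average_document_bytes(DATA_BYTES, DOCUMENTS)
ratios = [memory_to_data_ratio(m, DATA_GB) for m in MEMORY_GB_RANGE]

print(f"average document size: at most ~{avg_bytes:.0f} bytes")
print(f"memory / data ratio: {ratios[0]:.1f}x to {ratios[1]:.1f}x")
```

So each document is small, and the reported memory footprint is roughly two and a half times the stored data size.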
For much smaller-scale use cases though? I'd have to argue performance/resource usage is hardly a problem, and is purely preference at that point given that you're managing your resources efficiently in proportion to the load you're giving it.
Postgres supports JSON, has really high throughput and very good performance, and Heroku supports it.
around 2.5-2.6 GB of memory
For comparison, Postgres uses next to no memory at full load. My best personal experience is indexing the entire MusicBrainz database dump (28 GB zstandard compressed), which was I/O limited on a good SSD and still used only 64 MB of RAM. Your Mongo stats are comparable to my Elasticsearch ones, which is ridiculous for an RSS bot.
Also there is the cost of maintaining another database for which I would need to create restricted settings, backup scripts, keep up with upstream changes, etc.
Just because Postgres supports JSON does not mean you should use it as a JSON database. That is a complete anti-pattern, and completely disregards the important differences between structured/schema and unstructured/schema-less. Those 3 points I mentioned are just the main ones at the time I decided on a database, but the differences between schema and schema-less are significant when taking into account what articles are being read/stored on every cycle.
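As a rough illustration of the schema vs schema-less point, here is a minimal sketch. The two articles and their field names are hypothetical, not taken from the bot's actual data. It shows what happens when articles with different fields are forced into one fixed column layout.

```python
# Hypothetical articles from two different feeds (field names are illustrative).
articles = [
    {"title": "Release 1.0", "link": "https://example.com/a", "pubdate": "2020-01-01"},
    {
        "title": "Episode 12",
        "link": "https://example.com/b",
        "author": "someone",
        "enclosure": {"type": "audio/mpeg"},
    },
]


def fixed_columns(rows):
    """Columns a fixed table would need to hold every field seen so far."""
    columns = set()
    for row in rows:
        columns.update(row)
    return sorted(columns)


def to_fixed_row(row, columns):
    """Fit one article into the fixed layout, padding absent fields with None."""
    return tuple(row.get(column) for column in columns)


columns = fixed_columns(articles)
rows = [to_fixed_row(article, columns) for article in articles]

print(columns)
for row in rows:
    print(row)
```

In a document store each article is saved as-is. In the fixed layout, every new field a feed introduces means another column, and every article that lacks it carries a None.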
When I say sources/references, I don't mean anecdotes, or comparing two completely unrelated projects. I mean legitimate studies and tests done in controlled environments. I could say I used Postgres this one time, where all I did were insert operations plus 100 reads every 10 minutes, whereas 2 years later, using MongoDB in a completely separate project, I did 5000 reads every 2 minutes, found MongoDB's memory usage to be "obscene", and immediately dismissed it as garbage. This has no legitimacy.
You seem to be completely fixated on this one aspect of a database with a vendetta against MongoDB, when in reality there are more important variables to take into account. Playing favoritism on one aspect is, again, a poor mindset.
There are free options for hosting such as MongoDB Atlas, which hardly requires any maintenance, as that is the goal of DBaaS. If for whatever reason you're intent on hosting it on your own server, you're not obligated to secure it if this is all you're using it for. The bot stores no sensitive information in the DB (unless you count feed URLs), and MongoDB is not exposed by default. For backups, for your convenience, all you have to do is use rss.backup.
Any further discussion concerning memory usage is irrelevant, as using a SQL database does not align with project requirements at this time (which can change if there is a shift in project requirements). If you really don't want to use MongoDB, databaseless is always an option.
Is your feature request related to a problem? Please describe.
I don't like MongoDB for various reasons. One is that it uses more RAM in idle than Postgres under full load. Another is that its databases always get huge without really storing anything.
Describe the solution you'd like
A better database backend like Postgres, MariaDB or even SQLite.