synzen / MonitoRSS

MonitoRSS RSS bot (formerly known as Discord.RSS) with customizable feeds. https://monitorss.xyz
MIT License
1.09k stars 240 forks

Support database other than bare file or MongoDB #165

Closed SuperSandro2000 closed 4 years ago

SuperSandro2000 commented 4 years ago

Is your feature request related to a problem? Please describe.

I don't like MongoDB for various reasons. One is that it uses more RAM in idle than Postgres under full load. Another is that its databases always get huge without really storing anything.

Describe the solution you'd like

A better database backend like Postgres, MariaDB, or even SQLite.

synzen commented 4 years ago

Whenever you make claims like these, especially

its databases "always get huge without really storing anything"

more RAM in idle than Postgres under "full load"

without citing any sources/references (or experiences relating to this project), you are on shaky ground. And having the mindset where you say

a better database like <insert some SQL database>

without taking into account the project requirements is really a poor mindset. I didn't choose MongoDB "just cuz". The requirements were:

  1. A document-based schema that coincided well with JSON for databaseless, so there is minimal monkey-patching to make them work together.
  2. A high throughput due to the number of articles it's reading and storing on every cycle (especially for tens of thousands of feeds).
  3. Popular support in services that are PaaS (especially those that are free, such as Heroku) for hosters.

And MongoDB fit these criteria. As of v6.2.0, concerning memory usage/space, I'll give you the rough numbers from the public hosting of the bot: more than 3.7 million documents, optimized with indexing, ultimately using around 2.5-2.6 GB of memory with the data sized at 1 GB. Performance is admirable; the only bottleneck is the machine of the bot itself making the feed requests.
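As a rough back-of-the-envelope reading of those figures, here is a small sketch in Python. It assumes 1 GB means 10^9 bytes and treats "3.7 million" as a lower bound on the document count; the variable names are illustrative only.

```python
# Figures quoted in the comment above; decimal gigabytes are an assumption.
documents = 3_700_000             # ">3.7 million documents" (lower bound)
data_bytes = 1 * 10**9            # "data sized at 1 GB"
memory_low_bytes = 2.5 * 10**9    # "around 2.5-2.6 GB of memory"
memory_high_bytes = 2.6 * 10**9

# The document count is a lower bound, so this average is an upper bound.
avg_doc_bytes = data_bytes / documents
ratio_low = memory_low_bytes / data_bytes
ratio_high = memory_high_bytes / data_bytes

print(f"average document size: at most ~{avg_doc_bytes:.0f} bytes")
print(f"memory / data ratio: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

The comment does not break the memory figure down, so the ratio says nothing about how much of it is data, index, or cache.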

For much smaller-scale use cases though? I'd have to argue performance/resource usage is hardly a problem, and is purely preference at that point given that you're managing your resources efficiently in proportion to the load you're giving it.

SuperSandro2000 commented 4 years ago

Postgres supports JSON, has really high throughput and very good performance, and Heroku supports it.

around 2.5-2.6 GB of memory

For comparison, Postgres uses next to no memory at full load. My best personal experience is indexing MusicBrainz's entire database dump (28 GB zstandard compressed), which was I/O limited on a good SSD and still only used 64 MB of RAM. Your Mongo stats are comparable to my Elasticsearch ones, which is ridiculous for an RSS bot.

Also there is the cost of maintaining another database for which I would need to create restricted settings, backup scripts, keep up with upstream changes, etc.

synzen commented 4 years ago

Just because Postgres supports JSON does not mean you should use it as a JSON database. That is a complete anti-pattern, and it disregards the important differences between structured/schema and unstructured/schema-less data. Those 3 points I mentioned are just the main ones from the time I decided on a database, but the differences between schema and schema-less are significant when taking into account what articles are being read/stored on every cycle.
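To make the schema versus schema-less point concrete, here is a minimal sketch in Python. It is not the bot's code: the article fields and the `required_columns` helper are hypothetical. It shows two articles from different feeds that share only some keys. A document store can keep each one as-is, while a table with one column per field needs a column for every key that appears in any feed.

```python
import json

# Hypothetical articles, shaped as two different feeds might deliver them.
articles = [
    {"title": "Release 1.0", "link": "https://example.com/a", "author": "alice"},
    {
        "title": "Episode 12",
        "link": "https://example.com/b",
        "enclosure": {"url": "https://example.com/b.mp3", "type": "audio/mpeg"},
    },
]


def required_columns(docs):
    """Return the sorted union of top-level keys across all documents."""
    keys = set()
    for doc in docs:
        keys.update(doc)
    return sorted(keys)


# Schema-less: each article round-trips through JSON unchanged.
stored = [json.loads(json.dumps(doc)) for doc in articles]
assert stored == articles

# One column per field: every key seen in any feed needs its own nullable column.
print(required_columns(articles))
```

With only two feeds the union is already four columns, two of which are empty for one of the rows; the set grows with every feed that introduces a new field.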

When I say sources/references, I don't mean anecdotes, or comparing two completely unrelated projects. I mean legitimate studies and tests done in controlled environments. I could say that I used Postgres this one time, where all I did were insert operations plus 100 reads every 10 minutes, whereas 2 years later, using MongoDB in a completely separate project, I did 5000 reads every 2 minutes, found MongoDB's memory usage to be "obscene", and immediately dismissed it as garbage. That would have no legitimacy.

You seem to be completely fixated on this one aspect of a database with a vendetta against MongoDB, when in reality there are more important variables to take into account. Playing favoritism on one aspect is, again, a poor mindset.

There are free options for hosting such as MongoDB Atlas, which hardly requires any maintenance - as that is the goal of DBaaS. If for whatever reason you're intent on hosting it on your own server, you're not obligated to secure it if this is all you're using it for. The bot stores no sensitive information in the DB (unless you count feed URLs), and MongoDB is not exposed by default. For backups, for your convenience, all you have to do is use rss.backup.

Any further discussion concerning memory usage is irrelevant, as using a SQL database does not align with project requirements at this time (which can change if there is a shift in project requirements). If you really don't want to use MongoDB, databaseless is always an option.