opensourceBIM / BIMserver

The open source BIMserver platform
GNU Affero General Public License v3.0

Use external NoSQL database instead of Java #422

Closed stestaub closed 7 years ago

stestaub commented 7 years ago

In order to scale the application better, we would like to use an external database such as DocumentDB or MongoDB instead of the local BerkeleyDB. Would that be possible in principle, and how much effort would it take? Any help would be appreciated.

rubendel commented 7 years ago

That should definitely be possible. In theory your implementation should only have to implement the KeyValueStore interface.
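To illustrate the "implement one interface" idea, here is a minimal sketch. The interface below (`SimpleKeyValueStore`) is a deliberately simplified, hypothetical stand-in and not BIMserver's actual `KeyValueStore` signature, which should be checked in the BIMserver source. The in-memory class only shows the shape a backend would take.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified storage abstraction (NOT the real BIMserver interface).
interface SimpleKeyValueStore {
    void put(String table, byte[] key, byte[] value);
    byte[] get(String table, byte[] key); // returns null when the key is absent
    void delete(String table, byte[] key);
}

// In-memory backend; a remote backend would implement the same methods
// with network calls instead of map lookups.
final class InMemoryStore implements SimpleKeyValueStore {
    // ByteBuffer wraps the key bytes so that equality is content-based.
    private final Map<String, Map<ByteBuffer, byte[]>> tables = new ConcurrentHashMap<>();

    private Map<ByteBuffer, byte[]> table(String name) {
        return tables.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }

    @Override
    public void put(String table, byte[] key, byte[] value) {
        // Copy both arrays so later changes by the caller do not affect the store.
        table(table).put(ByteBuffer.wrap(key.clone()), value.clone());
    }

    @Override
    public byte[] get(String table, byte[] key) {
        byte[] value = table(table).get(ByteBuffer.wrap(key));
        return value == null ? null : value.clone();
    }

    @Override
    public void delete(String table, byte[] key) {
        table(table).remove(ByteBuffer.wrap(key));
    }

    public static void main(String[] args) {
        SimpleKeyValueStore store = new InMemoryStore();
        byte[] key = {42};
        store.put("objects", key, new byte[] {1, 2, 3});
        System.out.println(Arrays.toString(store.get("objects", key))); // prints [1, 2, 3]
    }
}
```

The rest of the application would depend only on the interface, so swapping the backend would not touch the callers. Whether the real interface is this small is an assumption.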

Performance: I expect an external database to be quite a bit slower. Certain parts of BIMserver do a lot of random reads/writes.

For example, from the DocumentDB website:

"DocumentDB guarantees less than 10 ms latencies on reads and less than 15 ms latencies on writes for at least 99% of requests."

A lot of the positive performance characteristics of BIMserver at the moment can be attributed to the fact that the database is running on the same machine, in the same process/JVM. This keeps latency very low.
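A rough back-of-the-envelope calculation shows why per-request latency matters for random reads. The numbers are assumptions for illustration only: one million strictly sequential reads, the quoted 10 ms figure used as a worst-case per-read latency (it is a guaranteed bound, not a typical value), and an assumed 0.01 ms for an in-process read.

```java
// Estimates the total time spent waiting when reads are issued one after another.
final class LatencyEstimate {
    // Total seconds for `operations` sequential requests, each taking `latencyMillis`.
    static double totalSeconds(long operations, double latencyMillis) {
        return operations * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        long reads = 1_000_000L;                   // hypothetical workload size
        double remote = totalSeconds(reads, 10.0); // assumed worst-case remote read
        double local = totalSeconds(reads, 0.01);  // assumed in-process read
        System.out.println("remote seconds: " + remote);
        System.out.println("local seconds:  " + local);
    }
}
```

Under these assumptions the remote case adds up to 10,000 seconds (about 2.8 hours) against roughly 10 seconds locally. Batching or parallel requests would reduce the gap, so this is an upper-bound illustration rather than a prediction.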

Transactions: MongoDB seems to support some sort of transactions, but not when sharding is enabled (https://docs.mongodb.com/v3.2/core/write-operations-atomicity/). You might get pretty far with some (external) locking mechanism.
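The locking idea could look roughly like the sketch below: every multi-key write takes a lock first, so other writers and readers never see a half-applied update. All names are hypothetical. An in-process `ReentrantLock` stands in for the external lock here; with multiple server instances, a distributed lock implementing the same `Lock` interface would have to be supplied instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical store that serializes multi-key writes behind a single lock.
final class LockGuardedStore {
    private final Map<String, String> data = new HashMap<>();
    private final Lock lock; // stand-in for an external (distributed) lock

    LockGuardedStore(Lock lock) {
        this.lock = lock;
    }

    // Applies all updates as one unit with respect to other lock holders.
    void putAll(Map<String, String> updates) {
        lock.lock();
        try {
            data.putAll(updates);
        } finally {
            lock.unlock();
        }
    }

    // Returns a consistent copy of the current contents.
    Map<String, String> snapshot() {
        lock.lock();
        try {
            return new HashMap<>(data);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockGuardedStore store = new LockGuardedStore(new ReentrantLock());
        Map<String, String> update = new HashMap<>();
        update.put("object:1", "wall");
        update.put("index:wall", "object:1");
        store.putAll(update);
        System.out.println(store.snapshot().size()); // prints 2
    }
}
```

Note that a lock only provides isolation. It does not roll back a partially applied write if the process crashes halfway, so it is not a full replacement for transactions.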

Identify the real bottleneck: Depending on your use case, you might want to make sure that the database is actually the bottleneck. In most cases I have seen, the bottleneck seems to be the process where the geometry is generated.

We are running BIMserver on machines with 256 GB of RAM and have not seen a model that could not be uploaded. Concurrent access is also pretty good, I think.

I am interested to hear in what way you want BIMserver to be more scalable.

stestaub commented 7 years ago

Thank you for this explanation. We would like to have the database external so that multiple instances of BIMserver can access the same database, in order to scale horizontally. Besides allowing autoscaling when there are more concurrent requests, it is also more failsafe, as we can run redundant BIMservers.

Would it be possible to contact you on a more private channel in order to discuss this in more depth? We would very much appreciate your help. I could then also give you more insight into our project.

rubendel commented 7 years ago

Removed link

stestaub commented 7 years ago

It seems to require an @logic-labs email address in order to sign up on that Slack account. May I reach you via the logic-labs email address provided on the homepage?

rubendel commented 7 years ago

Sure, no problem

leohsu91 commented 7 years ago

We have encountered the same problem. In our business system, we need BIMserver to provide the 3D display of IFC files, and we deploy it in a Docker environment. We want to make BIMserver stateless, so that we can dynamically adjust the server configuration (CPUs, RAM, ...), reboot the Docker container at any time, and increase the number of containers. Most importantly, we want to persist the BIMserver database. For these reasons, we want to change the BIMserver key/value storage to a NoSQL database (MongoDB) or an RDBMS. Would you consider releasing an official version that supports NoSQL databases?

hujb2000 commented 7 years ago

Yes, I also want to switch to external storage that supports a distributed database instead of BerkeleyDB, so that the application will be stateless and can be scaled horizontally.

klacol commented 7 years ago

Yes, I like this too. I would like to work with tools for backup, restore and data analytics on the DB. I would like MongoDB.

rubendel commented 7 years ago

stestaub and I have been in contact; I'll post part of the discussion here:

We have thought about making BIMserver scalable before, but at the moment my opinion is that features/stability are more important than a (linearly) scalable solution. There is no point in having a scalable solution that has not gained widespread adoption (in our view). I understand that you have already met the limits, so I would be happy to think along.

BerkeleyDB: The latest version has stable support for at least a master/slave setup, even multi-slave I think. If your workload is read-heavy, this might work. I think even failover could be implemented with this.

Other database: Implementing this on another database (like MongoDB) would be a very interesting project (technically :)), but really hard to estimate in terms of time. It will also not be possible to give any guarantees about performance. I would be very interested in building a prototype.

TL;DR: At the moment other projects have higher priority than this, but if there is anyone else who really wants to dive deep into this, I'd be glad to help.