quintel / etengine

Calculation engine for the Energy Transition Model
https://energytransitionmodel.com/
MIT License
14 stars 7 forks

Only get the latest 100,000 scenarios from db-bot #1359

Open noracato opened 10 months ago

noracato commented 10 months ago

When downloading and importing a copy of the anonymized database dumps from db-bot, you can easily spend up to 10 minutes just waiting around. Waiting times have slowly crept up because the engine's database has grown bigger and bigger over the years. As we only use these database dumps for development and for helping users with questions, we could opt to dump not the full database, but only the last hundred thousand scenarios or so (including special scenarios like II3050). The download (currently 3.3GB) and the import would then be much quicker.

Right now the dump is limited to scenarios younger than 3 months, plus scenarios marked keep_compatible. But that filter doesn't seem to trim the dump enough.
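For illustration, such a light dump could be produced with mysqldump's `--where` option. This is only a sketch: the table and column names (`scenarios`, `created_at`, `keep_compatible`) are assumptions about the schema, and the real dump of course covers more tables than just scenarios.

```shell
# Sketch of a "light" dump: only recent scenarios plus keep_compatible ones.
# Table/column names are assumed; adjust to the actual schema.
CUTOFF="2024-01-01"   # in practice something like: date -d '3 months ago' +%F

# Build the filter and the mysqldump invocation (printed here for clarity
# rather than executed against a live database).
WHERE="created_at >= '${CUTOFF}' OR keep_compatible = 1"
CMD="mysqldump --where=\"${WHERE}\" etengine scenarios"
echo "$CMD"
```

Note that mysqldump applies the `--where` condition to every table being dumped, so tables related to scenarios would need compatible filters or a separate pass.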

We could also keep the current full dump and add the light dump as a separate download option on db-bot.

What do you think @mabijkerk and @thomas-qah?

mabijkerk commented 10 months ago

Good that you identified this issue @noracato. My first preference would be the light dump option. This, however, depends on how we use the db-bot dump: if we do not actually need all those scenarios, then it does not make sense to download them. Perhaps there are other filters we could apply to trim the download?

Since we're having a discussion about keep_compatible next week anyway, let's include this issue in that meeting.

mabijkerk commented 9 months ago

My main question @noracato is what the use is of the dumps from db-bot. Who uses them and why?

thomas-qah commented 9 months ago

Hope you don't mind my chiming in: I think the use cases are diverse and numerous. For example, I used the etengine production dump to do the benchmark testing. It was very useful to have them!

I think that when you download such a dump from the production db, you usually have a use case like that in mind: to inspect how the database is performing in various ways, but also to see how the database is used — how many inputs/sliders people set on average, how many (custom) curves, etc. As a developer it is much easier to inspect and test such things on a local database. You don't want to do that on a live production database, because it puts unnecessary extra load on the server and, even more importantly, it can be dangerous to data integrity and server security.

Personally, I would be in favor of the 'keep current dump and add light dump' option.

mabijkerk commented 9 months ago

Thanks for the explanation @thomas-qah. In that case I think my preference would be to default to scenarios at most 1 month old, but to allow users (meaning ourselves) to specify a different time limit.

Does this seem feasible for you?
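As a sketch of what that could look like on the db-bot side (the `DUMP_MONTHS` variable is made up for illustration; only the default of 1 month comes from this thread):

```shell
# Hypothetical: let the caller override the age cutoff, defaulting to 1 month,
# e.g. DUMP_MONTHS=3 ./dump.sh. The variable name is invented for this sketch.
DUMP_MONTHS="${DUMP_MONTHS:-1}"

# Push the date arithmetic into MySQL so the script stays portable.
WHERE="created_at >= DATE_SUB(NOW(), INTERVAL ${DUMP_MONTHS} MONTH) OR keep_compatible = 1"
CMD="mysqldump --where=\"${WHERE}\" etengine scenarios"
echo "$CMD"
```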

noracato commented 9 months ago

Sure. That means we'd have to save a few different dumps each night (0, 1, 2 and 3), right @thomas-qah? That increases the window in which the server is very busy, and it will increase our Amazon bill a bit.

If we attract more users from other time zones, this could become a problem in the future, as they would be using the model at the times the server is busy creating the backups. Not sure how much of a problem it actually is, but just putting it down here!

thomas-qah commented 9 months ago

Yes, what @noracato writes is correct, but we could of course create a schedule for when each dump gets created. For example:
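A staggered crontab could look like this (times and script names are entirely hypothetical; the point is just spreading the dumps over the quiet hours instead of running them back to back):

```cron
# Stagger the nightly dumps so the server is never busy with more than one.
# Times and script names are hypothetical.
0 1 * * * /usr/local/bin/dump-light.sh 1    # 1-month dump at 01:00
0 2 * * * /usr/local/bin/dump-light.sh 3    # 3-month dump at 02:00
0 3 * * 0 /usr/local/bin/dump-full.sh       # full dump, Sundays only, at 03:00
```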

I think this would decrease the server load significantly, also compared to now :)

github-actions[bot] commented 7 months ago

This issue has had no activity for 60 days and will be closed in 7 days. Removing the "Stale" label or posting a comment will prevent it from being closed automatically. You can also add the "Pinned" label to ensure it isn't marked as stale in the future.