sonatype-nexus-community / nexus-blobstore-google-cloud

Nexus Repository Manager Blobstore backed by Google Cloud Storage
https://help.sonatype.com/en/configuring-blob-stores.html#google-cloud-blob-store
Eclipse Public License 1.0

Datastore cost very expensive #83

Closed jcverazzi closed 3 years ago

jcverazzi commented 3 years ago

Hi, I'm using this plugin, and it works well. Thanks for your work. I installed Nexus on Google Compute Engine and found that my costs are very high (~200€ / month). More money is spent on Datastore than on Compute Engine. What could explain that? Can we reduce it?

Thanks for your help.

nblair commented 3 years ago

Hi @jcverazzi - that is surprising to hear. Can you share with me the size of the blobstore and the count of objects (both are listed on the blobstore admin page)? What version of NXRM are you using, and which version of the plugin?

Thanks!

jcverazzi commented 3 years ago

Hi @nblair and thanks for your answer.

Here's size and count: image

We're using Nexus OSS 3.28.0-01 and plugin 0.18.0.

My first guess is that the plugin checks my blobstore too often. I've looked at tasks such as the cleanup policy, but I see nothing "wrong".

nblair commented 3 years ago

Interesting - based on the size I wouldn't expect a lot of activity. If I understand correctly, the plugin uses "Small Operations", which afford 50,000 ops for free per day.

Some more questions:

jcverazzi commented 3 years ago

Here's my billing: image

I only have one cleanup policy: image

Here are my tasks. Only one of them (cleanup) is scheduled, once a day. The other ones were made for migration and are not used anymore: image

We are only 4 developers, so we're not stressing Nexus.

I will try to deactivate my cleanup task and see the result.

nblair commented 3 years ago

Thank you @jcverazzi - this is really helpful, I appreciate your details here. Those Read ops look really spiky, looks like we're on the right path.

I've set up a similar environment in my Google Cloud account. I ran a stress test to get about 30 GB of content, spanning about 15K blobs. Google Cloud appears to report on the Datastore service at roughly a daily cadence, so I should know later today. After I ran the stress test, I ran the compact blobstore task - that job removes "soft deleted" blobs from the bucket, and in doing so removes a portion of the data the plugin stores in Datastore (the index of soft-deleted blobs).

All that is left for entities in my datastore for the project is 44 objects (the count and size of blobs in each blobstore volume).

I haven't run the cleanup service task yet - I'll do that as well to see if I can see a corresponding spike in Datastore read ops.

Could you grab some screens of the https://console.cloud.google.com/datastore/entities and https://console.cloud.google.com/datastore/stats pages? I'm curious to see roughly how many entities, the namespaces/kinds, and the sizes of the entities and indexes.

jcverazzi commented 3 years ago

It seems my billing has been decreasing since I stopped the cleanup task. I see $4 for yesterday, versus $7 for each day before. Let's see how much it is tomorrow, since I think the cleanup runs at night.

Here's my entities: image image

I don't know if it's relevant. Tell me if you need me to run specific queries.

And my stats: image

nblair commented 3 years ago

This is great @jcverazzi - I have a super clear picture about what's going on right now. From your images, it's clear we have a small amount of data being stored - that's good. I would typically expect to see entity counts on the order of 100-1000. The number of entities for the MetricsShard kind is bounded at 44 per blobstore. Any other entities would be entries in the DeletedBlobIndex.

The thing to do right away: Seeing 18,925 and 1,176 at the top of the stats, I'd recommend setting up a Compact Blob store task for each of your blobstores and scheduling it to run daily. This will trim down the entity counts there and finalize deletion of already soft-deleted (unused) blobs in your storage bucket. Unfortunately, this won't trim your bill by much.

The action item I have to take into development: That 550 million read ops is the clear surprise to me. I've been under the impression that the plugin was exclusively using Small Ops, but that is definitely not the case. At the same time, the billing updates for my account finally landed on my side, and I am seeing the same thing - for just a few hours of total NXRM runtime, I've got 290,000 Cloud Firestore Read Ops vs 1,951 Cloud Firestore Small Ops.
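To give a rough sense of why the read ops dominate the bill, here is a minimal Python sketch of the arithmetic. The price per 100,000 reads, the 30-day window, and the 50,000-per-day free allowance applied to reads are assumptions for illustration and are not taken from this thread; actual pricing depends on region and may have changed, so check the current pricing page. The function name `estimate_read_cost` is hypothetical.

```python
# Rough estimate of Datastore/Firestore read costs.
# ASSUMPTIONS (not from this thread): the price per 100,000 reads and the
# idea that the free allowance is used evenly every day of the period.

def estimate_read_cost(read_ops, days, price_per_100k=0.06, free_per_day=50_000):
    """Estimate the cost of `read_ops` reads spread over `days` days.

    price_per_100k: assumed example price in USD, varies by region.
    free_per_day:   assumed free daily allowance of operations.
    """
    # Subtract the free allowance for the whole period, never going below zero.
    billable = max(0, read_ops - free_per_day * days)
    return billable / 100_000 * price_per_100k


# 550 million reads, assumed here to be spread over 30 days.
print(f"550M reads over 30 days: ${estimate_read_cost(550_000_000, days=30):.2f}")

# 290,000 reads in under a day, as seen in the test environment.
print(f"290k reads in one day:   ${estimate_read_cost(290_000, days=1):.2f}")

# 1,951 small ops stay within the assumed free allowance.
print(f"1,951 ops in one day:    ${estimate_read_cost(1_951, days=1):.2f}")
```

Under these assumptions, a few thousand operations per day cost nothing, while hundreds of millions of reads produce a bill in the hundreds of dollars, which is the same order of magnitude as the reported costs.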

I maintain this generally in my available free time, so I can't promise an easy fix in a short duration. I will move this to the top of my priority list and hope to have some more information in a few days.

nblair commented 3 years ago

Hey @jcverazzi - I have a set of changes that will likely cut down the number of document read operations significantly (see #84). Would you be willing to deploy a development build to see if things help? Check out https://github.com/sonatype-nexus-community/nexus-blobstore-google-cloud/tree/reduce-read-ops - I'm going to experiment on my own as well.

nblair commented 3 years ago

0.18.1 of the plugin is now available, please give it a try and let me know how it goes.

jcverazzi commented 3 years ago

Thanks for this work. I installed it this morning and we'll see what happens over the next few days. I also reactivated my cleanup task (daily task), since it didn't affect my billing.

For the record, here's my billing now (1.18.0): image

PS: as you recommended, I also created a "Compact Blob store" task that I chose to run weekly for now.

jcverazzi commented 3 years ago

Hi, just to let you know that it's working. Here's my billing for December for Firestore (= 0$ since the update): image

Thank you very much for your help.

nblair commented 3 years ago

@jcverazzi excellent - thank you for following up!