praeclarum / FuGetGallery

An alternative web UI for browsing nuget packages
https://www.fuget.org
MIT License
683 stars 121 forks source link

fuget.org OutOfMemoryException #108

Closed jzabroski closed 3 years ago

jzabroski commented 4 years ago
  1. Go to https://www.fuget.org/packages/fluentmigrator.console
  2. See exception stack trace:
Error reading package
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.IO.MemoryStream..ctor(Int32 capacity)
   at System.Net.Http.HttpContent.CreateMemoryStream(Int64 maxBufferSize, Exception& error)
   at System.Net.Http.HttpContent.LoadIntoBufferAsync(Int64 maxBufferSize, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at FuGetGallery.PackageData.PackageDataCache.ReadPackageFromUrl(PackageData package, HttpClient httpClient, CancellationToken token) in D:\home\site\repository\Data\PackageData.cs:line 402
   at FuGetGallery.PackageData.PackageDataCache.GetValueAsync(String arg0, PackageVersion arg1, HttpClient httpClient, CancellationToken token) in D:\home\site\repository\Data\PackageData.cs:line 381
jzabroski commented 4 years ago

I believe the issue is due to capitalization in the package name, but can't fathom why capitalization would trigger an OutOfMemoryException.

Compare the following two links:

  1. https://www.fuget.org/packages/fluentmigrator.console
  2. https://www.fuget.org/packages/FluentMigrator.Console

In addition, it seems fuget.org landing page caches all urls I hit, regardless if they're valid or not, which causes the following landing page screenshot:

image

jzabroski commented 4 years ago

@praeclarum If you want to show me how to debug this, I'd be interested in helping fix it and contributing to this awesome project. However, the stack trace I posted isn't enough to help me figure out what URL is actually being used so I'm not sure how to repro.

jzabroski commented 4 years ago

@praeclarum I think this issue could be due to your CloudFlare configuration. Just a guess.

praeclarum commented 4 years ago

Thanks for the info!

Yeah the OOM is an old bug I haven't been able to squash.

But this casing problem is something I didn't know about. Need to investigate. Thank you!

jzabroski commented 4 years ago

I think your CloudFlare configuration causes your response handler to eat itself when processing requests for the same package name with different casing. That is my hypothesis, anyway. So, you may want to fix the casing problem, then do a side by side test and see if I am right about OOM occurring due to incorrect handling of bad payloads due to CloudFlare passing you unexpected data due to casing issues.

bdgza commented 3 years ago

I am having the same OOM error with https://www.fuget.org/packages/gdal.native It is also showing package versions that have been unlisted from nuget.org, also for https://www.fuget.org/packages/gdal

jzabroski commented 3 years ago

Again, the issue is capitalization. Use https://www.fuget.org/packages/GDAL and you're good.

@praeclarum do you want to fix this?, given the open PR in #120 as well as my reproduction instructions that reproduce the issue 100% of the time.

praeclarum commented 3 years ago

Hi all! Sorry @jzabroski I’ll get to it today. Thanks for figuring it out.

praeclarum commented 3 years ago

OK I didn't get to it then, but I've been thinking about it. I closed your PR but will be stealing bits of it. Thanks for all your effort.

If you are new to this thread: Yes, FuGet sometimes fails with OOM. I am working on both refining this software and beefing up the servers. If you get the OOM, check back in an hour. Sorry for this trouble!

jzabroski commented 3 years ago

I don't think you need to beef up the server. Why are you throwing money away. Its 100% reproduce with the scenarios I gave.

praeclarum commented 3 years ago

Sorry I should have said that this is only one cause. I plan on fixing your scenario, just wanted to make it clear that this won’t fix the error completely.

jzabroski commented 3 years ago

I know, but we had a thread on Twitter about six months ago about saving for retirement, and I don't understand how you are going to monetize fuget.org, and it seems counter-productive towards the goals we discussed on Twitter.

In fact, nuget.com is now linking to fuget.org, and you don't have any ads, so I don't see upgrading your servers as a very rational approach or good prioritization of your resources. I realize this can be a tough message, but it's tough because I like what you're doing and want to see you rewarded for providing services.

Enjoy the rest of your weekend.

timkomip commented 3 years ago

@praeclarum So I was listening to the Merge Conflict podcast (great podcast 😉 ) and had an idea.

So, to recap, the issue is that the in-memory cache is filling up all and causing the instance to crash. A naive fix would be to "cache" all the items into memory or to disk (via database or whatever) however this would be turbo expensive and would scale poorly.

What if you add TTL (expiration) values to the items in cache and/or a max limit of items in cache? This would create a sort of a rolling cache that had a max size. This would keep things fast-ish, would hard limit memory usage, and prevent the app process from dying.

Also, it might be worth moving your cache into something like Redis since it is really good at the cache thing and make scaling multi instances easier.

timkomip commented 3 years ago

Another crazy thought. If you wanted to store items to disk, you could possibility "cheat" and use Azure Blob Storage as key value storage. This would be cheaper than a SQL server and be pretty fast. However, you lose some ability to do complicated queries on data without pulling some of it into memory first.

May be a dumb idea but thought I would throw it out there.

Also, thank you for Fuget and your work on it 👍

MarkPflug commented 3 years ago

I would venture a guess that these OOM issues are due to LOH fragmentation. Pretty much every nuget package is going to exceed the LOH threshold. Caching the entire package zip in a MemoryStream means the majority of the process memory is going to end up allocated in the LOH. As packages are evicted from the cache and the associated byte[] is freed from the LOH it will eventually lead to significant fragmentation, and even though there might still be a lot of free memory in the LOH, there wont be a large enough contiguous block to fit one of these absurdly bloated nuget packages. I'll admit I'm not an expert, this is just a theory.

As far as this suggestion that casing of the package name in the URL might be a cause of the issue, I'm pretty sure that a red herring and is just a caching issue. I suspect cloudflare is serving up cached pages that have OOM messages. Changing the casing of the URL will avoid getting the cached response from cloudflare.

One metric I'd be curious to know (if you know it, and don't mind sharing) is how often the application process restarts. Is it frequent?

I have an idea that might at reduce the severity/frequency of this OOM problem; will send a POC PR.

praeclarum commented 3 years ago

With #139 this seems to work now. Please open another issue if it starts misbehaving again.