Closed jzabroski closed 3 years ago
I believe the issue is due to capitalization in the package name, but can't fathom why capitalization would trigger an OutOfMemoryException.
Compare the following two links:
In addition, it seems fuget.org landing page caches all urls I hit, regardless if they're valid or not, which causes the following landing page screenshot:
@praeclarum If you want to show me how to debug this, I'd be interested in helping fix it and contributing to this awesome project. However, the stack trace I posted isn't enough to help me figure out what URL is actually being used so I'm not sure how to repro.
@praeclarum I think this issue could be due to your CloudFlare configuration. Just a guess.
Thanks for the info!
Yeah the OOM is an old bug I haven't been able to squash.
But this casing problem is something I didn't know about. Need to investigate. Thank you!
I think your CloudFlare configuration causes your response handler to eat itself when processing requests for the same package name with different casing. That is my hypothesis, anyway. So, you may want to fix the casing problem, then do a side by side test and see if I am right about OOM occurring due to incorrect handling of bad payloads due to CloudFlare passing you unexpected data due to casing issues.
I am having the same OOM error with https://www.fuget.org/packages/gdal.native It is also showing package versions that have been unlisted from nuget.org, also for https://www.fuget.org/packages/gdal
Again, the issue is capitalization. Use https://www.fuget.org/packages/GDAL and you're good.
@praeclarum do you want to fix this?, given the open PR in #120 as well as my reproduction instructions that reproduce the issue 100% of the time.
Hi all! Sorry @jzabroski I’ll get to it today. Thanks for figuring it out.
OK I didn't get to it then, but I've been thinking about it. I closed your PR but will be stealing bits of it. Thanks for all your effort.
If you are new to this thread: Yes, FuGet sometimes fails with OOM. I am working on both refining this software and beefing up the servers. If you get the OOM, check back in an hour. Sorry for this trouble!
I don't think you need to beef up the server. Why are you throwing money away. Its 100% reproduce with the scenarios I gave.
Sorry I should have said that this is only one cause. I plan on fixing your scenario, just wanted to make it clear that this won’t fix the error completely.
I know, but we had a thread on Twitter about six months ago about saving for retirement, and I don't understand how you are going to monetize fuget.org, and it seems counter-productive towards the goals we discussed on Twitter.
In fact, nuget.com is now linking to fuget.org, and you don't have any ads, so I don't see upgrading your servers as a very rational approach or good prioritization of your resources. I realize this can be a tough message, but it's tough because I like what you're doing and want to see you rewarded for providing services.
Enjoy the rest of your weekend.
@praeclarum So I was listening to the Merge Conflict podcast (great podcast 😉 ) and had an idea.
So, to recap, the issue is that the in-memory cache is filling up all and causing the instance to crash. A naive fix would be to "cache" all the items into memory or to disk (via database or whatever) however this would be turbo expensive and would scale poorly.
What if you add TTL (expiration) values to the items in cache and/or a max limit of items in cache? This would create a sort of a rolling cache that had a max size. This would keep things fast-ish, would hard limit memory usage, and prevent the app process from dying.
Also, it might be worth moving your cache into something like Redis since it is really good at the cache thing and make scaling multi instances easier.
Another crazy thought. If you wanted to store items to disk, you could possibility "cheat" and use Azure Blob Storage as key value storage. This would be cheaper than a SQL server and be pretty fast. However, you lose some ability to do complicated queries on data without pulling some of it into memory first.
May be a dumb idea but thought I would throw it out there.
Also, thank you for Fuget and your work on it 👍
I would venture a guess that these OOM issues are due to LOH fragmentation. Pretty much every nuget package is going to exceed the LOH threshold. Caching the entire package zip in a MemoryStream means the majority of the process memory is going to end up allocated in the LOH. As packages are evicted from the cache and the associated byte[] is freed from the LOH it will eventually lead to significant fragmentation, and even though there might still be a lot of free memory in the LOH, there wont be a large enough contiguous block to fit one of these absurdly bloated nuget packages. I'll admit I'm not an expert, this is just a theory.
As far as this suggestion that casing of the package name in the URL might be a cause of the issue, I'm pretty sure that a red herring and is just a caching issue. I suspect cloudflare is serving up cached pages that have OOM messages. Changing the casing of the URL will avoid getting the cached response from cloudflare.
One metric I'd be curious to know (if you know it, and don't mind sharing) is how often the application process restarts. Is it frequent?
I have an idea that might at reduce the severity/frequency of this OOM problem; will send a POC PR.
With #139 this seems to work now. Please open another issue if it starts misbehaving again.