yatisht / usher

Ultrafast Sample Placement on Existing Trees

Error `needLargeMem: Out of memory - request size 65568 bytes, errno: 12` #351

Open · corneliusroemer opened 9 months ago

corneliusroemer commented 9 months ago

When opening: https://genome.ucsc.edu/cgi-bin/hgPhyloPlace?db=wuhCor1&phyloPlaceTree=hgPhyloPlaceData/wuhCor1/public.plusGisaid.latest.masked.pb&subtreeSize=5000&remoteFile=https%3A%2F%2Flapis.cov-spectrum.org%2Fgisaid%2Fv1%2Fsample%2Fgisaid-epi-isl%3ForderBy%3Drandom%26limit%3D1000%26dateFrom%3D2023-06-12%26dateTo%3D2023-09-11%26variantQuery%3D%255B4-of%253AS%253A455F%252CS%253A456L%252CS%253A478R%252CS%253A403K%252CS%253A486P%252CS%253A475V%252CS%253A470N%255D%26host%3DHuman%26accessKey%3D9Cb3CqmrFnVjO3XCxQLO6gUnKPd

I get `needLargeMem: Out of memory - request size 65568 bytes, errno: 12`


It's around 500 sequences.
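
For reference, the LAPIS query is percent-encoded inside the link's remoteFile parameter; decoding it shows what's actually being requested (a minimal Python sketch, with the link trimmed to the relevant parameters):

```python
from urllib.parse import parse_qsl, urlparse

# The hgPhyloPlace link above carries the LAPIS query percent-encoded in
# its remoteFile parameter; parse_qsl decodes it. Link trimmed here to
# the two parameters that matter.
link = ("https://genome.ucsc.edu/cgi-bin/hgPhyloPlace?db=wuhCor1"
        "&remoteFile=https%3A%2F%2Flapis.cov-spectrum.org%2Fgisaid%2Fv1"
        "%2Fsample%2Fgisaid-epi-isl%3ForderBy%3Drandom%26limit%3D1000")
remote = dict(parse_qsl(urlparse(link).query))["remoteFile"]
print(remote)
# https://lapis.cov-spectrum.org/gisaid/v1/sample/gisaid-epi-isl?orderBy=random&limit=1000
```

The `limit=1000` at the end is the number of sequences requested; that's the knob discussed below.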

AngieHinrichs commented 9 months ago

Thanks for reporting, I'll take a look. It might take a while to fix (especially on the main site, since there is a one-to-four-week release-cycle delay from the test site). In the meantime, if an option could be added to CoV-Spectrum to randomly downsample so that it sends UShER no more than 500 sequences (400 to be safe), that would help avoid the problem while still giving a good lineage overview (@chaoran-chen?).

chaoran-chen commented 9 months ago

Sure, I reduced it to 400. Please let me know if I should increase it again.

AngieHinrichs commented 9 months ago

Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

corneliusroemer commented 9 months ago

> Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

The query from above still pulls in 1000 😜 I think I occasionally managed to get it to work with 1000, but better not to overload your server :)

corneliusroemer commented 9 months ago

I just reran with limit=400 and now I got the following error (I remember seeing `Cannot allocate memory, can't fork` before as well, yesterday and today):

[screenshot: `Cannot allocate memory, can't fork` error]

AngieHinrichs commented 9 months ago

It's possible that our server is getting a little overloaded. I'll look into it.

chaoran-chen commented 9 months ago

@corneliusroemer, but the CoV-Spectrum website now generates links with `limit=400`, right?

corneliusroemer commented 9 months ago

Yes it does @chaoran-chen, but I still get the error `needLargeMem: Out of memory - request size 65568 bytes, errno: 12`

AngieHinrichs commented 9 months ago

About a week ago we had to impose stricter limits on the total amount of memory used by all threads of the Apache web server, because too many high-memory requests were sometimes hitting us at once and crashing the machine. That may be what's happening here. I just watched `top` while trying Cornelius's request: the hgPhyloPlace process got up to ~15 GB, and a Genome Browser process got as high as 32 GB! I'm tracing through the logs to see what kind of usage makes a Genome Browser process so big (and relatively slow).
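
For anyone wondering how a 65568-byte allocation can fail: errno 12 is ENOMEM, and once a process is near its address-space cap, even tiny requests fail. A minimal Python sketch of the effect on Linux (illustrative numbers, not our actual limits):

```python
import errno
import resource

# errno 12 is ENOMEM ("Cannot allocate memory") on Linux.
assert errno.errorcode[12] == "ENOMEM"

# Cap this process's address space at ~1 GB (illustrative number only),
# then allocate past it; the allocation fails just like needLargeMem's.
_, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 << 30, hard))

try:
    chunks = [bytearray(256 * 1024 * 1024) for _ in range(8)]  # ~2 GB total
except MemoryError:
    print("allocation failed at the cap, even though each request is modest")
```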

[Also, I could be a lot smarter about how I'm handling metadata; for SARS-CoV-2 it's enormous and I really don't need to be reading it all in. I should just read in an index, maybe try sqlite?]
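
A rough sketch of that indexing idea (hypothetical schema and file names, not the actual hgPhyloPlace code):

```python
import csv
import sqlite3

# Hypothetical: index the huge metadata TSV by sample name once, then
# read back only the rows a request needs instead of the whole file.
def build_index(tsv_path, db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS meta (name TEXT PRIMARY KEY, row TEXT)")
    with open(tsv_path) as f:
        rows = csv.reader(f, delimiter="\t")
        next(rows)  # skip the header line
        con.executemany("INSERT OR REPLACE INTO meta VALUES (?, ?)",
                        ((r[0], "\t".join(r)) for r in rows))
    con.commit()
    return con

def metadata_for(con, names):
    marks = ",".join("?" * len(names))
    query = f"SELECT row FROM meta WHERE name IN ({marks})"
    return [row for (row,) in con.execute(query, list(names))]
```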

corneliusroemer commented 9 months ago

The failures keep happening stochastically, even with CoV-Spectrum only exporting 400 sequences now.

Overall memory seems to be tight at times: when the error happens, it affects many requests at once (I sometimes send 4 in parallel).

AngieHinrichs commented 9 months ago

Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2? If you need to run lots of these, maybe I can set you up with equivalent `matUtils extract` commands that you can run locally on full tree files?
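
If the requests are scripted, capping the concurrency is cheap (a sketch; the links are whatever hgPhyloPlace URLs you would otherwise open in parallel):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Run the same hgPhyloPlace links as before, but never more than two at
# a time, per the suggestion above.
urls = []  # fill in the hgPhyloPlace links here

def fetch(url):
    with urlopen(url) as resp:
        return resp.read()

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, urls))
```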

corneliusroemer commented 9 months ago

> Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2?

@AngieHinrichs If load is an issue then I can absolutely change my usage, though it comes at the cost of modifying my established workflow. It appears that I am indeed single-handedly crashing (or rather thrashing) UShER when firing off some 5 requests in short succession. I definitely don't want to DDoS UShER, so yes, I'll stop doing that now that I'm aware.

If you're interested in working around this, here are some things that might be worth considering:

I'm absolutely willing to figure out how to use UShER locally. I should have done so long ago; the reason I haven't is that until now the web server was good enough. As you know, I've used Taxonium with the trees, but I stopped when Taxonium's relative lack of features made it less effective, in my view, than using Nextstrain/Auspice trees via web UShER (cc @theosanderson in case you'd like some power-user feedback on what Taxonium is missing to make it as good as or better than Auspice, not only on very large trees but also on tree sizes that Auspice can handle).

I would love to have a look at using matUtils extract with you, to see whether it might be easy to set up a local equivalent of what the web server does. That could be very useful to others as well, maybe as a tutorial on how to get started with matUtils.
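
For the record, a first local pass might look roughly like this (a sketch driven from Python; the sample-list file name is hypothetical, and the flags are as I understand matUtils extract's documented -i/-s/-t options):

```python
import subprocess

# Hypothetical local stand-in for the web server's subtree extraction:
# pull the named samples out of the full protobuf tree and write the
# resulting subtree as Newick. Assumes matUtils is on PATH.
subprocess.run(
    ["matUtils", "extract",
     "-i", "public.plusGisaid.latest.masked.pb",  # full tree (as in the link above)
     "-s", "samples.txt",                         # one sample name per line
     "-t", "subtree.nwk"],                        # Newick output
    check=True,
)
```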

theosanderson commented 9 months ago

@corneliusroemer yes, if you could remind me what the highest-priority Taxonium feature request(s) would be for you, that would be helpful. Mutation text without hovering? (Feel free to open an issue in the Taxonium repo; one issue that lists everything you want to mention would be fine.)