yatisht / usher

Ultrafast Sample Placement on Existing Trees

Error `needLargeMem: Out of memory - request size 65568 bytes, errno: 12` #351

Open · corneliusroemer opened 9 months ago

corneliusroemer commented 9 months ago

When opening: https://genome.ucsc.edu/cgi-bin/hgPhyloPlace?db=wuhCor1&phyloPlaceTree=hgPhyloPlaceData/wuhCor1/public.plusGisaid.latest.masked.pb&subtreeSize=5000&remoteFile=https%3A%2F%2Flapis.cov-spectrum.org%2Fgisaid%2Fv1%2Fsample%2Fgisaid-epi-isl%3ForderBy%3Drandom%26limit%3D1000%26dateFrom%3D2023-06-12%26dateTo%3D2023-09-11%26variantQuery%3D%255B4-of%253AS%253A455F%252CS%253A456L%252CS%253A478R%252CS%253A403K%252CS%253A486P%252CS%253A475V%252CS%253A470N%255D%26host%3DHuman%26accessKey%3D9Cb3CqmrFnVjO3XCxQLO6gUnKPd

I get `needLargeMem: Out of memory - request size 65568 bytes, errno: 12`


It's around 500 sequences.
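
For reference, the LAPIS query is percent-encoded inside the link's remoteFile parameter; decoding it shows what's actually being requested (a minimal Python sketch, with the link trimmed to the relevant parameters):

```python
from urllib.parse import parse_qsl, urlparse

# The hgPhyloPlace link above carries the LAPIS query percent-encoded in
# its remoteFile parameter; parse_qsl decodes it. Link trimmed here to
# the two parameters that matter.
link = ("https://genome.ucsc.edu/cgi-bin/hgPhyloPlace?db=wuhCor1"
        "&remoteFile=https%3A%2F%2Flapis.cov-spectrum.org%2Fgisaid%2Fv1"
        "%2Fsample%2Fgisaid-epi-isl%3ForderBy%3Drandom%26limit%3D1000")
remote = dict(parse_qsl(urlparse(link).query))["remoteFile"]
print(remote)
# https://lapis.cov-spectrum.org/gisaid/v1/sample/gisaid-epi-isl?orderBy=random&limit=1000
```

The `limit=1000` at the end is the number of sequences requested; that's the knob discussed below.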

AngieHinrichs commented 9 months ago

Thanks for reporting, I'll take a look. It might take a while to fix (especially on the main site, since there is a one-to-four-week release-cycle delay from the test site). In the meantime, if an option could be added to CoV-Spectrum to randomly downsample so that it sends UShER no more than 500 sequences (400 to be safe), that would help avoid the problem while still giving a good lineage overview (@chaoran-chen?).

chaoran-chen commented 9 months ago

Sure, I reduced it to 400. Please let me know if I should increase it again.

AngieHinrichs commented 9 months ago

Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

corneliusroemer commented 9 months ago

> Wonderful, thanks @chaoran-chen, and as always, so fast! @corneliusroemer, does the query work better for you now?

The query from above still pulls in 1000 😜 I think I occasionally managed to get it to work with 1000, but better not to overload your server :)

corneliusroemer commented 9 months ago

I just reran with limit=400 and now I got the following error (I remember seeing `Cannot allocate memory, can't fork` before as well, yesterday and today):

[screenshot: `Cannot allocate memory, can't fork` error]

AngieHinrichs commented 9 months ago

It's possible that our server is getting a little overloaded. I'll look into it.

chaoran-chen commented 9 months ago

@corneliusroemer, but the CoV-Spectrum website now generates links with `limit=400`, right?

corneliusroemer commented 9 months ago

Yes it does @chaoran-chen, but I still get the error `needLargeMem: Out of memory - request size 65568 bytes, errno: 12`

AngieHinrichs commented 9 months ago

About a week ago we had to impose stricter limits on the total amount of memory used by all threads of the Apache web server, because too many high-memory requests were sometimes hitting us at once and crashing the machine. That may be what's happening here. I just watched `top` while trying Cornelius's request: the hgPhyloPlace process got up to ~15 GB, and a Genome Browser process got as high as 32 GB! I'm tracing through the logs to see what kind of usage makes a Genome Browser process so big (and relatively slow).
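
For anyone wondering how a 65568-byte allocation can fail: errno 12 is ENOMEM, and once a process is near its address-space cap, even tiny requests fail. A minimal Python sketch of the effect on Linux (illustrative numbers, not our actual limits):

```python
import errno
import resource

# errno 12 is ENOMEM ("Cannot allocate memory") on Linux.
assert errno.errorcode[12] == "ENOMEM"

# Cap this process's address space at ~1 GB (illustrative number only),
# then allocate past it; the allocation fails just like needLargeMem's.
_, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 << 30, hard))

try:
    chunks = [bytearray(256 * 1024 * 1024) for _ in range(8)]  # ~2 GB total
except MemoryError:
    print("allocation failed at the cap, even though each request is modest")
```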

[Also, I could be a lot smarter about how I'm handling metadata; for SARS-CoV-2 it's enormous and I really don't need to be reading it all in. I should just read in an index, maybe try sqlite?]
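
A rough sketch of that indexing idea (hypothetical schema and file names, not the actual hgPhyloPlace code):

```python
import csv
import sqlite3

# Hypothetical: index the huge metadata TSV by sample name once, then
# read back only the rows a request needs instead of the whole file.
def build_index(tsv_path, db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS meta (name TEXT PRIMARY KEY, row TEXT)")
    with open(tsv_path) as f:
        rows = csv.reader(f, delimiter="\t")
        next(rows)  # skip the header line
        con.executemany("INSERT OR REPLACE INTO meta VALUES (?, ?)",
                        ((r[0], "\t".join(r)) for r in rows))
    con.commit()
    return con

def metadata_for(con, names):
    marks = ",".join("?" * len(names))
    query = f"SELECT row FROM meta WHERE name IN ({marks})"
    return [row for (row,) in con.execute(query, list(names))]
```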

corneliusroemer commented 9 months ago

The failures keep happening stochastically, even with CoV-Spectrum only exporting 400 sequences now.

Overall memory seems to be tight at times: when the error happens, it affects many requests at once (I sometimes send 4 in parallel).

AngieHinrichs commented 9 months ago

Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2? If you need to run lots of these, maybe I can set you up with equivalent `matUtils extract` commands that you can run locally on full tree files?
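
If the requests are scripted, capping the concurrency is cheap (a sketch; the links are whatever hgPhyloPlace URLs you would otherwise open in parallel):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Run the same hgPhyloPlace links as before, but never more than two at
# a time, per the suggestion above.
urls = []  # fill in the hgPhyloPlace links here

def fetch(url):
    with urlopen(url) as resp:
        return resp.read()

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, urls))
```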

corneliusroemer commented 9 months ago

> Sorry but with our new restrictions on total memory use, sending four requests in parallel might be a bit much... maybe back off to 2?

@AngieHinrichs If load is an issue then I can absolutely change my usage, though it comes at the cost of modifying my established workflow. It appears that I am indeed single-handedly crashing (or rather thrashing) UShER when firing off some 5 requests in short succession. I definitely don't want to DDoS UShER, so yes, I'll stop doing that now that I'm aware.

If you're interested in working around this, here are some things that might be worth considering:

I'm absolutely willing to figure out how to use UShER locally. I should have done so long ago; the reason I haven't is that until now the web server was good enough. As you know, I've used Taxonium with the trees, but I stopped when Taxonium's relative lack of features made it less effective, in my view, than using Nextstrain/Auspice trees via web UShER (cc @theosanderson in case you'd like some power-user feedback on what Taxonium is missing to make it as good as or better than Auspice, not only on very large trees but also on tree sizes that Auspice can handle).

I would love to have a look at using matUtils extract with you, to see whether it might be easy to set up a local equivalent of what the web server does. That could be very useful to others as well, maybe as a tutorial on how to get started with matUtils.
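
For the record, a first local pass might look roughly like this (a sketch driven from Python; the sample-list file name is hypothetical, and the flags are as I understand matUtils extract's documented -i/-s/-t options):

```python
import subprocess

# Hypothetical local stand-in for the web server's subtree extraction:
# pull the named samples out of the full protobuf tree and write the
# resulting subtree as Newick. Assumes matUtils is on PATH.
subprocess.run(
    ["matUtils", "extract",
     "-i", "public.plusGisaid.latest.masked.pb",  # full tree (as in the link above)
     "-s", "samples.txt",                         # one sample name per line
     "-t", "subtree.nwk"],                        # Newick output
    check=True,
)
```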

theosanderson commented 9 months ago

@corneliusroemer yes, if you could remind me what the highest-priority Taxonium feature request(s) would be for you, that would be helpful. Mutation text without hovering? (Feel free to open an issue in the Taxonium repo; one issue that lists everything you want to mention would be fine.)