Closed ktf closed 2 months ago
Yes, users would need to addEmptyRange where they now do resize(). I guess I could actually hide addEmptyRange inside the resize. Is it allowed to use std::pmr::vector in the llvm codebase?
I don't think so. I could not find uses of it in clang.
The source locations offset would be a major source of improvement if this technique flies there.
I couldn't find the source locations vector anymore. Could you point it to me?
From here you will need to jump to the clang::SourceManager
object.
@vgvassilev I have updated the PR to include a similar patch for the SourceManager and at least the trivial test seems to work fine, including a nice 9MB reduction in memory allocations. I am now testing with the ALICE analysis framework.
I have also done a few of the cleanups, and it now only exposes an "expand" interface (basically resize without shrinking). AFAICT, it's not worth to implement the full "resize" functionality, given it's complicated and at least the places I fixed never shrink.
I think also the calloc approach might not be a good idea given realloc does not guarantee that the memory is zeroed and besides that page ranges might be a tad too big for vector of pointers and so on. I would put aside that idea, at least for now.
My current plan:
For the record, the new top offender is actually findOrCreateModule.
For the record, the new top offender is actually findOrCreateModule.
That's probably the loading of module std.pcm
...
For the record, I have done what I outlined above and updated the PR. I do see some drastic improvements for some of our workflows (250MB out of 1GB) where we have many processes initialising the interpreter. For others, where the usage of ROOT is limited to reading files in a single process, the improvement is not so obvious. Simply opening a file does show improvement as well (I am down to 49MB, when I also patch the FileInfo vector in the HeaderSearch).
The PR somehow seems to die with some old memory corruption which I am pretty sure I fixed and I cannot reproduce anymore. Is there any need to clean some cache?
Notice I have also submitted the patch to llvm itself and it passes their CI (https://github.com/llvm/llvm-project/pull/66430#issuecomment-1720164026)
The PR somehow seems to die with some old memory corruption which I am pretty sure I fixed and I cannot reproduce anymore. Is there any need to clean some cache?
I cannot reproduce it either, but a number of Linux builds agree that the assertion can still be triggered. PRs should be built from a clean state (in Jenkins), but even if not you are not changing the on-disk format so incremental builds must also work correctly. This needs investigation.
Notice I have also submitted the patch to llvm itself and it passes their CI (llvm/llvm-project#66430 (comment))
Very likely the pre-merge CI does not test a large scale module setup. You will have to test this yourself and prove that the changes don't regress the case when modules are used (almost) completely. This will be needed to convince the wider community that it is solving an actual problem, or at least not regressing the "normal" compiler use case.
I cannot reproduce it either, but a number of Linux builds agree that the assertion can still be triggered. PRs should be built from a clean state (in Jenkins), but even if not you are not changing the on-disk format so incremental builds must also work correctly. This needs investigation.
What I think is happening is that the broken PR generated some broken file, and the new PR is not able to survive such broken file. IMHO, this is a problem with the build setup (not being able to notice broken files and delete them) more than with my PR (which of course has no dependency on some intermediate state). Shall I just open a new PR and we see how that goes?
Very likely the pre-merge CI does not test a large scale module setup. You will have to test this yourself and prove that the changes don't regress the case when modules are used (almost) completely. This will be needed to convince the wider community that it is solving an actual problem, or at least not regressing the "normal" compiler use case.
I'll do what it takes, thank you for pointing that out.
I cannot reproduce it either, but a number of Linux builds agree that the assertion can still be triggered. PRs should be built from a clean state (in Jenkins), but even if not you are not changing the on-disk format so incremental builds must also work correctly. This needs investigation.
What I think is happening is that the broken PR generated some broken file, and the new PR is not able to survive such broken file. IMHO, this is a problem with the build setup (not being able to notice broken files and delete them) more than with my PR (which of course has no dependency on some intermediate state). Shall I just open a new PR and we see how that goes?
I'm telling that this is not the case - there are no left-behind files (in Jenkins) that influence future PR runs. We know this from changing the on-disk format / upgrading LLVM / etc. There is a problem and it needs to be investigated.
I think quite some improvements are there in 6.30.04 (released) and master. @ktf is it difficult for you to check if the issue is fixed, and, if yes, close the item? If not I can start from your repro and proceed.
Indeed the patch I provided cuts half of the overhead, however there is still 40 MB per process I cannot really justify at the moment. In our case that translates to 4 GB of RSS. While I appreciate that being completely lazy in the PCM loading is probably complicated, maybe some tactical solution could be employed (like it was done for the PagedVector). For example I am not convinced ReadSLocEntry needs to keep around the buffer. There is moreover a few more places where the PagedVector could be used effectively, I will try to propose a separate PR for that.
The reproducer is as easy as opening a ROOT file, see the main issue. A new profile is:
I am wondering if we are not pushing repetitive pragma diagnostic mappings. We have tons for pragrams coming from the Linkdef files, not sure if they are needed in the module file itself...
In theory, if the software stack (not just ROOT) uses modules the RSS should also go down. Maybe that's worth trying on your end...
I am wondering if we are not pushing repetitive pragma diagnostic mappings. We have tons for pragrams coming from the Linkdef files, not sure if they are needed in the module file itself...
any way i can confirm this?
I am wondering if we are not pushing repetitive pragma diagnostic mappings. We have tons for pragrams coming from the Linkdef files, not sure if they are needed in the module file itself...
any way i can confirm this?
Not directly. Either we can run rootcling with -v4
and count the mappings or we can insert some printing statements on ::WritePragmaDiagnosticMapping
... I suspect making the pragma diagnostic mappings lazy would be art...
Dear @ktf, I am closing the issue assuming that this is not an issue any more for ALICE when adopting ROOT 6.32.X. Please do not hesitate to re-open in case this is not the case, providing the context necessary for us to fix the problem.
It's not an issue for adoption. The underlying issue is still there, though. ROOT still loads a bunch of unneeded PCM when simply opening a file, it's just the cost is half what it was before and the reproducer is the same as above.
Check duplicate issues.
Describe the bug
When doing a simple
TFile::Open()
, ROOT initialises the interpreter on the firstTEnv::GetValue
needed by the aforementioned method. This makes it load all the pcms which it can find, regardless of weather they are needed to read the file or not (in our case we only have branches with basic types, e.g. floats or ints or vector of thereof). This has the net result of adding 70MB of RSS for no particular reason to any process which uses ROOT. You can find more details in the attached instruments screenshot. Moreover, single stepping with the debugger and printing out the LoadModule argument, it seems that the pcms for Sofie, Proof and 3D graphics are particularly honours, all of which we do not need and we do not link.What is the expected behaviour?
The best would be for ROOT to be smart enough to lazily load the pcms when they are actually needed, not upfront. A second best option would be to be able to start the interpreter in a clean slate state and load by hand only the pcms which are needed to read a file.
How to reproduce?
On macOS using Instruments one can do:
and get the report in question. Using the alice environment should not matter, but in principle you should be able to load it from CVMFS.
ROOT version
6.28.04, but it's a while we see it.
How did you install ROOT?
ALICE build environment
Which operating system are you using?
macOS, linux
Additional context
No response