Closed filipnavara closed 1 year ago
Thanks! Results look great, definitely agree that the current implementation of ElfStringTable
is super inefficient.
Will have a look at the PR later tonight.
Looks like the CI was stuck because it was requesting ubuntu-18.04
but it has been removed from GitHub action.
I have switched to ubuntu-20.04
but I expect tests to fail as they were relying on a particular version of readelf.
Also, when running locally, I got multiple errors from the DwarfTests, and they seem to be related to the string table changes. For example:
System.IndexOutOfRangeException : Index was outside the bounds of the array.
at LibObjectFile.ObjectFileStreamExtensions.ReadStringUTF8NullTerminated(Stream stream) in C:\code\LibObjectFile\src\LibObjectFile\ObjectFileStreamExtensions.cs:line 71
at LibObjectFile.Dwarf.DwarfStringTable.GetStringFromOffset(UInt64 offset) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfStringTable.cs:line 43
at LibObjectFile.Dwarf.DwarfAttribute.ReadAttributeFormRawValue(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfAttribute.cs:line 435
at LibObjectFile.Dwarf.DwarfAttribute.Read(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfAttribute.cs:line 283
at LibObjectFile.Dwarf.DwarfObject`1.ReadInternal(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfObject.cs:line 79
at LibObjectFile.Dwarf.DwarfDIE.Read(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfDIE.cs:line 415
at LibObjectFile.Dwarf.DwarfObject`1.ReadInternal(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfObject.cs:line 79
at LibObjectFile.Dwarf.DwarfDIE.ReadInstance(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfDIE.cs:line 457
at LibObjectFile.Dwarf.DwarfUnit.Read(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfUnit.cs:line 151
at LibObjectFile.Dwarf.DwarfObject`1.ReadInternal(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfObject.cs:line 79
at LibObjectFile.Dwarf.DwarfUnit.ReadInstance(DwarfReader reader, UInt64& offsetEndOfUnit) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfUnit.cs:line 206
at LibObjectFile.Dwarf.DwarfInfoSection.Read(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfInfoSection.cs:line 49
at LibObjectFile.Dwarf.DwarfObject`1.ReadInternal(DwarfReader reader) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfObject.cs:line 79
at LibObjectFile.Dwarf.DwarfFile.Read(DwarfReaderContext readerContext) in C:\code\LibObjectFile\src\LibObjectFile\Dwarf\DwarfFile.cs:line 421
at LibObjectFile.Tests.Dwarf.DwarfTests.TestDebugInfoSmall() in C:\code\LibObjectFile\src\LibObjectFile.Tests\Dwarf\DwarfTests.cs:line 328
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
That particular error is fixed with https://github.com/xoofx/LibObjectFile/pull/27.
I also had trouble running the tests locally for the reasons you mentioned.
That particular error is fixed with https://github.com/xoofx/LibObjectFile/pull/27.
Oh, good catch, just merged!
I'm starting to ask myself If I should not make this library only compatible with net7.0.
Would it be a problem for your experiment?
I'm starting to ask myself If I should not make this library only compatible with net7.0.
Would it be a problem for your experiment?
Not at all. I am fine with that. I think I publish my Mach-O libraries as net6.0+ since that's still supported LTS release but I don't mind a higher version either.
Ok, just bumped on master to net7.0.
TestHelloWorld
is failing. As expected there are new stuffs in ubuntu-20.04
generated by the gcc version that LibObjectFile is not able to decode.
Might ignore it for a time and add an issue. Don't have time to investigate this one and it might take some time to upgrade.
TestHelloWorld
is failing. As expected there are new stuffs inubuntu-20.04
generated by the gcc version that LibObjectFile is not able to decode.
I may get to look into it, but this week is exceptionally busy, so I cannot promise any timeline.
The current implementation of
ElfStringTable
doesn't scale well in terms of time and memory usage. This was discovered in the experiment which uses LibObjectFile as ELF emitting backend for the .NET NativeAOT. Link to the discussion thread.The root cause of the issue is the optimization that tries to deduplicate strings that share the same suffix. This optimization creates enormous amount of strings on the managed heap, creates a dictionary with large amount of collisions, and produces only marginally better output for the common cases. It doesn't produce the optimal solution either since it depends on the order of the strings.
I created a benchmark to see how bad the problem is and what can be done to improve the situation. The benchmark implements two strategies (
MultiKeySort
andNaive
) in addition to the currentElfStringTable
implementation used for reference. TheMultiKeySort
algorithm produces the optimal solution by pre-sorting the input, theNaive
algorithm does no suffix deduplication.The
MultiKeySort
algorithm produces an output that is 23% smaller than theNaive
implementation for the sample data. Notably,MultiKeySort
is more than 50 times faster than the current implementation and uses 65 times less memory in the process.ElfStringTable
offers an API that can build the string table incrementally, but the most common case is when it's populated from the symbol table. I present a solution where the symbol table case uses theMultiKeySort
algorithm by introducing a newElfStringTable.ReserveString
method and then builds the optimal solution lazily when the string table is emitted or indexed. The existingGetOrCreateIndex
method is modified to use the naive implementation and removes the suffix optimization.