microsoft / clrmd

Microsoft.Diagnostics.Runtime is a set of APIs for introspecting processes and dumps.
MIT License

Attaching to a Linux process is not yet implemented #143

Closed: leculver closed this 5 years ago

leculver commented 6 years ago

I'm winding down development in this sprint of ClrMD work. It's looking like I won't have time to handle inspecting a live process. I'm leaving this issue open as a TODO marker and to let others know it's a known issue.

TylerAP commented 5 years ago

ran into this.

leculver commented 5 years ago

@TylerAP For now you should be able to use the createdump utility to drop a core dump, then load it.

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/xplat-minidump-generation.md
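A minimal sketch of that dump-then-load workaround from the shell. `dump_process` is a hypothetical helper, not part of any tool. The `--full` and `--name` options are the ones described in the createdump document linked above. The createdump location varies by install, so the helper takes it as an argument rather than guessing.

```shell
# Hypothetical helper: write a full core dump of a running .NET Core process.
#   $1 = path to the createdump binary (install-specific, so not hard-coded)
#   $2 = target process id
#   $3 = output file for the dump
dump_process() {
    createdump_bin="$1"
    pid="$2"
    out="$3"
    if [ ! -x "$createdump_bin" ]; then
        echo "createdump not executable: $createdump_bin" >&2
        return 1
    fi
    # --full requests a full core dump; --name sets the dump path and file name.
    "$createdump_bin" --full --name "$out" "$pid"
}
```

Called as `dump_process /path/to/createdump 1234 /tmp/coredump.1234`, it should leave a file that can be loaded in place of a live attach.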

TylerAP commented 5 years ago

that's exactly what I'm going to do... I just have to get createdump to work in a self-contained docker image based on runtime-deps... or give up and switch to a hosted image... ~ah no createdump there either.. nor in the sdk image... ok going digging for createdump.~ found it.

leculver commented 5 years ago

I will have another sprint of time to do some work in late December, which is when I plan to implement live process attach on Linux. At that point I'll stop calling the Linux support "experimental"...

TylerAP commented 5 years ago

FYI for anyone else that is looking for createdump in containers... It's not in microsoft/dotnet:*-runtime-deps. In microsoft/dotnet:*-runtime / *-sdk, it's in ... /usr/share/dotnet/shared/Microsoft.NETCore.App/*/createdump
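A small sketch of that lookup, for anyone scripting it. `find_createdump` is a hypothetical helper; the default root is the path reported above for the runtime and sdk images, and other installs may place the shared framework elsewhere.

```shell
# Hypothetical helper: list createdump binaries under a shared-framework root.
#   $1 = root directory (defaults to the path reported for the runtime/sdk images)
find_createdump() {
    root="${1:-/usr/share/dotnet/shared/Microsoft.NETCore.App}"
    # Each installed runtime version has its own subdirectory under the root.
    for candidate in "$root"/*/createdump; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    return 0
}
```

No output means no createdump was found under that root, which is what to expect on the runtime-deps image.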

TylerAP commented 5 years ago

Update: can't call createdump from inside a container by default; it needs access to /proc/*/mem, so run with --privileged. Then you get exit code 134 and a message: "LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate registration of tracepoint probes having the same name is not allowed." Going to try this over on coreclr...

leculver commented 5 years ago

This is a problem with some of our diagnostic tools. Overall I'd say the fact that this doesn't work out of the box probably needs attention from our diagnostics team (so @mikem8361 @noahfalk).

I've worked around this locally by renaming "libcoreclrtraceptprovider.so" to something like "libcoreclrtraceptprovider.s_". This makes it so that CLR and the dac can't find this tracepoint provider library and you avoid this error.
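A sketch of that rename as a script, assuming the library sits directly in the runtime directory. `disable_tracepoint_provider` is a hypothetical helper name, and the directory has to be supplied by the caller.

```shell
# Hypothetical helper: rename the tracepoint provider so it can no longer be found.
#   $1 = runtime directory containing libcoreclrtraceptprovider.so
disable_tracepoint_provider() {
    runtime_dir="$1"
    lib="$runtime_dir/libcoreclrtraceptprovider.so"
    if [ -f "$lib" ]; then
        # Same trick as described above: change the suffix from .so to .s_
        mv "$lib" "$runtime_dir/libcoreclrtraceptprovider.s_"
        echo "renamed $lib"
    else
        echo "not found: $lib" >&2
        return 1
    fi
}
```

Renaming the file back restores the original behavior.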

Sorry for the trouble here, I'm not sure what the status of fixing this problem is.

mikem8361 commented 5 years ago

This has been fixed for 3.0 (master branch) in https://github.com/dotnet/coreclr/pull/20874. I'm investigating getting it fixed in 2.2.

TylerAP commented 5 years ago

I'm gonna give a shot at this... forked the project and I might submit a few PRs if you don't beat me to it.

I know this is probably against your infrastructure architecture plans, so I won't submit PRs for this stuff: I got rid of the Azure, Arcade, shell and batch scripts; moved to generating test fixtures (test targets) via MSBuild tasks; moved some Windows natives out from amongst the platform-independent natives; switched to referring to "dbgshim" instead of "mscoree" and dropped the dll suffix (to allow using the .so suffix on Linux); and I'm targeting only .NET Standard 2.0 (may pick an earlier version). I'm also working on using AppVeyor for CI, artifacts and a NuGet feed.

Also planning on adding docfx to generate gh-pages docs and introducing code coverage too.

I don't think you want any of those changes at present, except maybe the MSBuild tasks. Feel free to look at the fork.

Here's hoping for clrmd attaching on linux. 👍

leculver commented 5 years ago

Awesome!

Yeah we won't be able to accept any PR that gets rid of the arcade and other scripts. All of this goo is basically required for us to be able to produce official, signed NuGet packages.

However, if you have a working implementation of attaching to a live linux process, I'd be super happy to have that as a PR. One thing to be aware of is the JetBrains team is looking to submit a very large cleanup PR for the project. The draft of it is here: https://github.com/Microsoft/clrmd/pull/151

This will almost certainly conflict with any changes you are making. You may want to wait until that PR is wrapped up and merged, then refork. It's up to you.

TylerAP commented 5 years ago

I'll rebase on that and keep sync'd. Thanks for the heads up.

Update: Rebased. I'll sync up with any future changes to that branch.

leculver commented 5 years ago

@TylerAP Ok, Microsoft/clrmd/master is now fully up to date with all the refactoring/code changes that will be done. Sorry for any inconvenience it caused...I wanted to set a new baseline for future development. (Also thanks to Ruslan who put these changes together.)

TylerAP commented 5 years ago

I rebased earlier. I am sync'd with master. No big deal.

I am making progress. Working on generating core dumps under Linux.

Edit: See edit history for the ugly details. COMPlus_DbgMiniDumpName does not appear to work.

On the docker sdk image, there is no coredumpctl so I can't run coredumpctl list. cat /proc/sys/kernel/core_pattern returns core.
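A quick way to interpret that value, as a sketch. `describe_core_pattern` is a hypothetical helper. The rules it encodes come from the Linux core(5) man page: a leading `|` pipes the dump to a program, and a pattern without a leading `/` is resolved relative to the crashing process's working directory.

```shell
# Hypothetical helper: classify the kernel's core_pattern setting.
#   $1 = file to read (defaults to /proc/sys/kernel/core_pattern)
describe_core_pattern() {
    pattern_file="${1:-/proc/sys/kernel/core_pattern}"
    pattern=$(cat "$pattern_file") || return 1
    case "$pattern" in
        "|"*) echo "pipe: ${pattern#?}" ;;      # dump is piped to a handler program
        /*)   echo "absolute: $pattern" ;;      # dump goes to a fixed location
        *)    echo "relative: $pattern" ;;      # dump lands in the process's cwd
    esac
}
```

With the value `core` this reports `relative: core`, which fits a dump named core turning up next to the test targets.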

So the COMPlus_DbgMiniDumpName env var appears to be ignored except for writing a message to the screen, and I find /app/src/TestTargets/core ... Great. Who do I pass that bug along to? 🥇

Now I gotta fix the DAC locator ("Could not find matching DAC for this runtime."), and then we should have more than 9 passing tests.

mscordaccore.so does not appear to be in the sdk docker image... But we got /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.6/libmscordaccore.so, so all good.
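The naming difference can be sketched as a small mapping. `dac_file_name` is a hypothetical helper, not ClrMD code. The Linux name is the one observed above, and the macOS name is an assumption based on the usual `lib` prefix and `.dylib` suffix convention.

```shell
# Hypothetical helper: expected DAC file name for a given OS.
#   $1 = OS name as printed by `uname -s` (defaults to the current OS)
dac_file_name() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Linux)  echo "libmscordaccore.so" ;;     # as found in the sdk image
        Darwin) echo "libmscordaccore.dylib" ;;  # assumption: standard macOS naming
        *)      echo "mscordaccore.dll" ;;       # Windows-style name
    esac
}
```

A locator can then look for `"$dir/$(dac_file_name)"` next to the runtime instead of hard-coding the .dll name.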

Gonna update DacInfo.cs friends and family. Edit: Done. DllImport still sucks, worked around. Have 9 passing tests, but xUnit aborts with no reason specified after ~5 minutes (probably super secret timeout elapsed). Maybe will sort that out later. Adding tests for attaching. Can make PR for linux functions, dll -> so, lib prefix and friends later.

Added latest Linux test results to bottom of linked issue.

TylerAP commented 5 years ago

I'm sync'd.

Like I said though, I'll part out PRs I submit back, and at some point I'm going to squash a lot of my git history and restructure it to fit your flow.

TylerAP commented 5 years ago

Looks like JetBrains is continuing to do more refactoring. Should I plan to sync up with Ruslan Isakiev's fork again in the near future? I do like the look of that .editorconfig.

isakiev commented 5 years ago

Hi Tyler! The fork is experimental, so there's no need to sync. My goal is to eventually switch to Microsoft/clrmd/master. There are a few extension points and fixes that I'd like to propose for merge in the near future, but no fundamental changes are expected.

TYoung86 commented 5 years ago

I'd hope that the windows natives could all be split out from classes that contain any non-windows (xplat/posix/linux/bsd) natives in a refactoring pass. I've done that on my branch to avoid static initialization causing DllNotFoundExceptions.

I've created an XPlatLiveDataTarget (and an XPlatDataReader, but then merged it into XPlatLiveDataTarget).

I'm avoiding DebugCreate and any dbgeng dependency; hopefully they won't be needed. ICorDebug is the only debugging interface I've identified as exposed via dbgshim. Let me know if there's a dbgeng for Linux (apart from libdbgeng.so from Wine).

I've got initial Passive attachment started. I'm currently working through getting the dac loaded under linux, and encountering 0x80131c4f (CORDBG_E_MISSING_DEBUGGER_EXPORTS, "The debuggee memory space does not have the expected debugging export table.") ~Got plenty of leads, working through it. Not sure what to do with DacDataTargetWrapper yet though. Looks like SetThreadContext is missing? Is that something I screwed up?~ Anyway, that wasn't the problem...

I might try to use CoreCLRCreateCordbObject or CreateDebuggingInterfaceFromVersion instead... let me know if I should keep with CLRDataCreateInstance.

TylerAP commented 5 years ago

Judging by coreclr/src/daccess/daccess.cpp lines 7251 to 7301 ... CORDBG_E_MISSING_DEBUGGER_EXPORTS is given when one of the calls to the wrapped data target fails in ClrDataAccess::GetDacGlobals(). So I'm going to revert any attempt to use other interfaces and keep with CLRDataCreateInstance.

TylerAP commented 5 years ago

(Sorry about re-post, was posting under alternate profiles by accident)

Ok, so it was my implementation of ReadMemory I needed to fix... I was getting EPERM from process_vm_readv, classic... Added cap_add SYS_PTRACE (CAP_SYS_PTRACE) to my docker-compose.yml for running the tests.
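A sketch of a pre-flight check for that capability, assuming a Linux /proc layout. `has_sys_ptrace` is a hypothetical helper. It relies on CAP_SYS_PTRACE being capability number 19, so the check tests bit 19 of the effective capability mask in `/proc/self/status`.

```shell
# Hypothetical helper: succeed if the effective capability set includes CAP_SYS_PTRACE.
#   $1 = status file to inspect (defaults to /proc/self/status)
has_sys_ptrace() {
    status_file="${1:-/proc/self/status}"
    # CapEff is a hexadecimal bitmask of effective capabilities.
    cap_eff=$(awk '/^CapEff:/ { print $2 }' "$status_file")
    [ -n "$cap_eff" ] || return 2
    # CAP_SYS_PTRACE is capability number 19, so test bit 19 of the mask.
    [ $(( (0x$cap_eff >> 19) & 1 )) -eq 1 ]
}
```

Running `has_sys_ptrace || echo "missing CAP_SYS_PTRACE"` inside the test container gives an early hint before process_vm_readv fails with EPERM. Outside docker-compose, the equivalent of `cap_add` is `docker run --cap-add=SYS_PTRACE`.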

First attach test pass...

```
clrmd-tests_1 | Build started 12/06/2018 22:39:46.
clrmd-tests_1 | Test run for /app/src/Microsoft.Diagnostics.Runtime.Tests/bin/Debug/netcoreapp2.1/Microsoft.Diagnostics.Runtime.Tests.dll(.NETCoreApp,Version=v2.1)
clrmd-tests_1 | Microsoft (R) Test Execution Command Line Tool Version 15.9.0
clrmd-tests_1 | Copyright (c) Microsoft Corporation. All rights reserved.
clrmd-tests_1 |
clrmd-tests_1 | Starting test execution, please wait...
clrmd-tests_1 | [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (64-bit .NET Core 4.6.27019.06)
clrmd-tests_1 | [xUnit.net 00:00:00.56] Discovering: Microsoft.Diagnostics.Runtime.Tests
clrmd-tests_1 | [xUnit.net 00:00:00.61] Discovered: Microsoft.Diagnostics.Runtime.Tests
clrmd-tests_1 | [xUnit.net 00:00:00.62] Starting: Microsoft.Diagnostics.Runtime.Tests
clrmd-tests_1 | [xUnit.net 00:00:00.76] Microsoft.Diagnostics.Runtime.Tests.AttachTests.InvasiveAttachTest [SKIP]
clrmd-tests_1 | [xUnit.net 00:00:00.76] derp
clrmd-tests_1 | [xUnit.net 00:00:01.01] Microsoft.Diagnostics.Runtime.Tests.AttachTests.NonInvasiveAttachTest [SKIP]
clrmd-tests_1 | [xUnit.net 00:00:01.01] derp
clrmd-tests_1 | [xUnit.net 00:00:01.02] Finished: Microsoft.Diagnostics.Runtime.Tests
clrmd-tests_1 | Skipped Microsoft.Diagnostics.Runtime.Tests.AttachTests.InvasiveAttachTest
clrmd-tests_1 | Passed Microsoft.Diagnostics.Runtime.Tests.AttachTests.PassiveAttachTest
clrmd-tests_1 | Skipped Microsoft.Diagnostics.Runtime.Tests.AttachTests.NonInvasiveAttachTest
clrmd-tests_1 |
clrmd-tests_1 | Total tests: 3. Passed: 1. Failed: 0. Skipped: 2.
clrmd-tests_1 | Test Run Successful.
clrmd-tests_1 | Test execution time: 3.3644 Seconds
clrmd-tests_1 |
clrmd-tests_1 | Build succeeded.
clrmd-tests_1 | 0 Warning(s)
clrmd-tests_1 | 0 Error(s)
clrmd-tests_1 |
clrmd-tests_1 | Time Elapsed 00:00:05.13
clrmd_clrmd-tests_1 exited with code 0
```

First attach test on linux passing. Going to get invasive and non-invasive tests written and then create a new branch from main and port minimal changes. 👍 Huzzah. 👍

TylerAP commented 5 years ago

Enumerated handles on an attached process under Linux. :godmode: This satisfies my needs; the thread implementation will be a nice-to-have. Hopefully I will get some more time to spend on this after hours.

Will make a clean branch for a minimal PR tomorrow.

https://github.com/TylerAP/clrmd/issues/1#issuecomment-445091021

adamsitnik commented 5 years ago

I just wanted to say that it would be awesome to have it, it would unblock BenchmarkDotNet to make our disassembler cross-platform https://github.com/dotnet/BenchmarkDotNet/pull/1015

akrock commented 5 years ago

We have used CLR MD in the past to live attach to our own process and dump running threads to assist with identifying deadlocks, and other unexpected processing slowdowns for batch worker processes in full framework.

As we are now migrating some of our batch processes to .NET Core and Linux Docker containers, we would be very interested in having this functionality working under Linux.

loop-evgeny commented 5 years ago

Any update on this? It would be great to have!

leculver commented 5 years ago

@akrock @loop-evgeny Live data reading was added in pull https://github.com/microsoft/clrmd/pull/263. Also Mike made some fixes in pull https://github.com/microsoft/clrmd/pull/276. I have not had the chance to test it but I've been assured it works, so I will close this issue. If you find problems with it please feel free to open new issues about the specific problems.

loop-evgeny commented 5 years ago

Yes, it works now! Thanks so much!