packdat / SourcetrailDotnetIndexer

Creates Sourcetrail-databases from .NET assemblies

This tool now has an offspring that parses C# code to generate Sourcetrail-databases.
Check out SourcetrailCSharpIndexer!

What

SourcetrailDotnetIndexer is a command-line tool that takes a .NET assembly as input and generates a database that can be loaded with Sourcetrail.
It uses reflection to collect all types from the assembly and also parses the IL-Code of methods.

It uses SourcetrailDB for writing the database.
For convenience, the native DLL for SourcetrailDB is already included, so you don't have to build it yourself.
Note that the native DLL is an x64 DLL, so you have to specify x64 as the target platform in your project settings as well.

There are actually two variants of the same tool:
SourcetrailDotnetIndexer for the "classic" .NET Framework (2.x to 4.x)
SourcetrailDotnetCoreIndexer for the "new" .NET Core and .NET 5+

Why

This project arose from an attempt to visualize the class-relationships for a large legacy .NET Application.
After previous attempts with Graphviz and similar free tools (like Gephi and yEd) failed (they are powerful in their domain, but the results were not what I had in mind), I found Sourcetrail, and it was exactly what I needed.

Disclaimer

This tool does not parse source-code; it only takes a compiled assembly.
This means that no source-code integration is available in Sourcetrail.
On the other hand, it should work with all kinds of .NET assemblies, regardless of the language they were created from (e.g. C#, VB.NET, ...), but I have tested it only with a handful of assemblies generated from C#.
Feel free to open a pull-request if you encounter any flaws.

In version 0.2.0, support for reading some source-code information has been added.
SourcetrailDotnetIndexer attempts to read information from PDB-files if they are found next to the assembly (e.g. if the path to your assembly is c:\temp\myApp.exe, SourcetrailDotnetIndexer attempts to load the PDB c:\temp\myApp.pdb).
This enables the tool to read the source information about where a type or a method is referenced within a method.
(See the second screenshot below)
If you need more control over where to locate the PDB-files, please open an issue (or better yet: open a pull-request! :) and describe your scenario.
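
The PDB lookup rule described above (same folder, same base name, .pdb extension) can be sketched in a few lines. This is only an illustration in Python, not the tool's actual C# code; the function name `pdb_path_for` is hypothetical.

```python
from pathlib import PureWindowsPath


def pdb_path_for(assembly_path: str) -> str:
    """Return the PDB path expected next to the assembly.

    Assumption: the PDB shares the assembly's folder and base name,
    as in the c:\\temp\\myApp.exe example from the text.
    """
    # PureWindowsPath handles backslash-separated paths on any OS.
    return str(PureWindowsPath(assembly_path).with_suffix(".pdb"))


print(pdb_path_for(r"c:\temp\myApp.exe"))  # c:\temp\myApp.pdb
```

If no file exists at the derived path, the source-location information is presumably just not available for that assembly.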

Building

Open the .sln in Visual Studio and build.
Make sure you set the target platform to x64 (because of the native SourcetrailDB.dll)!

Usage

The following command-line arguments are supported:

Note
If you encounter exceptions when running the tool stating Unable to load DLL 'SourcetrailDB', your system may be missing the Visual C++ Runtime required by the native SourcetrailDB dll.
In that case, install the runtime for Visual Studio 2019 from this link.
(make sure to install the x64 version)


Building a multi-assembly database

Note: If you don't need the fine-grained control described here, you may want to skip to the easy way.

It is rare for an application to consist of a single executable/DLL.
Most of the time, additional libraries are used (i.e. referenced) by your application.
Since v0.2.4 SourcetrailDotnetIndexer is able to follow these references and include them in the generated database as well.
To accomplish this, two new command-line switches were introduced:

Consider this fairly simple VisualStudio solution:
DemoSolution
It contains a .NET 5 console application and three class libraries (targeting .NET Standard 2.0).
The namespaces of these projects match their project-name (e.g. classes in Lib2 have the namespace "Lib2").
If you indexed only the ConsoleDemo project, you would see just a single class in Sourcetrail:
SingleClass
To index the assemblies referenced by ConsoleDemo as well, we utilize the -fn switch.
The command line would look something like this:
SourcetrailDotnetCoreIndexer.exe -i ConsoleDemo.dll -o . -f ^System -fn ^Lib\d
Note the -fn switch: here we utilize the fact that the namespaces share a common format (Lib1 to Lib3), so we use the regex-pattern \d, which matches a single digit (a character in the range '0' to '9').
As an alternative, we could have specified the namespaces explicitly to generate the same output:
-fn ^Lib1 -fn ^Lib2 -fn ^Lib3
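
To see what the pattern from the example selects, here is a small Python sketch that applies `^Lib\d` to a few namespace names. The namespace list is made up for illustration, and it is an assumption that the tool tests each namespace with an unanchored regex search (the tool itself uses .NET regular expressions, which behave the same way for this simple pattern).

```python
import re

# Pattern passed to -fn in the example above.
follow_pattern = re.compile(r"^Lib\d")

# Hypothetical namespace names, only for illustration.
namespaces = ["Lib1", "Lib2", "Lib3", "Library", "System.IO", "MyApp.Lib1"]

followed = [ns for ns in namespaces if follow_pattern.search(ns)]
print(followed)  # ['Lib1', 'Lib2', 'Lib3']

# The explicit alternative (-fn ^Lib1 -fn ^Lib2 -fn ^Lib3) selects the same names.
explicit = [re.compile(p) for p in (r"^Lib1", r"^Lib2", r"^Lib3")]
followed_explicit = [ns for ns in namespaces if any(p.search(ns) for p in explicit)]
print(followed == followed_explicit)  # True
```

"Library" is not matched because \d requires a digit after "Lib", and "MyApp.Lib1" is not matched because ^ anchors the pattern to the start of the name.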

Note: If the same pattern is specified for the -f AND the -fn switch, the -f switch takes precedence.

If your application references more than a handful of assemblies and the namespaces of the referenced assemblies do not share a pattern that could simply be translated into a regex-pattern, it may be more convenient to put all these namespace-names into a single file and reference just that file when running the indexer.
That is the purpose of the -ff switch.
Sticking with the previous example, we create a text-file with the following content:

# this is a comment
^Lib1
^Lib2
^Lib3

The file contains all the namespaces we want to include (one per line).
When reading the file, SourcetrailDotnetIndexer skips empty lines and lines starting with a '#'-character, so you could even put comments into the file.
We save this file as refAssemblies.txt and specify it as an argument for the -ff switch:
SourcetrailDotnetCoreIndexer.exe -i ConsoleDemo.dll -o . -f ^System -ff refAssemblies.txt
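
The reading rule for the pattern file (skip empty lines and lines starting with '#') can be sketched as follows. This is a Python illustration, not the tool's implementation; the function name `read_patterns` is hypothetical, and trimming surrounding whitespace is an assumption not stated in the text.

```python
def read_patterns(lines):
    """Collect regex-patterns from the lines of a -ff style file."""
    patterns = []
    for line in lines:
        text = line.strip()  # assumption: surrounding whitespace is ignored
        if not text or text.startswith("#"):
            continue  # skip empty lines and comment lines
        patterns.append(text)
    return patterns


sample = """# this is a comment
^Lib1

^Lib2
^Lib3
"""
print(read_patterns(sample.splitlines()))  # ['^Lib1', '^Lib2', '^Lib3']
```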

Opening the generated database in Sourcetrail now also shows classes from the referenced assemblies:
MultipleClassed
And displaying the caller-graph for the single method in Lib3 shows a trail through all our assemblies:
MultipleClassed


The easy way

If you just have a bunch of assemblies and you don't know or don't care which one to use for the initial one (specified with the -i switch), you can make your life even easier by using the -if switch.
With this switch (available since v0.2.5), you specify a text-file that contains the paths to all the assemblies you want to include in the Sourcetrail-database.

To demonstrate this, we again use the previously mentioned example.
To keep things simple, we copy all assemblies of our solution into one folder.
Then we create a text-file containing the assembly-paths to our assemblies in the same folder.
We could create the file by hand, but we could also let a simple shell-command do this for us.
Open a command-prompt or PowerShell and navigate to the folder where our assemblies are stored.
Execute one of these commands:

This creates a text-file with all assembly-names for us.
The file should look like this:

ConsoleDemo.dll
Lib1.dll
Lib2.dll
Lib3.dll
# comments are supported, this line was added by hand (e.g. to exclude nuget-packages)
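
As an alternative to a shell command, a list file like the one above could also be generated with a short script. The following Python sketch is an assumption-laden illustration: the function name `write_assembly_list` is hypothetical, the file name asmNames.txt is taken from the command line used in this section, and only files ending in .dll are collected.

```python
from pathlib import Path


def write_assembly_list(folder, list_name="asmNames.txt"):
    """Write the names of all .dll files in `folder` to a list file in that folder."""
    folder = Path(folder)
    # Sort the names so the output is stable between runs.
    names = sorted(p.name for p in folder.glob("*.dll"))
    (folder / list_name).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

Calling `write_assembly_list(".")` from the folder that holds the assemblies would produce the four assembly lines shown above; comment lines would still have to be added by hand.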

The assembly-names in this file can be specified either by full path or by a path that is relative to the text-file (as shown here).
Because we now specify all assemblies explicitly, the switches -fn and/or -ff are no longer needed.
On the other hand, we now have to specify the -of switch, as we want to create a single database for all assemblies.
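
The path rule for entries in the list file (full path, or relative to the text-file) can be illustrated with a short Python sketch. It is not the tool's code: the function name `resolve_assembly_paths` is hypothetical, and skipping empty lines is an assumption (the text only states that comment lines are supported).

```python
from pathlib import Path


def resolve_assembly_paths(list_file):
    """Return the assembly paths named in a list file such as asmNames.txt."""
    list_file = Path(list_file)
    base = list_file.parent  # relative entries are resolved against this folder
    paths = []
    for line in list_file.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # assumption: blank lines are skipped like comments
        path = Path(entry)
        paths.append(path if path.is_absolute() else base / path)
    return paths
```

With this rule, an entry such as Lib1.dll refers to the file next to the list file, regardless of the directory the indexer is started from.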

The new command-line would look something like this, producing the same result in Sourcetrail as the previous approaches:
SourcetrailDotnetCoreIndexer.exe -if asmNames.txt -of demo.srctrldb -f ^System

(Note: instead of using the -if switch, you can specify the -i switch multiple times, but I think using a single text-file is more convenient.)

Results

This is an image taken from Sourcetrail after creating the database for SourcetrailDotnetIndexer itself:
Screenshot

This is a screenshot with source-information read from a PDB:
PdbSupport