Suggestion: Custom Syntax Highlighting for User Code vs Non-User Code

wmjordan / Codist

A Visual Studio extension that enhances syntax highlighting, Quick Info (tooltips), the navigation bar, the scrollbar, and display quality, and brings a smart toolbar with code refactoring to the code editor.

https://marketplace.visualstudio.com/items?itemName=wmj.Codist

GNU General Public License v3.0


Suggestion: Custom Syntax Highlighting for User Code vs Non-User Code #41

Closed fitdev closed 6 years ago

fitdev commented 6 years ago

I really like how the Super Quick Info tooltip displays what assembly a particular member or type comes from.

That gave me an idea: maybe you could add a new kind of syntax highlighting style that would apply to either all user code identifiers (types and members) and/or all non-user code identifiers. I am not sure yet what the best way to approach this will be in terms of the actual style possibilities. However, for a start, if you could just expose those 2 categories for syntax highlighting, that would be great!

And one more smallish suggestion: it would be nice if you could add another category for #region and #endregion directives (for the whole line). Right now I am using a separate extension to highlight those.

PS Love your latest edition that lets you preview styles directly in the list! Very handy!

wmjordan commented 6 years ago

Do you mean that "user code identifiers" are identifiers that could be found in the source code, and "non-user code identifiers" are those imported from GAC and external assemblies?

If so, it is possible to do so. However, I am afraid that it will slow down the syntax highlighter considerably. What kinds of special syntax highlights will you use to distinguish between "user code identifiers" and "non-user code identifiers"? Could you please share more thoughts?

I understand your request for the #region highlighter. I used similar extensions in the past, but dropped them quite some time ago. This request will go into the backlog, since another extension already implements it. The principle of Codist is always: innovative features go first :)

fitdev commented 6 years ago

Thank you for such a quick reply!

> Do you mean that "user code identifiers" are identifiers that could be found in the source code, and "non-user code identifiers" are those imported from GAC and external assemblies?

Precisely! That's exactly what I mean.

> However, I am afraid that it will slow down the syntax highlighter considerably.

That's too bad. I hope there can be a work-around. Is that because the call to determine the assembly that contains the identifier is expensive? Maybe there is a way to cache that info to speed things up? If I remember correctly last time I tried to modify one of the existing extensions, there was a symbol API that had a bool property that would say if the symbol in question was coming from user code or not. Maybe this would be faster?

> What kinds of special syntax highlights will you use to distinguish between "user code identifiers" and "non-user code identifiers"? Could you please share more thoughts?

That's the tricky part. I have not figured that out myself yet. Most likely, either font-size alterations and/or slight background color tweaks. Or maybe render those in a slightly different font (say slightly bolder variant for the user-code).

Just to provide some additional background, I am working on developing a "BCL" of sorts that would serve as a foundation for all other projects, and as such it has many hundreds of types and members and thousands of extension methods. For instance, when working with LINQ, it is difficult to identify quickly whether the extension method you are calling is defined in the real BCL or a NuGet package you are referencing, or somewhere in your own codebase (this is especially true if the method name is the same and the signature is just slightly different). Of course the current Quick Info tooltip is already very helpful, but that still requires mousing over. So, it would be awesome if you could just look at the code and instantly see that, "oh yes, this is defined in the real BCL because it is highlighted slightly differently".

Thanks for considering region highlights. Your approach makes perfect sense to prioritize the development of the features not found elsewhere!

wmjordan commented 6 years ago

To determine whether a symbol comes from the source code or from an assembly, we examine the Locations property or the DeclaringSyntaxReferences property. Both return a collection, and we have to iterate through it to find out whether a location is in the source code.
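The check described above might be sketched as follows. This is only an illustration against the public Roslyn API (`ISymbol.Locations`, `Location.IsInSource`, `ISymbol.DeclaringSyntaxReferences`); the class and method names are hypothetical and this is not necessarily how Codist implements it.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;

// Hypothetical helper, not part of Codist or Roslyn.
public static class SymbolOrigin
{
    // True when at least one declaration of the symbol is in a source file.
    public static bool IsDeclaredInSource(ISymbol symbol)
    {
        // Locations is a collection: partial types and namespaces can have
        // several entries, which is why an iteration is needed.
        return symbol.Locations.Any(location => location.IsInSource);
    }

    // Similar check through DeclaringSyntaxReferences, which is empty for
    // symbols loaded from metadata (and for implicitly declared symbols).
    public static bool HasSyntaxReference(ISymbol symbol)
    {
        return !symbol.DeclaringSyntaxReferences.IsEmpty;
    }
}
```

Both checks walk a collection per symbol, which is cheap once but adds up when it runs for every identifier the highlighter classifies.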

Caching symbols from referenced assemblies with a HashSet might speed things up a little bit, but we would have to listen to quite a few more events, including project loaded or unloaded, references added or removed, and compiler configuration switches (from .NET 2.0 to .NET Standard or .NET Core, etc.).

The most critical part is still what the different style should look like. The rendered result could look too complicated. Is it possible to copy your code to Microsoft Word and mimic the effects by manually applying styles there?

fitdev commented 6 years ago

@wmjordan Oh, that does sound very complicated!

But then, how do you do it for the Super Quick Info? My idea was that basically you just need to get a hold of the containing assembly, and from then on it should be easy to see if the symbol's assembly is external or represents the current project (i.e. user code). I realize though that this approach may be suboptimal, as it will not work for non-shared multi-project solutions (i.e. most solutions), where one project references another non-shared project.

Although, on second thought, couldn't you maintain an in-memory HashSet of Project-Assembly pairs (you construct it on solution load, for example, and then update it either by listening to events, or on document tab change, or something similar)? Then you know that, say, AssemblyA is in fact a loaded user code project ProjectA, and AssemblyB that ProjectA references is in fact also a loaded user code project ProjectB. From there you know which assemblies correspond to user code and which don't. Then during the syntax highlighting part you basically examine the ISymbol.ContainingAssembly to see if its assembly belongs to the user code or not. That should be pretty fast.
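A minimal sketch of that project-to-assembly lookup, assuming the Roslyn workspace API (`Solution.Projects`, `Project.AssemblyName`) is available to the extension. The class name and its methods are hypothetical, and matching by simple assembly name is an assumption: a binary reference that happens to share a project's assembly name would be misclassified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;

// Hypothetical helper, not part of Codist or Roslyn.
public sealed class UserCodeAssemblies
{
    private HashSet<string> _names = new HashSet<string>(StringComparer.Ordinal);

    // Rebuild the set on solution load, or when projects are added or removed.
    public void Refresh(Solution solution)
    {
        _names = new HashSet<string>(
            solution.Projects.Select(project => project.AssemblyName),
            StringComparer.Ordinal);
    }

    // Drop everything, e.g. when the solution is unloaded.
    public void Clear()
    {
        _names = new HashSet<string>(StringComparer.Ordinal);
    }

    // A symbol counts as user code when its assembly name matches a loaded project.
    public bool IsUserCode(ISymbol symbol)
    {
        // ContainingAssembly can be null, e.g. for a namespace that spans assemblies.
        var assembly = symbol.ContainingAssembly;
        return assembly != null && _names.Contains(assembly.Name);
    }
}
```

Storing names rather than symbol objects keeps the set small and means it holds no references into any particular compilation.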

I should clarify that local variables should be excluded from this syntax highlighting, since by definition they are always user code, though, of course, their type might not be. So, the highlighting should be applied only to member identifiers, type names (classes, interfaces, structs, enums), and extension methods.

I will try to come up with a sample style for this shortly. But I think the customizability that you already offer is quite good to allow the user to create his own style for either user or non-user code (or both). I would probably adjust the font size slightly (make non-user code a bit smaller).

As for the use cases, extension methods and inheritance overrides (in multi-class hierarchies, where you don't know if you are calling a method from your base class or from the framework's base class, for example) are probably the 2 biggest ones.

But even type names could benefit from that as well. This is especially true for types with common names like Point, Path, Tree, etc.: many different libraries may have types named like this, including your own user code. Of course they all will serve different purposes (Path may be the one from System.IO or it could be from a graph-related context), but when you quickly need to understand what the code does, being able to tell at a glance if a particular type is part of the user code or is external is a huge advantage!

wmjordan commented 6 years ago

Things are much simpler in Super Quick Info since it runs far less frequently than the syntax highlighter.

Yes, caching assemblies instead of symbols may make the cache much smaller. We can use a Dictionary<IAssemblySymbol, > for this. We can just purge the cache when the solution is unloaded. Since few people keep loading and unloading a lot of assemblies or projects, the memory leak from unloaded assemblies can reasonably be ignored.

Well, the syntax highlighter is getting complicated. I don't know whether VS can cope with so many overlapping styles.

fitdev commented 6 years ago

Actually, not that I do it often, but within a single VS session, I would occasionally reload the same solution (different projects) 3 or 4 times. Of course, you are the one who knows how much that can translate into memory usage. But why not just purge the cache on solution unload?

I am not familiar with the syntax highlighting API, but my understanding was that you could basically have either Dictionary<IAssemblySymbol, bool> or 2 HashSet<IAssemblySymbol>. And then for every suitable symbol encountered, you would check:

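            var asmSymbol = symbol.ContainingAssembly;
            // Look up the cached answer; compute and store it on a cache miss,
            // so that the first occurrence is highlighted as well.
            if (!AsmDic.TryGetValue(asmSymbol, out var isUserCode))
            {
                isUserCode = IsAssemblyUserCodeRoutine(asmSymbol);
                AsmDic.Add(asmSymbol, isUserCode);
            }
            if (isUserCode)
            {
                //user code highlighting
            }
            else
            {
                //non-user code highlighting
            }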

As for the syntax highlighting, I got another idea, though it may be more difficult to implement... Ideally, the user code / non-user code distinction could be made by making either the user code or non-user code (according to options) either slightly lighter/darker or more/less saturated, while keeping all other styles the same. In other words, it would work much like opacity does for redundant code, except that it would affect colors, since we obviously cannot use opacity for that. That way there would still be consistent styling across all symbol kinds (i.e. extension methods styled one way and properties another, each according to the syntax highlighting settings for its kind, regardless of whether it is user code or non-user code).
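To make the lighter/darker idea concrete, here is a small sketch of shading an existing foreground color while leaving everything else about the style alone. The `Rgb` struct and `ColorTweaks.Shade` are hypothetical names made up for illustration; nothing here is a Visual Studio or Codist API, and a real implementation would work on the editor's own color type.

```csharp
using System;

// Hypothetical plain RGB triple, used only to keep the sketch self-contained.
public readonly struct Rgb
{
    public Rgb(byte r, byte g, byte b) { R = r; G = g; B = b; }
    public byte R { get; }
    public byte G { get; }
    public byte B { get; }
}

public static class ColorTweaks
{
    // Blends a color toward white (amount > 0) or toward black (amount < 0).
    // amount is clamped to [-1, 1]; 0 returns the color unchanged.
    public static Rgb Shade(Rgb color, double amount)
    {
        amount = Math.Max(-1.0, Math.Min(1.0, amount));
        double target = amount >= 0 ? 255.0 : 0.0;
        double weight = Math.Abs(amount);
        return new Rgb(
            Mix(color.R, target, weight),
            Mix(color.G, target, weight),
            Mix(color.B, target, weight));
    }

    // Linear interpolation of one channel toward the target value.
    private static byte Mix(byte channel, double target, double weight)
    {
        double mixed = channel + (target - channel) * weight;
        return (byte)Math.Round(mixed, MidpointRounding.AwayFromZero);
    }
}
```

Under this scheme, non-user code could be drawn with something like `ColorTweaks.Shade(kindColor, 0.2)`, so an extension method keeps its hue but appears slightly washed out.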

Another possibility might be to play with over/under lines or other "adorners", maybe even small inlined icons for certain cases. Though I have no idea how easy or hard it would be to implement and what the performance penalty, if any, would be. It's just that I have seen other extensions add those kinds of things to the editor (for different reasons of course), so the possibility to style it like that is there.

wmjordan commented 6 years ago

Simply changing the bold, italic, underline, font, and opacity settings of user symbols or non-user symbols is now implemented. Please try this new beta: Codist.zip

Lightening, darkening, or saturating would require multiplying the number of syntax types (user class, user interface, user struct, user member, etc. and non-user class, non-user interface, non-user struct, non-user member, etc.). I think it is too much work for me.

fitdev commented 6 years ago

Wow! Thanks! You are blazing fast!

Just tried it and it looks really good! I agree completely with you on the potential difficulty of implementing color adjustments along with the possible performance hit those may bring. The only reason why I suggested those was because I wanted the user/non-user code style distinction to be a modifier that would preserve all other styles (i.e. so that extension methods for example would still look like extension methods regardless of where they were defined). And your current implementation addresses this concern, so I guess there is no need for these advanced color manipulations now.

I ended up choosing Fira Code Light for the non-user code, while the rest is just Fira Code Retina. The results look really good, as there is just enough of a noticeable difference for you to tell where the code is coming from, without it being in your face...

[Screenshot: 2018-09-30 11_03_04-workspace net - microsoft visual studio]

The only smallish omission that I noticed right now is that the generic constraints (type names after the where clause) are not affected. That is, they are all rendered the same regardless of whether they are user code or not.

fitdev commented 6 years ago

Although this screenshot does not really show the case where this new feature is useful, it nevertheless illustrates the new feature very well. I really like how it is now super easy to see which ones are your own types / members (bolder in this example), and which ones are defined externally...

[Screenshot: 2018-09-30 11_51_07-workspace net - microsoft visual studio]

fitdev commented 6 years ago

Sorry to post it here, but it seems that it's a tiny bug, and since it's about syntax highlighting I thought I would mention it here...

[Screenshot: 2018-09-30 12_20_14-workspace net - microsoft visual studio]

I just noticed different formatting depending on whether the attribute is on the same line as the member vs when they are on different lines. Not sure why it would ever be formatted differently.

wmjordan commented 6 years ago

It seems that it does help someone who is new to your project. I immediately saw that BinaryRelation was a class written by you!

BTW, I think that the method parameters in bold make the code look too bold. Maybe an italic style is enough.

Anyway, thanks for posting the bug reports. Please try this beta.

Codist.zip

fitdev commented 6 years ago

Thank you for the new version! It indeed fixes the 2 small bugs I reported in my previous posts. Excellent work!

> It seems that it does help someone who is new to your project.

Not just someone, it helps myself already, and certainly will even more so in a few months from now, when I will have forgotten about specialized classes / methods I wrote some time ago. So, I think this is a really useful feature, and really appreciate your willingness to incorporate it into your extension as well as the speed with which you did so! Great work!

I completely agree with you that bold method parameters don't look good. The issue I had was that I made local variables italic as well, and so needed a way to distinguish between parameters and local variables. I made local variables italic, in turn, to distinguish them from fields and/or properties. I still have not figured out the best style combination for this.

This is what I have settled on for now:

[Screenshot: 2018-09-30 17_34_09-workspace net - stringmatcher cs]

Thanks to your latest additions like brace colorization, and now this user vs non-user code distinction, the code can really look great and very easy to follow!

As a side note, it would be great if the styles could be extended with a few more visual attributes. What those attributes should be - color-related, font-related, or some icon/geometric adorners - it's hard to say. But the slight problem is that currently the existing choices for style customization - while plentiful (and really big thanks for that) - are still not enough to address "all desired" symbol types.

What I mean is that if you use a particular style property (like color, or boldness, or font size) to distinguish between particular kinds of symbols, then you run out of those properties, because there are a bit more ways to classify symbols. So, for example, in my case:

Font thickness (to distinguish it from boldness - just font variation really) => user vs non-user code
Foreground color => varying usage, but limited so as not to conflict with established standard colors (i.e. teal for types, blue for keywords, etc.)
Italics => parameters / local variables vs fields / properties
Underline => Extension methods vs non-extension methods
Bold => Static vs Instance (not really using that at the moment, but to illustrate the point)

Given the setup above, it's not clear how to deal with Abstract/Virtual/Override members for example. Clearly they kind of form a nice group of their own. Of course, it's possible to re-use existing style properties, like boldness for example, but that would make it "conflict" with how that style property is used already (for static differentiation in my example above).

It seems that this group in effect "deserves" its own style property, but we have run out of those (strikethrough is hardly an option for any symbol, though it is still nice that it's there, and all the others are already taken by other "X vs Y" pairs). So, if there were an additional style property we could alter, it would help... But I do understand it's kind of hard to come up with something that will look good and also not have a performance penalty. But perhaps, in time you can add something more to the styles. I do like your latest addition of gradients for the background, but have not tried it yet, as it seems to be rather specific. After all, the background is usually uniform throughout the whole code, so if used, it would be used only in certain rare places.

wmjordan commented 6 years ago

Thank you very much for commenting. I have not read your comments yet. I'd like to post a CAUTION for you.

I'd forgotten the fact that each time we modify the document, even if we just type a single character, Roslyn will produce a copy of the semantic model! Therefore, our approach above of caching the IAssemblySymbol wouldn't work efficiently, since it would lead to a severe memory leak after editing the document for some time, caching thousands of copies of IAssemblySymbol for our source code!

I am working on this problem and I will return to you later.

wmjordan commented 6 years ago

After some investigation, I found a new way to determine whether an assembly is from source code or from metadata. No need to cache, no memory leak, little memory footprint, and much faster than dictionary lookup.

As a bonus, it is even possible (not implemented in this beta though) to distinguish symbols that come from another project in the solution from those that will be compiled from the current project.

Please test this new beta.

Codist.zip

fitdev commented 6 years ago

> Therefore, our approach above of caching the IAssemblySymbol wouldn't work efficiently, since it would lead to a severe memory leak after editing the document for some time, caching thousands of copies of IAssemblySymbol for our source code!

I have used the older version for some time and have not yet noticed any performance degradation. But then, half a day is not really enough to tell, plus I did not really measure memory usage, so thank you for spotting this! It does seem very strange that every keystroke leads to the creation of 1000s of new ISymbols. That seems like a waste of resources, but then I don't really know Roslyn, so I am simply surprised to learn that that's how it works. It's great that you found an alternative solution! I am giving it a try right now, and will let you know how it goes.

> As a bonus, it is even possible (not implemented in this beta though) to distinguish symbols that come from another project in the solution from those that will be compiled from the current project.

That's really nice! Does it take into account Shared Projects (because those, while technically being separate projects under VS, will be merged with the main assembly upon compilation)? It sounds like a useful addition, though I am not sure what would be the best way to surface it via syntax highlighting.

wmjordan commented 6 years ago

I did not test on Shared Projects. Actually we can have three sets of symbols now: current project, referenced project with source code and referenced assemblies. I have not thought of a way to distinguish the second set yet.
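The thread does not say how the beta tells these three sets apart, so the following is only one possible sketch, not the method Codist uses. It assumes that a project-to-project reference appears in the compilation as a `CompilationReference`, which is typical for same-language projects in a workspace but not guaranteed. `SymbolSource` and `SymbolSourceClassifier` are hypothetical names; `Compilation.Assembly`, `Compilation.GetMetadataReference` and `SymbolEqualityComparer` are existing Roslyn APIs (the comparer arrived in a Roslyn version later than this discussion).

```csharp
using Microsoft.CodeAnalysis;

// Hypothetical classification of where a symbol comes from.
public enum SymbolSource
{
    Unknown,
    CurrentProject,
    ReferencedProject,
    ReferencedAssembly
}

public static class SymbolSourceClassifier
{
    public static SymbolSource Classify(ISymbol symbol, Compilation compilation)
    {
        var assembly = symbol.ContainingAssembly;
        if (assembly == null)
        {
            // E.g. a namespace that spans several assemblies.
            return SymbolSource.Unknown;
        }

        // The assembly currently being compiled is the current project.
        if (SymbolEqualityComparer.Default.Equals(assembly, compilation.Assembly))
        {
            return SymbolSource.CurrentProject;
        }

        // Assumption: references backed by another compilation are projects
        // with source code; everything else is a binary (metadata) reference.
        return compilation.GetMetadataReference(assembly) is CompilationReference
            ? SymbolSource.ReferencedProject
            : SymbolSource.ReferencedAssembly;
    }
}
```

No cache is involved here: each answer is derived from the compilation at hand, so nothing outlives a document edit.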

fitdev commented 6 years ago

I have not thought of how the third set of symbols can / should be surfaced either. But at least it's nice that there is such a possibility now!

I have tested the latest beta, and it seems to be working great! I will let you know if I discover any issues. Thank you for your excellent work!

wmjordan commented 6 years ago

I have been working with this feature for quite some time. I use regular style for my symbols and bold style for external symbols. It really looks nice. I like this feature too. Thanks for sharing your idea! I am gonna close this issue now. Please comment if you find any problem.