volarjs / volar.js

💙🌊
https://volarjs.dev/
MIT License
988 stars 47 forks source link

Support overlapping mapping ranges #203

Closed piotrtomiak closed 3 months ago

piotrtomiak commented 4 months ago

I am continuing my work on Angular support and a lot of stuff is working correctly in version 2.3.0, however there is one problem with transpilation of ?., which for historical reasons works differently then TypeScript version of ?.. Let me describe the problem.

Problem

WebStorm is using TypeScript server to provide types of expressions. For instance, if we have foo.bar.fooBar expression, to get the type of fooBar property, WebStorm will check expression type of the whole range foo.bar.fooBar. This fails to work with transpiled ?. though. Currently, the ranges would be incorrectly mapped from the source file to generated file, e.g. data?.icon would be mapped to data)!.icon.

Visualization of Angular Template Transpilation 2024-06-17 11-36-46

The problem is that ?. is expanded to a whole expression, so we need to add more mappings and they will start overlapping:

Visualization of Angular Template Transpilation 2024-06-17 11-38-39

So, if we map source range data?.icon we should get (null as any ? (this.data)!.icon : undefined) generated range, but if we want to map an error from generated range that icon property does not exist on data, we should get a different mapping:

Visualization of Angular Template Transpilation 2024-06-17 11-39-51

Solution

I want to store source mappings in an interval tree (e.g. https://www.npmjs.com/package/@flatten-js/interval-tree). This will allow for a given text range to find all mappings containing a start point and an end point in O(ln(n)) time (where n is number of mappings) and then find among them the best match:

  1. If there are mappings containing both start and end point - find the smallest range among them. If start and end match perfectly use the target range as is. This also solves issues with mapping of e.g. _t1 to fooBar (when lengths of source and generated code are different). Currently you get _t1 mapped to foo instead of fooBar.
  2. Otherwise find the smallest mapping containing start and the smallest mapping containing end point and use them for mapping start and end separately.

@johnsoncodehk - I can implement the solution and provide a PR with it. Let me know what you think about this!

johnsoncodehk commented 4 months ago

We have a discussion in the Volar Discord server that uses zero length mappings to solve the same problem and avoid overlapping mappings: https://discord.com/channels/1192759067815464990/1252021970443440128 (Discord server invite link: https://discord.com/invite/7qfYWRrUAM)

Please let me know if this works in your case as an alternative.

piotrtomiak commented 4 months ago

@johnsoncodehk - I went through the topic, but I am afraid it won't work in my case. The problem is opposite to the one specified in the thread. I need to map from source range to generated range. The problem is that I need to find the best range, not just the best start and the best end separately.

For instance, for the source range data?.icon from screenshots above, there are two mappings for the start index («» denotes the range):

and two mappings for the end index:

And the mapping I need, is the one, which has both: start and end index, which is (null as any ? (this.data)!.icon : undefined). If we had just one level of such thing, I could probably do that through some custom data and filter on the mappings, but when you chain ?. you get some nesting, and you can have even 3 or more mappings for the same start segment (see the screenshots above). I don't see any other way to do that other than through an interval tree.

The zero-length segments are pretty cool trick for mapping highlighting errors back to the source code, but I guess if we have overlapping segments support, I won't need them. Also, the mappings will work correctly in both ways, i.e. source -> generated and generated -> source. I recall we have some issues with getting element types from Vue files, because of these zero-length segments, as we cannot map from source to generated code correctly.

Using interval tree can be costly performance wise, so they would be used only, if there are overlapping segments found in the map.

johnsoncodehk commented 4 months ago

(First I am still trying to understand this case.)

If the Angular compiled code is so complex, is it possible to replace (null as any ? ((null as any ? ((null as any ? (this.data)!.icon : undefined)!.toString : undefined))!() : undefined) with a simpler this.data?.icon?.toString() in the virtual code used for Volar to perform the mapping?

The regex could be like /\(null as any ? (?<start>.*)!(?<end>.*) : undefined\)/g.

Therefore mapping can be simplified to:

Source code: {{«data?.icon?.toString()»}} Virtual code: this.«data?.icon?.toString()»

piotrtomiak commented 4 months ago

I have a full control over what's generated, so that's not a problem, but for me important part is to be 1:1, what Angular compiler does and TBH I have no idea why it does that. There are also other cases, where foo is translated to this.foo, etc.

I am now implementing the algorithm to compare it with the existing one in terms of performance, both memory-wise and time-wise. I will not be using any external libraries for that. It seems that, because of translateOffset being O(n), which is used in findMatchingOffsets, we might get a significant improvement here, where you can retrieve offsets in O(ln(n)). The cost would be slightly higher memory usage, but still O(n) and the creation time should also be still O(n*ln(n)). One more benefit will be that the provided mappings won't need to come sorted in any way.

johnsoncodehk commented 4 months ago

A small update is that we just implement binary search for translateOffset in https://github.com/volarjs/volar.js/commit/1a92d47398f752ee207ac630d941ba14e2e4aaf4.

johnsoncodehk commented 4 months ago

I've added a test for the Angular template virtual code mapping you showed in #204, which is based on the zero-length mapping trick to handle the use case you mentioned.

I'm not sure if overlapping paragraph support is a good idea (don't get me wrong, I'm not rejecting it, I just need to understand your use case better) and I'll wait to review it once your implementation is complete.

piotrtomiak commented 4 months ago

@johnsoncodehk - thanks for your work again! I have submitted my PR with a lot of comments to explain the difference between our approaches to mappings. As far as BinarySearchStorage vs IntervalTreeStorage is concerned. It seems that interval tree will use less memory, as each mapping range is stored only once, whereas in your case it's stored in many places. I will have a look at some performance metrics

piotrtomiak commented 4 months ago

@johnsoncodehk - I've profiled the BinarySearchStorage vs the IntervalSearchStorage and the interval one is slower and takes more memory in the end, so I will remove it from my PR. I will also migrate to use one range mapping per Mapping and add some checks in the Map. I am going to update the PR soon

piotrtomiak commented 4 months ago

@johnsoncodehk - sorry for all of the noise! I've finally managed to get a final PR to cover Angular case with overlapping rang emappings and without zero-length ranges. The only major change is the range-matching logic, so that we can map correctly variable-length ranges. With this logic we get an equivalence of the zero-length ranges, which are a bit confusing and magical ;) I hope you'll accept the PR.

My main mistake was that I've put all the ranges into a single mapping. Changing that helps a lot here! I've added a check and an informative error, so that others will not repeat my mistake ;)

johnsoncodehk commented 4 months ago

Thank you for your efforts! I'll check it out, it'll take some time.

piotrtomiak commented 4 months ago

Thanks!

One more thought. If we can get rid of zero-length mappings, which is possible with my change, the Mapping interface could be simplified to:

export interface Mapping<Data = unknown> {
    sourceOffset: number;
    generatedOffset: number;
    length: number;
    generatedLength?: number;
    data: Data;
}

Right?

johnsoncodehk commented 3 months ago

One more thought. If we can get rid of zero-length mappings, which is possible with my change, the Mapping interface could be simplified to:

You are right, the current data format is a bit confusing, I will try to refactor in 2.4 as you suggested and see if that causes any limitations.