qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net
Other
177 stars 25 forks source link

Support for special LongSparseAList #53

Closed MkazemAkhgary closed 6 years ago

MkazemAkhgary commented 7 years ago

SparseAList is very solid collection, And I use it for mapping lots of sequential data into different integer places. but sadly my main requirement is 64bit integers.

It seems extending SparseAList to support 64bit indexer is not possible. since its based on AList and whole thing has to be changed to support 64bit indexing. @qwertie told me that this has additional 50% overhead in size.

However in my case benefit of SparseAList overweighs additional overhead of 64bit version. I would like to have this collection, since collection complexity is at monster level (it seems 😃), I'm looking for approach to minimize (or completely avoid) possible mistakes by manually copying every bit.

as suggested by @qwertie this can be achieved by LeMP, I would like to do to this.

qwertie commented 7 years ago

Here's the plan I was thinking over at StackOverflow:

  1. Fork ecsharp repo.
  2. Create a new library, say, Loyc.AList64.dll, that references Loyc.Essentials but not Loyc.Collections.
  3. Use LeMP to generate 64-bit versions of the sparse list interfaces in Loyc.Essentials.dll (the new interfaces will still be defined in Loyc.Essentials.dll, not Loyc.AList64.dll, because I treat Loyc.Essentials.dll as a central location for defining interfaces in my libraries)
  4. Use LeMP to generate 64-bit versions of SparseAList, its base classes, and its node classes.

To get the ball rolling I added an alist64 branch to this repository, and did step 3. In that branch you'll see I've modified the original Collections/Interfaces/ISparseList.cs by adding using index = System.Int32; and then replacing all occurrences of int with index. Then I added SparseList64.ecs which generates a 64-bit version like this:

// Insert the contents of the file into the inner replace command (if you 
// write includeFile("ISparseList.cs") directly inside replace, it doesn't 
// work because the replacements happen before includeFile runs.
replacePP(file => includeFile("ISparseList.cs"))
{
    // Bug in replace: replace({ using index = System.Int32; } => { using index = System.Int64; }) doesn't work.
    // Workaround:     replace(      { index = System.Int32; } => { index = System.Int64; }) happens to work.
    replace(
        { index = System.Int32; } => { index = System.Int64; },
        ISparseListSource<T>  => ISparseListSource64<T>,
        ISparseList<T>        => ISparseList64<T>,
        ISparseListEx<T>      => ISparseListEx64<T>,
        // there is currently no 64-bit version of IList<T>, IListSource<T>, IListAndListSource<T>, or IListEx<T>
        IListSource<T>        => IReadOnlyList64<T>,
        IListAndListSource<T> => IReadOnlyList64<T>,
        IListEx<T>            => IReadOnlyList64<T>)
    {
        file;
    }
}

LeMP is based on a modified version of C# syntax called Enhanced C#. LeMP is a preprocessor, not a complete compiler.

Admittedly, I immediately encountered some kind of bug or limitation in LeMP as described in the comment. Since LeMP has few users, you might find that something you want to do has never been done before and therefore there could be a bug in it. Just let me know if it gives you any trouble and I'll find a fix and/or workaround.

Since you'll be using replace a lot, I want to point out that Enhanced C# does not allow "custom" syntax - it's just a "liberalized" or "generalized" syntax of normal C#. In particular, the expressions ISparseList<T> => ISparseList64<T> and { using index = System.Int32; } => { using index = System.Int64; } in the above code are both parsed as "normal" lambda expressions. Since => is always a lambda operator, you may notice that a command like replace (a + b => c + d) doesn't work. The reason is that the Enhanced C# parser understands it as replace (a + (b => c + d)), not replace ((a + b) => (c+ d)) like you intended. So you would have to write it as replace ((a + b) => c + d).

By the way do you prefer the name LongSparseAList? I was thinking SparseAList64.

Oh, and are you using Visual Studio 2017? Due to lack of demand I haven't actually upgraded LeMP to integrate with VS2017 yet.

MkazemAkhgary commented 7 years ago

Sorry for very late response. I was busy with some project and I didn't want to work on this distractedly.

So I started to fork ecsharp repo. then I created Loyc.Collections64 library. unfortunately I'm stuck at installing your VS tool.

I've upgraded into VS17, so the extension doesn't work for me, but fortunately you had standalone version LeMP.exe 😃 I couldn't find version 2.5 though, I have version 2.3

I had to explicitly type path like this since they were not in same folder.

../../Core\\Loyc.Essentials\\Collections\\Interfaces\\ISparseList.cs

image 8

I'm not sure whats this error, can you guide me on how to solve this? thank you. (i copied same syntax you just give to me)

SparseAList64 is perfect. Yes, I highly suggest Vs17, apart from massive features it got, the new c#7 syntax is flawless Imho.

(Since FCU is coming out and .Net standard 2 is released with all features that were missing in Net framework 4.5, I think its time to move forward. It would be really nice if you could push your project forward, currently its not compatible with UWP projects because of .net thing)

qwertie commented 7 years ago

I can confirm that v2.3.1 in Lib/LeMP has the error you screenshotted. v2.5 which I have installed doesn't have any errors. So I have pushed v2.6.1 to master and merged it into alist64 branch. I will also create a 2.6 release today and then attempt to get LeMP working properly in VS2017.

qwertie commented 7 years ago

I was able to get the VS2017 extension to work - please get LeMP_VisualStudio.vsix from the Release page.

MkazemAkhgary commented 7 years ago

Thanks, the new extension works great, now I see how its supposed to work. but how did you manage to replace int by index directive in cs file? did you Replace simply by within VS editor? or is there some functionality in LeMP that I'm not aware of?

MkazemAkhgary commented 7 years ago

SparseAList has a bug at NextLowerItem and NextHigherItem

if you intend to pass int.Max or int.Min value to get items from scratch, these methods may not find any item, due to integer overflow happening in calculations.

I guess two checks should be added into these methods to make sure they work always. simple fix

if (index != null)
    index = Math.Max(Math.Min(index.Value, source.Count - 1), 0);
qwertie commented 7 years ago

Oh, for that I just used editor find-and-replace and decided on a case-by-case basis whether int should be index.

On my local machine added a couple of unit tests to Core\Tests\Collections\ALists\SparseAListTests.cs confirming the bugs in NextLowerItem and NextHigherItem.

MkazemAkhgary commented 6 years ago

Hi, is there any way to random access to actual items? for example index 5 should return 5th actual item in sparse list, currently I have to do this by finding next nearest item until I reach 5th item. so for large number of items this becomes quite inefficient.

qwertie commented 6 years ago

Oh, that's an interesting point (and sorry for taking so long to respond). IIRC the B+tree actually does not contain the necessary information to locate the Nth item! Because the "virtual" indexes are used instead of the real indexes in the internal nodes.

qwertie commented 6 years ago

Why do you need to look up the Nth real item?

MkazemAkhgary commented 6 years ago

Thanks for the reply.

I see. that makes sense. I don't really need it yet, but it could be useful in some cases. this requires two collections to be updated in sync together (a sparse list with normal list). but this approach currently has more drawbacks than advantages, but if one day I feel need for it then I may take this road.

I'm also making an observable list that can know if its items are changed. each item has NotifyPropertyChanged event, simply list will listen to this item changes.

In order to know the index of item that has been changed it requires a dictionary from item to list of indexes (Dictionary<T, List<int>>). that already requires Sparse list to be updated in sync with dictionary.

having that said, third collection underneath for accessing Nth item is just overkill for now.

qwertie commented 6 years ago

You're saying you use a Dictionary<T, List<int>> to keep track of the locations of items? That seems to defeat the purpose of using SparseAList<T> in the first place. The reason to use SparseAList<T> instead of SortedDictionary<int, T> or BDictionary<int, T> is that a sparse list allows you to add or remove items in the middle, which "shifts" the indices of all items with higher indices. A dictionary does not have that ability.

MkazemAkhgary commented 6 years ago

I see, I'm doing some sort of caching for the main list. so there are small lists created from the main list to provide fast access to specific items.

small list should be also sparse list, so it can be updated in sync with main sparse list.

you are right, I shouldn't be using dictionary because they are not good match.

It turns out, i don't have to know at what index the item is changed, since a single item is added to both main list and cached one, and they have same reference (items are reference type) so they are inherently in sync.

qwertie commented 6 years ago

Closing due to inactivity.