mariusmuntean / ChartJs.Blazor

Brings Chart.js charts to Blazor
https://www.iheartblazor.com/
MIT License
691 stars 152 forks source link

Dataset rework #96

Closed Joelius300 closed 4 years ago

Joelius300 commented 4 years ago

Describe the feature request

Currently the datasets are all over the place. Some have base classes, some don't. Mixable charts use the covariant IMixableDataset<T> which allows for different types of datasets in one chart but with many downsides. I once was really proud of my implementation of those mixable datasets (when I started learning about covariance) but let's face it, I did horribly. Having to use wrappers for all value types is really bad and extending this thing seems like a nightmare. Now with the recent interop-layer rework (not #70, that one 4 month ago), we need to have an id for each dataset so now we also have an IDataset interface that ensures that we have an accessable id.

The datasets are stored in config.data.dataset. This is usually a collection (either List or HashSet) containing objects of type IMixableDataset<object> or some specific class like BubbleDataset. For charts that only support one dataset-type with one data structure, using a List along with a specific class, works fine. However, things get complicated once you're dealing with charts like the line chart. The line chart allows for different types of datasets (it's perfectly legal to have a bar dataset in a line chart) and the line chart also accepts data in different forms (array of numbers, array of number-points, array of time-points). These have to somehow be mixable without breaking typesafety and without allowing you to add other types of datasets.

Which charts does this feature request apply to?

All charts

Describe the solution you'd like

I'd like the datasets and the collection of datasets to be enjoyable to use, performant and extendable.

Enjoyable to use

Dataset
Dataset collection

Performant

That's actually not that big of a deal considering the bottleneck in this library is likely the usage of reflection and dynamic features but I have not tested it. Still it would be nice if we weren't in boxing hell like we are currently with IMixableDataset and the wrappers.

Extendable

We should split these datasets into interfaces and base classes so we can extend and modify them if needed. After all we are modeling a JavaScript library so it's definitely of value to be flexible. We don't want to get rid of typesafety (!) but being able to extend the model from outside without heavy reflection usage is great for something like this.

API proposal

I have already implemented the base system and this time I think I can actually be proud of it.
It's entirely typesafe but still flexible, raises compiler errors when you try to add a not supported dataset, has full support for structs, allows for object and collection initializers and implements IList as best as possible.

There is the base interface IDataSet. This interface has no type associated with it. It's more a semantic restriction than anything else but it does contain the most important properties being Id and Type. It's also important when storing datasets since it's the base of any dataset.
Implementing that base interface is the generic version IDataSet<T> where T can be any type. No in or out modifiers apply. This interface also assures that implementers need to expose a read-only list of T.

Then the base class for all the datasets is DataSet<T>. This is a class that implements IDataSet<T> and IList<T> for modification. It exposes the contents through a read-only property and thereby implements IDataSet<T>.Data. It also contains the AddRange methods we love.

Now the dataset collections. In many cases we can just use List<BubbleDataset> or something similar so we don't want too much abstraction.
For charts that support more than that, we have the handy class DatasetCollection. It implements IList<IDataset> so you can do modifications as you please with the exception of adding IDataset. The Add and Insert methods are implemented explicitly (and therefore can't be called unless casted) and throw a NotSupportedException if used. However, they expose the protected method AddDataset which can add any dataset. This is the only way (disregarding reflection) to add datasets to this collection.
Now we can derive from that collection and add our own Add methods. The Add method is overloaded for every supported dataset. In the case of the line chart, this means every dataset that consists of either ints, longs, doubles, Points or TimePoints. For each possibility there is an Add method. Not only is the name intuitive, it's also the key to collection initializers. They search for an overload of Add when the base implements IEnumerable (which DatasetCollection already does). We can add some abstraction to those collections like the NumberDatasetCollection and the NumberAndPointDatasetCollection but for now, there aren't interfaces like INumberDatasetCollection and IPointDatasetCollection. You'd still need to implement both so it's not really useful unless we start using composition but then things get even more complex.

And here's the code

``` public interface IDataset { string Id { get; } // for interop string Type { get; } // for mixed charts } // this is actually implemented and the dataset collections // restrict to IDataset, IDataset through their Add methods public interface IDataset : IDataset, IList { IReadOnlyList Data { get; } } ``` This is the base implementation for every dataset. ``` [JsonObject] public abstract class Dataset : Collection, IDataset { ///

/// Gets the id used on interop-level. /// public string Id { get; } /// /// Gets the data contained in this dataset. This property is read-only. /// public IReadOnlyList Data { get; } /// /// Gets the type of this dataset. Important for mixed charts. /// public string Type { get; } public Dataset(string type = null, string id = null) : base(new List()) { Data = new ReadOnlyCollection(Items); Id = id ?? Guid.NewGuid().ToString(); Type = type; } public void AddRange(IEnumerable items) => ((List)Items).AddRange(items ?? throw new ArgumentNullException(nameof(items))); public void AddRange(params T[] items) => AddRange(items as IEnumerable); public override bool Equals(object obj) => obj is Dataset set && Id == set.Id && EqualityComparer>.Default.Equals(Items, set.Items); public override int GetHashCode() => HashCode.Combine(Items, Id); public static bool operator ==(Dataset left, Dataset right) => EqualityComparer>.Default.Equals(left, right); public static bool operator !=(Dataset left, Dataset right) => !(left == right); } ``` And here is how you can store those datasets. ``` // Supports every operation of IList except for adding and inserting. // There are protected methods for implementors but only the supported IList // members are implemented implicitly. Those don't show up in code complete unless // you cast it to IList in which case they will throw a NotSupportedException. // Also since the Add methods take precedence the more concrete they are and ILists Add // is hidden, you can use the Collection Initializer with different types (check Program.cs it's AWESOME) public abstract class DatasetCollection : IReadOnlyList, IList { private const string NotSupportedMessageModificationThroughInterface = "This collection doesn't support adding datasets through the IList or ICollection interface."; private readonly List _datasets; [JsonIgnore] public int Count => _datasets.Count; [JsonIgnore] public bool IsReadOnly => false; IDataset IList.this[int index] { get => this[index]; set => ThrowNotSupported(); } public IDataset this[int index] => _datasets[index]; protected DatasetCollection() { _datasets = new List(); } public bool Contains(IDataset dataset) => _datasets.Contains(dataset ?? throw new ArgumentNullException(nameof(dataset))); public void CopyTo(IDataset[] array, int index) => _datasets.CopyTo(array, index); public IEnumerator GetEnumerator() => _datasets.GetEnumerator(); public int IndexOf(IDataset dataset) => _datasets.IndexOf(dataset ?? throw new ArgumentNullException(nameof(dataset))); protected void AddDataset(IDataset dataset) => _datasets.Add(dataset ?? throw new ArgumentNullException(nameof(dataset))); protected void InsertDataset(int index, IDataset dataset) => _datasets.Insert(index, dataset ?? throw new ArgumentNullException(nameof(dataset))); public bool Remove(IDataset dataset) => _datasets.Remove(dataset ?? throw new ArgumentNullException(nameof(dataset))); public void RemoveAt(int index) => _datasets.RemoveAt(index); public void Clear() => _datasets.Clear(); IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable)_datasets).GetEnumerator(); void IList.Insert(int index, IDataset item) => ThrowNotSupported(); void ICollection.Add(IDataset item) => ThrowNotSupported(); private void ThrowNotSupported() => throw new NotSupportedException(NotSupportedMessageModificationThroughInterface); } ```
Now for some examples, shall we?

This is the simplified dataset for a line chart. Fully generic, supports value types and has the type already assigned. If you implement a new line-like chart, you can derive from this class and use the protected constructor to inject your own type. ``` public class LineDataset : Dataset { public LineDataset() : this("line") { } protected LineDataset(string type) : base(type) { } public int[] BorderDash { get; set; } public int? PointBorderWidth { get; set; } public int? PointHoverRadius { get; set; } public bool? Fill { get; set; } public double? LineTension { get; set; } public bool? SpanGaps { get; set; } } ``` In order to store all the datasets a line chart supports, we also need a `DatasetCollection` with the appropriate `Add` methods. We could just write those all in one but it makes more sense to get a certain degree of abstraction which helps implementing other charts such as the bar chart. ``` public class NumberDatasetCollection : DatasetCollection { public void Add(IDataset dataset) => AddDataset(dataset); public void Add(IDataset dataset) => AddDataset(dataset); public void Add(IDataset dataset) => AddDataset(dataset); } ``` ``` public class NumberPointDatasetCollection : NumberDatasetCollection { public void Add(IDataset dataset) => AddDataset(dataset); public void Add(IDataset> dataset) => AddDataset(dataset); public void Add(IDataset> dataset) => AddDataset(dataset); public void Add(IDataset> dataset) => AddDataset(dataset); } ``` What's missing is the `LineData` class. It just contains the correct dataset collection and the labels. The labels are only serialized when they contain data but they're still get-only. ``` public class LineData { public List Labels { get; } = new List(); [JsonProperty("xLabels")] public List XLabels { get; } = new List(); [JsonProperty("yLabels")] public List YLabels { get; } = new List(); // Supported: https://www.chartjs.org/docs/latest/charts/line.html#data-structure public NumberPointDatasetCollection Datasets { get; } = new NumberPointDatasetCollection(); [Obsolete("json.net", true)] public bool ShouldSerializeLabels() => Labels.Count > 0; [Obsolete("json.net", true)] public bool ShouldSerializeXLabels() => XLabels.Count > 0; [Obsolete("json.net", true)] public bool ShouldSerializeYLabels() => YLabels.Count > 0; } ``` The bar example is similar but a bit more specific. ``` public class BarDataset : Dataset { public BarDataset(bool horizontal = false) : this(horizontal ? "horizontalBar" : "bar") { } protected BarDataset(string type) : base(type) { } public double? BarPercentage { get; set; } public double? CategoryPercentage { get; set; } } ``` ``` public class BarDatasetCollection : NumberPointDatasetCollection { public void Add(IDataset dataset) => AddDataset(dataset); } ``` ``` [JsonConverter(typeof(FloatingBarPointConverter))] public readonly struct FloatingBarPoint : IEquatable { public readonly double Start, End; public FloatingBarPoint(double start, double end) { Start = start; End = end; } public override bool Equals(object obj) => obj is FloatingBarPoint point && Equals(point); public bool Equals(FloatingBarPoint other) => Start == other.Start && End == other.End; public override int GetHashCode() => HashCode.Combine(Start, End); public static bool operator ==(FloatingBarPoint left, FloatingBarPoint right) => left.Equals(right); public static bool operator !=(FloatingBarPoint left, FloatingBarPoint right) => !(left == right); } internal class FloatingBarPointConverter : JsonConverter { public override FloatingBarPoint ReadJson(JsonReader reader, Type objectType, FloatingBarPoint existingValue, bool hasExistingValue, JsonSerializer serializer) { //todo throw new NotImplementedException(); } public override void WriteJson(JsonWriter writer, FloatingBarPoint value, JsonSerializer serializer) { writer.WriteStartArray(); writer.WriteValue(value.Start); writer.WriteValue(value.End); writer.WriteEndArray(); } } ``` ``` public class BarData { public List Labels { get; } = new List(); // Supported: https://www.chartjs.org/docs/latest/charts/bar.html#data-structure public BarDatasetCollection Datasets { get; } = new BarDatasetCollection(); [Obsolete("json.net", true)] public bool ShouldSerializeLabels() => Labels.Count > 0; } ``` Another example is for the bubble chart. This one is less specific (both spectrums are well supported). ``` public class BubbleDataset : Dataset { public BubbleDataset() : this("bubble") { } protected BubbleDataset(string type) : base(type) { } public int? Rotation { get; set; } public int? Radius { get; set; } } ``` ``` public readonly struct BubblePoint : IEquatable { public readonly double X, Y, R; public BubblePoint(double x, double y, double r) { X = x; Y = y; R = r; } public override bool Equals(object obj) => obj is BubblePoint point && Equals(point); public bool Equals(BubblePoint other) => X == other.X && Y == other.Y && R == other.R; public override int GetHashCode() => HashCode.Combine(X, Y, R); public static bool operator ==(BubblePoint left, BubblePoint right) => left.Equals(right); public static bool operator !=(BubblePoint left, BubblePoint right) => !(left == right); } ``` ``` public class BubbleData { public IList Datasets { get; } = new List(); } ``` And here's all the rest I have/you need. ``` public readonly struct Point : IEquatable { public readonly double X, Y; public Point(double x, double y) { X = x; Y = y; } public override bool Equals(object obj) => obj is Point point && Equals(point); public bool Equals(Point other) => X == other.X && Y == other.Y; public override int GetHashCode() => HashCode.Combine(X, Y); public static bool operator ==(Point left, Point right) => left.Equals(right); public static bool operator !=(Point left, Point right) => !(left == right); } ``` ``` public readonly struct TimePoint : IEquatable> { [JsonProperty("t")] public readonly DateTime Time; public readonly T Y; public TimePoint(DateTime time, T y) { Time = time; Y = y; } public override bool Equals(object obj) => obj is TimePoint point && Equals(point); public bool Equals(TimePoint other) => Time == other.Time && EqualityComparer.Default.Equals(Y, other.Y); public override int GetHashCode() => HashCode.Combine(Time, Y); public static bool operator ==(TimePoint left, TimePoint right) => left.Equals(right); public static bool operator !=(TimePoint left, TimePoint right) => !(left == right); } ``` ``` // This contract resolver is necessary because `Collection` (the base of) `Dataset`, has a non-virtual `Count` property // this property always gets serialized and can't be influenced by `JsonIgnoreAttribute` or `ShouldSerializeCount`. // This contract resolver is currently the only way I konw to get rid of that `Count` property and also to keep the possibility // of having a `Count` property in a dataset as an options (which should get serialized, but currently there's not even a use-case for that). internal class IgnoreDatasetCountContractResolver : DefaultContractResolver { protected override IList CreateProperties(Type type, MemberSerialization memberSerialization) { IList baseProps = base.CreateProperties(type, memberSerialization); if (typeof(IDataset).IsAssignableFrom(type)) { string countName = nameof(ICollection.Count); if (NamingStrategy != null) { countName = NamingStrategy.GetPropertyName(countName, false); } foreach (var prop in baseProps) { if (prop.PropertyName == countName && prop.DeclaringType.IsGenericType && prop.DeclaringType.GetGenericTypeDefinition() == typeof(Collection<>)) { prop.Ignored = true; break; } } } return baseProps; } } ``` ``` class Program { static void Main(string[] args) { foreach (object sampleData in GetSampleData()) { string serialized = JsonConvert.SerializeObject(sampleData, Formatting.Indented, JsonSerializerSettings); Console.WriteLine(serialized + Environment.NewLine); } Console.ReadLine(); } private static IEnumerable GetSampleData() { { BubbleDataset bubbleDs1 = new BubbleDataset { Radius = 5 }; bubbleDs1.AddRange(new BubblePoint(1, 2, 3), new BubblePoint(1, 10, 100), new BubblePoint(-100, double.MinValue, double.MaxValue)); BubbleDataset bubbleDs2 = new BubbleDataset { Rotation = 10 }; bubbleDs2.AddRange(new BubblePoint(123.456, 1.2, 3.4), new BubblePoint(-1.2, 2.3, -3.4), new BubblePoint(1 / 999, double.MinValue, double.MaxValue)); BubbleData bubbleData = new BubbleData { Datasets = { bubbleDs1, bubbleDs2, // new BarDataset() // compilation error, only BubbleDataset } }; yield return bubbleData; } { LineDataset ls1 = new LineDataset { Fill = true }; ls1.AddRange(1, 2, 3, 4, 5); LineDataset ls2 = new LineDataset { 1.1, 2.2, 3.3, 1 / 999, double.MinValue, double.MaxValue }; LineData lineData = new LineData { XLabels = { "a", "b", "c" }, Datasets = { ls1, ls2, new LineDataset { new Point(10.5, 7), new Point(-99.999, 13.2), new Point(1, -1) }, new BarDataset { 1.13254, double.MinValue, double.MaxValue, 2.572, 37.753, 999 / 5 }, // new LineDataset() // compilation error, no compatible Add overload } }; yield return lineData; } { BarDataset ls1 = new BarDataset { BarPercentage = 20 }; ls1.AddRange(Enumerable.Range(2, 5).Select(i => i * i * i)); BarDataset ls2 = new BarDataset { new Point(-99456.95993, 13346.2), new Point(1560.5, 63465), new Point(12587.5, 733465), new Point(145.5, 7245), new Point(154, -14) }; BarData barData = new BarData { Labels = { "a", "b", "c", "d", "e" }, Datasets = { ls1, ls2, new BarDataset { new FloatingBarPoint(3, 7), new FloatingBarPoint(-1, 10), new FloatingBarPoint(100, -100), new FloatingBarPoint(1 / 123456789, 1 / 987654321), new FloatingBarPoint(double.MinValue, double.MaxValue) }, new LineDataset(), // allowed, it's an IDataset // new BarDataset() // compilation error, no compatible Add overload } }; yield return barData; } } private class CustomObject { public int MyInt { get; set; } public string SomeString { get; set; } } private static readonly JsonSerializerSettings JsonSerializerSettings = new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore, ContractResolver = new IgnoreDatasetCountContractResolver { NamingStrategy = new CamelCaseNamingStrategy(true, false) } }; } ```

Additional context

  • When this is done, we can also finally implement the Point and TimeTuple and many other classes as structs as seen here. They might need some tweaking but are already improved a lot over the current classes.
  • We'll need to look at the supported data-structures for all the charts in order to design the dataset collections.
  • It's another breaking change.. should we start batching these? Maybe for release 2.0 we could do #70 and #95? And then for 3.0 we could do #78, this and more Nullable<T> (for the defaults, issue incoming). We'll have to see.
  • Excuse my wording and other mistakes, it's quite a big issue and I'm quite tired.

Final words

Although I already thought this issue through quite a bit, I'm not sure on all the specifics yet. Please tell me anything you see wrong with this, potential pitfalls, missing features, etc.
I'd love to hear feedback on this!

mariusmuntean commented 4 years ago

This looks quite promising to me but also complex . At this moment I can't foresee all the implications this change will have (except the obvious breaking changes).

Do you have a branch were you're working? I'm interested to see how easy/complex it is to port some of the samples. I'd look into a simple one and a complex one. This has the advantage that it uncovers flaws or weaknesses pretty immediately. It also gives you the library user's perspective.

Another suggestion I have would be to support the library user as much as possible, but never restrict them. Inevitably we'll restrict the user so it'd be helpful to provide an "escape hatch", by that I mean a way of circumventing our abstractions to allow for unforeseen usages.

Joelius300 commented 4 years ago

For the end user it shouldn't be complex at all (that was one of my biggest requirements for this rework). They can use the datasets like a normal List<T> with some additional properties. That's why it implements IList<T> along with AddRange-methods.
For the dataset collections it's the same thing. They can just call the Add methods or use the collection initializer like they would with any other List<T>. The only difference being that there are multiple overloads for the Add method so they can add different types than just T of List<T>. Also all the other methods for modifying the dataset collection don't provide all those overloads because there isn't much value in that. It could however be done easily by applying the same technique to those methods that we used for the Add methods.

I don't know if I have time to push it as a branch today but in the meantime you should check out the last code piece in the issue. It's the Program.cs of my demo application and shows the use case. If you want, I can upload my test project here (just as zip), so you don't have to copy the code over.

For the last point, I also think it's important to keep it flexible (that was another big requirement). This solution is very flexible because you can subclass any datset to add properties, you can implement your own datasets with Dataset<T> or IDataset<T> and you can also implement your own dataset collections by subclassing the existing ones like NumberDatasetCollection, subclassing DatasetCollection or creating your own complex dataset collection from the ground up. The only thing that could be a bit more flexible is that the dataset collections are get-only properties in the data-classes. If you want to use your own dataset collection in a bar chart, you'll have to subclass BarData and override the Datasets property with your own type. I think this way it's better. Adding a setter there doesn't really make sense (unless we abstract all possible dataset collections into interfaces) and you can customize almost everything about this through subclassing.

As a final note, I know it's unfortunate to have so many breaking changes but I really do believe they're necessary. Especially the datasets are a mess right now but there are other things that need reworking. I would suggest that we start working on release branches where we batch breaking changes. Only once multiple breaking changes are done, we can rebase onto master to get all the bug fixes and small features that were done there in the meantime and then merge back into master. It takes much more merge effort and requires all the developers to keep track of where we currently stand but it'll make actual releases much more reasonable and allows us to release minor versions when small features and bug fixes are done without conflicting with breaking changes. This would also be a good time to tidy up our GitHub project and possibly start using milestones to mark which features belong in which release. That way we can plan major releases beforehand and work on our big features in an organized manner.

Joelius300 commented 4 years ago

Also something to consider, regarding the simplicity and extensibility:

I think most users use our library for fast and easy access to creating charts from their data. Although we 100% need to make sure our library can be easily extended, it should be our focus to simplify the process of creating a basic chart. Currently using the wrappers and the covariant datasets is much more complicated than it needs to be. A normal end user shouldn't be exposed to these kinds of concepts if they're not needed. Instead, it should feel like they're working with familiar classes like List<T>.
This is what this rework aims to achieve by having a simple and familiar interface with more complex but still extendable classes in the background.

MindSwipe commented 4 years ago

Looks very promising. I agree, most of the users of this Package are looking for a easy to use package to present their data, and as such I agree that simplifying the usage of the API is very important. But we shouldn't simplify the API itself, it should still be as powerful as possible, but easier to use.

Also, I think the finished API should never throw a NotSupportedException, everything we export should be supported and clearly documented, and only in dire situations should the API throw an exception.

Furthermore, I think we should group all breaking changes together and then release a version 2, this way anyone using our package needs to only rewrite their code interfacing with our package once, and not multiple times. And while were at it, we should also help users upgrading their projects to the new version with before and after examples as well as a well documented list of all breaking changes, where they can view what changed, how it changed, and how they have to update their code

I'll copy your code snippets you have here to get a further understanding for your mock implementations and probably come back with some feedback.

Joelius300 commented 4 years ago

Thanks for the feedback.

I'm not sure how to understand your first paragraph. Could you elaborate on what you mean by that?

.. I agree that simplifying the usage of the API is very important. But we shouldn't simplify the API itself, it should still be as powerful as possible, but easier to use.

Regarding the second paragraph, I'd have to disagree. The way I want to implement the API on the dataset collections (the datasets themselfs don't throw) is modeled after how C# arrays handle it.
They implement IList<T> but implement all the modification members like Add, Insert and Remove explicitly and make them throw a NotSupportedException. This way you can use the collection in methods that expect an IList<T> or ICollection<T> without having to create a wrapper around it or copying the data.
The only difference to C# arrays is that the actual implementation to the interfaces are provided at runtime whereas we're required to already have them at compile time (but as far as I understand that doesn't really make a difference in our case).
Here's the paragraph I'm referring to from the Array class API documentation:

Single-dimensional arrays implement the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, System.Collections.Generic.IEnumerable<T>, System.Collections.Generic.IReadOnlyList<T> and System.Collections.Generic.IReadOnlyCollection<T> generic interfaces. The implementations are provided to arrays at run time, and as a result, the generic interfaces do not appear in the declaration syntax for the Array class. In addition, there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations). The key thing to be aware of when you cast an array to one of these interfaces is that members which add, insert, or remove elements throw NotSupportedException.

You could argue that it's not necessary to use the dataset collections in such APIs and I'd agree but in what way does it hurt implement the interface explicitly?

Totally agree on the third paragraph. I don't know if it makes sense to put all the breaking changes that are currently planned (this, #70, #95, #78) in a single release, I'll have to think about that a bit. I think it might be better but it'll be a lot of effort so there won't really be updates until that's released. Also it will probably result in a lot of merging efforts especially if there are new, smaller features added in master while we work on release 2.0. But I think it wouldn't be easier if we decided to split it into multiple releases.