Open phatcher opened 5 years ago
Hi Paul, and thank you for the new idea about how to improve serialization performance. I agree we need to find a way to customize serialization to improve on the speed offered by the ODataLib converters, but while the easiest way is to offer extension methods for IBoundClient, I am not sure it's the best one. My problem with this approach is that IBoundClient already has a huge number of methods; FindEntriesAsync alone has 8 (!) overloads, and most of the other methods also have multiple overloads. Adding just a couple of new extension methods makes such an extension inconsistent with the rest of the interface. And what about overloads for UpdateEntriesAsync, DeleteEntriesAsync? What about OData action and function counterparts?
I believe we need to explore two possible directions (there may be more):
These are just some rough thoughts, I haven't investigated if they fit current code base.
I'm in favor of Option 1. Beyond cases where Newtonsoft.Json or Manatee.Json could be used, it would also enable plugging in the newer UTF-8 optimized JSON reader from Microsoft.
Me too. Now it's just to figure out how to do it :-)
One thing I notice on IBoundClient
/// <summary>
/// Retrieves an entry by executing OData GET request.
/// </summary>
/// <returns>The first of the entries found.</returns>
Task<T> FindEntryAsync();
/// <summary>
/// Retrieves an entry by executing OData GET request.
/// </summary>
/// <param name="cancellationToken">The cancellation token.</param>
/// <returns>The first of the entries found.</returns>
Task<T> FindEntryAsync(CancellationToken cancellationToken);
whereas we could achieve the same by making cancellationToken an optional parameter with a default e.g.
/// <summary>
/// Retrieves an entry by executing OData GET request.
/// </summary>
/// <param name="cancellationToken">The cancellation token.</param>
/// <returns>The first of the entries found.</returns>
Task<T> FindEntryAsync(CancellationToken cancellationToken = default(CancellationToken));
Now whilst that is technically a breaking change to the API, I don't think any client would notice since the assembly is not strong-named. This works because the cancellationToken is the last parameter on all the API calls, and this design seems to be how most Microsoft APIs introduce CancellationToken support. I did a quick check and all the tests pass with no code change.
More generally, can we look at the API as a whole and work out which bits should be marked as Obsolete so we can retire them in v6?
Yes, this is a possibility, but it will require a major version change because replacing two method calls with one makes the library binary-incompatible with older clients, i.e. every client that used a version without the cancellation token will break unless recompiled.
Ah, I'd forgotten about the binary compatibility - but who updates to a new version of a library without recompiling?
@phatcher we do in-place upgrades if we've already shipped the main product and are addressing a third party bug with a point release in the third party software. (side note: our main installations are all air gapped, so not easy to do updates of any size)
@phatcher we must respect versioning rules. And it's cheaper to bump a version than to break somebody's code.
@sixlettervariables Ah ok, I've come across that sort of installation issue before but forgotten about it.
@object I've got a branch on my fork which tidies up the interface, do we want to create a v6 branch to do the experimental API stuff on?
@phatcher yes, I think it would be useful. But maybe start not with a v6 branch but with a new branch for each new feature (the interface revision can be a separate feature branch). Then it will be easier to decide which of them go into v6.
@object So the changes to simplify the interfaces for CancellationToken are on 568_CancellationToken, all the tests pass so it's just a matter of whether you want to do it.
They look good and indeed simplify maintenance of the library a lot. I guess once you finalize the other branch (with type converters) we can release 6.0 based on them (perhaps also some fixes).
And of course custom serialization would be another great reason for a major version bump. We just need to figure out how to implement it without changing the fluent interface methods.
Hi Paul,
Why can't we use ReadAsStreamAsync on HttpResponseMessage instead of ReadAsStringAsync, to benefit from the performance gains suggested in https://www.newtonsoft.com/json/help/html/Performance.htm? We can also pass JsonSerializer settings as a parameter to be used during deserialization. We could write the FindEntriesAsync method like below, and use a ContractResolver with the JsonSerializer to Trim() any spaces from the OData response during deserialization.
Example
var jsonSerializer = new JsonSerializer { NullValueHandling = NullValueHandling.Ignore, MissingMemberHandling = MissingMemberHandling.Ignore, ContractResolver = new TrimConverterResolver() };
public async Task<IEnumerable<T>> FindEntriesAsync<T>(HttpResponseMessage httpResponseMessage, JsonSerializer jsonSerializer = null)
{
// Fall back to a default serializer if the jsonSerializer parameter is null
if (jsonSerializer == null) { jsonSerializer = new JsonSerializer(); }
using (var stream = await httpResponseMessage.Content.ReadAsStreamAsync().ConfigureAwait(false))
{
using (var streamReader = new StreamReader(stream))
{
using (var jsonReader = new JsonTextReader(streamReader))
{
return jsonSerializer.Deserialize<ODataResponse<T>>(jsonReader).Value;
}
}
}
}
using System;
using Newtonsoft.Json.Serialization;
/// <summary>
/// TrimConverterResolver implements IContractResolver
/// </summary>
public class TrimConverterResolver : DefaultContractResolver
{
/// <inheritdoc/>
protected override JsonContract CreateContract(Type objectType)
{
// This will only be called once per type and then cached
var contract = base.CreateContract(objectType);
if (objectType == typeof(string))
{
contract.Converter = new TrimmingConverter();
}
return contract;
}
}
using System;
using Newtonsoft.Json;
/// <summary>
/// TrimmingConverter
/// </summary>
public class TrimmingConverter : JsonConverter
{
/// <inheritdoc/>
public override bool CanWrite => false;
/// <inheritdoc/>
public override bool CanConvert(Type objectType)
{
return objectType == typeof(string);
}
/// <inheritdoc/>
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
return Convert.ToString(reader.Value)?.Trim();
}
/// <inheritdoc/>
public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
throw new NotImplementedException();
}
}
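To make the trimming behaviour concrete, here is a small usage sketch. It assumes it is compiled alongside the TrimConverterResolver and TrimmingConverter classes above, and the Person class is invented for the example:

```csharp
// Usage sketch — relies on the TrimConverterResolver/TrimmingConverter
// snippets above being in scope; Person is an illustrative type.
using System;
using Newtonsoft.Json;

public static class TrimDemo
{
    private class Person
    {
        public string Name { get; set; }
    }

    public static void Main()
    {
        var settings = new JsonSerializerSettings
        {
            ContractResolver = new TrimConverterResolver()
        };

        // The resolver attaches TrimmingConverter to string contracts,
        // so padded values in the payload come back trimmed.
        var person = JsonConvert.DeserializeObject<Person>(
            "{\"Name\":\"  Alice  \"}", settings);

        Console.WriteLine($"[{person.Name}]"); // [Alice]
    }
}
```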
@phatcher I've been thinking more about your suggestion and while I definitely see the use of custom converters that would be faster than the one offered by ODataLib, I am not sure this can be achieved in a general way. Even today those who want to customise handling of response messages can grab them after slight rewrite of code:
var request = await _client
.For<Person>()
.BuildRequestFor()
.FindEntriesAsync();
var response = await request.RunAsync();
The code above makes the HTTP response message accessible, so after that the developer is on their own in how this message is handled. It can be converted to a string or JSON and then deserialised in the most suitable way. Can this be done more smoothly, so people won't even need to call BuildRequestFor and RunAsync, with custom deserialization added behind the scenes? I am afraid it will be quite hard without losing edge cases and additional features like support for batches and annotations.
@object Thanks for the analysis, I think you are correct and custom deserialization is out of scope, but we might need to expand some of the interfaces to allow for slightly more control.
The current issue I have is how to do custom deserialization whilst still allowing for pagination via the annotations.NextLink behaviour. For example, I have an extension in my library to retrieve pages of records/all records which takes server-side pagination into account, i.e. if the client wants a page size of 50 but the server only delivers 20 per request, we need to follow some page links...
/// <summary>
/// Get all entries for a query up to a maximum quantity
/// </summary>
/// <typeparam name="T">Entity type</typeparam>
/// <param name="client">Client to use</param>
/// <param name="maxEntries">Maximum entries to retrieve</param>
/// <param name="cancellationToken">The cancellation token.</param>
/// <returns>A page containing up to the requested number of entries.</returns>
public static async Task<DataPage<T>> FindAllEntriesAsync<T>(this IBoundClient<T> client, int maxEntries = 1000, CancellationToken cancellationToken = default(CancellationToken))
where T : class
{
return await client.ToDataPage(1, maxEntries, cancellationToken).ConfigureAwait(false);
}
/// <summary>
/// Get a page of data with a specific page size
/// </summary>
/// <typeparam name="T">Entity type</typeparam>
/// <param name="client">Client to use</param>
/// <param name="pageNumber">Page number to use</param>
/// <param name="pageSize">Page size to use</param>
/// <param name="cancellationToken">The cancellation token.</param>
/// <returns>The requested page of data.</returns>
public static async Task<DataPage<T>> ToDataPage<T>(this IBoundClient<T> client, int pageNumber, int pageSize, CancellationToken cancellationToken = default(CancellationToken))
where T : class
{
if (pageSize <= 0 || pageNumber <= 0)
{
return null;
}
// Items accumulated across server pages
var items = new List<T>();
// Data to skip
var itemsToSkip = pageSize * Math.Max(pageNumber - 1, 0);
var annotations = new ODataFeedAnnotations();
// Get the first page
var entities = await client.Skip(itemsToSkip)
.Top(pageSize)
.FindEntriesAsync(annotations, cancellationToken)
.ConfigureAwait(false);
items.AddRange(entities.ToList());
// Server may have chunked it
while (annotations.NextPageLink != null)
{
entities = await client.FindEntriesAsync(annotations.NextPageLink, annotations, cancellationToken).ConfigureAwait(false);
items.AddRange(entities.ToList());
}
// Give it back.
var page = new DataPage<T>
{
Data = items,
PageNumber = pageNumber,
PageSize = pageSize,
Count = annotations.Count.HasValue ? (int)annotations.Count : 0
};
return page;
}
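For illustration, a hedged usage sketch of the extension above (the ODataClient setup, service URL, and Person entity are invented for the example):

```csharp
// Usage sketch — assumes the FindAllEntriesAsync extension above is in scope;
// the service URL and Person type are illustrative only.
var client = new ODataClient("https://example.com/odata/");

// Fetch up to 5000 records, transparently following server-driven
// pagination links (annotations.NextPageLink) behind the scenes.
var page = await client
    .For<Person>()
    .FindAllEntriesAsync(maxEntries: 5000);

Console.WriteLine($"Retrieved page 1, server reports {page.Count} total entries");
```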
The problem is if I try to translate this to something with custom deserialization I can get the first page with
var query = await client.Skip(itemsToSkip)
.Top(pageSize)
.BuildRequestFor()
.FindEntriesAsync(annotations, cancellationToken)
.ConfigureAwait(false);
var response = await query.RunAsync(cancellationToken);
var entities = await deserializer.FindEntriesAsync<T>(response.Message, cancellationToken);
where deserializer is the custom deserialization class, but then there is no overload on IRequestBuilder{T} which takes a Uri, so I can't call it with annotations.NextPageLink.
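For discussion, the missing piece might look something like this. This is purely a hypothetical sketch: the member name, return type, and placement on IRequestBuilder{T} are all assumptions, not existing API:

```csharp
// Hypothetical addition to IRequestBuilder<T> — not current API.
// It mirrors the existing client-level FindEntriesAsync(annotatedUri, ...)
// pattern so that a continuation link (annotations.NextPageLink) can be turned
// into a request whose raw HttpResponseMessage is then available for custom
// deserialization via RunAsync.
public interface IRequestBuilder<T> where T : class
{
    // ... existing members elided ...

    // Proposed: build a request for an absolute continuation link
    Task<IClientWithRequest<T>> FindEntriesAsync(
        Uri annotatedUri,
        ODataFeedAnnotations annotations,
        CancellationToken cancellationToken = default(CancellationToken));
}
```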
BTW I don't mind contributing the code above along with the DataPage class, if you think it appropriate, as this is a very common issue when you start consuming APIs: you can't necessarily control the server's pagination behaviour.
@raviraj1976 Yes using the streaming is a good idea; I tried it on my project and it's approximately 30% faster than the string deserialization.
The other fairly obvious thing I found, though it's worth stating, is to only select the properties you need; I saved about 25% by not returning audit columns and columns I had already filtered on. This reduces the returned payload size but also cuts down the amount of parsing you have to do. The final result was a reduction from 7.5s to 3.3s for 10K records.
Agree. Having a select operator to get only the required columns will reduce the payload. Another option is to include gzip compression (AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip) on the OData client. Reference http://jonathanpeppers.com/Blog/improving-http-performance-in-xamarin-applications
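As a sketch of the compression option: the handler configuration itself is standard .NET, but how the handler gets wired into Simple.OData.Client's HTTP pipeline is an assumption here, not something this thread specifies:

```csharp
// Sketch: enable gzip/deflate decompression on the HttpClient used for
// OData calls. The handler setup is standard HttpClientHandler usage;
// plugging it into the OData client's settings is left as an assumption.
using System.Net;
using System.Net.Http;

var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var httpClient = new HttpClient(handler);

// The client now sends Accept-Encoding: gzip, deflate and transparently
// decompresses responses, cutting the payload size on the wire.
```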
@raviraj1976 I thought that was on by default. One other thought was to see what effect using BSON serialization has, since it's supposed to be quicker to parse.
@object I don't think there's much more I can do to improve the performance of ToObject, as it now has a low overhead: in the latest performance profile ToObject costs approx 33 µs per record, with a total of 1.4s spent in Simple.OData.Client vs 5.8s in Microsoft code (largely Edm).
So the predominant cost is OData parsing of the raw JSON, which we can see from the benchmark results: parsing via the library is about 10x more expensive than via Newtonsoft.Json.
You posted a fragment which allows access to the HttpResponseMessage e.g.
I have one use case which is retrieving a lot of records (~22k) and the difference is dramatic 4.5s vs 8s and doing the usual profiler run you can see that it's all spent in parsing...
(Profiler screenshots: "SoC deserializer" vs "NewtonSoft Json".)
I've taken your code and generalized it slightly, introducing a new interface IODataDeserializer...
Then we can have a default implementation as follows...
And some extension methods
The naming is chosen so that it doesn't clash with the existing methods on IBoundClient etc. Using it then becomes very easy...
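The actual code wasn't captured in the thread; a rough reconstruction of what the interface and an extension method could look like follows. Only the names IODataDeserializer and FindDeserializeEntitiesAsync come from the discussion — everything else is assumed for illustration:

```csharp
// Rough sketch reconstructing the shape described above; all signatures,
// helper types, and overloads beyond the two names taken from the thread
// are assumptions, not the committed API.
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public interface IODataDeserializer
{
    // Deserializes an OData response payload directly from the HTTP message
    Task<IEnumerable<T>> FindEntriesAsync<T>(
        HttpResponseMessage message,
        CancellationToken cancellationToken = default(CancellationToken));
}

public static class ODataDeserializerExtensions
{
    // Runs the built request and hands the raw response to the pluggable
    // deserializer. TResult may differ from the source entity type T,
    // per the point about projecting to a different type below.
    public static async Task<IEnumerable<TResult>> FindDeserializeEntitiesAsync<T, TResult>(
        this IBoundClient<T> client,
        IODataDeserializer deserializer,
        CancellationToken cancellationToken = default(CancellationToken))
        where T : class
    {
        var request = await client.BuildRequestFor()
            .FindEntriesAsync(cancellationToken).ConfigureAwait(false);
        var response = await request.RunAsync(cancellationToken).ConfigureAwait(false);
        return await deserializer.FindEntriesAsync<TResult>(
            response.Message, cancellationToken).ConfigureAwait(false);
    }
}
```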
One benefit of this method is that FindDeserializeEntitiesAsync can take a different type than the source entity.
Couple of points open for discussion...
I don't think this should become the default behaviour, since you need the complex OData parsing for some domain models, but there are definite use cases where a simpler deserialization is useful.