Poor perfomance in fetching a set of entities. State/Stateless session mode

korchak-aleksandr commented 6 years ago

Why NH 5.1 in session mode and in stateless session mode shows same results? For example benchmark in https://github.com/FransBouma/RawDataAccessBencher

I forked this project: https://github.com/sanekpr/RawDataAccessBencher/ And I create NH behch with stateless session mode.

And my results for fetch set test:

EF Core 2.1 (no change tracking) - 202ms
Dapper 1.50 - 221 ms
NH 5.1 (stateless mode) - 2 055 ms

May be I made smth wrong? Or stateless mode doesn't affect on fetch set perfomance?

fredericDelaporte commented 6 years ago

You can read some considerations in FransBouma/RawDataAccessBencher#9 comments.

korchak-aleksandr commented 6 years ago

@fredericDelaporte as you write in that comment I create the implementation with Stateless Session. But perfomance is almoust the same as with session.

May be loading entities as readonly objects fixes this? And how can I load set or single entity as readony in methods Get or Query ?

fredericDelaporte commented 6 years ago

I am referring to:

By the way, if by "tracking" it is understood something like "the ORM is somehow notified of changes on entities", there are indeed currently no tracking at all in NHibernate

There are no such mechanism in NHibernate, so going stateless has no impact on the entities themselves.

Instead, NHibernate keeps in the session a snapshot of the state of each loaded entity at the time it was loaded, and on flush, check each entity current state against the snapshot for detecting if the entity was changed.

The bulk of the work is the dirty check, which only occurs on flush. So on read usages, the dirty check has no impact. Storing a copy of the loaded state is of course done on read in a normal session, but it has mainly a memory impact.

korchak-aleksandr commented 6 years ago

So, at this moment EFCore demostrate 10x faster perfomance in fetching sets of entities. No doubt that NHibernate has 10x rich capabilities, and it would be right that perfomance is at least the same :-)

@fredericDelaporte @hazzik, Would you plan some perfomance improvement to read scenarios in future releases?

fredericDelaporte commented 6 years ago

For me the answer is no, it is not my priority to optimize for a benchmark.

Anyway, benchmarks are usually far from reflecting performances in actual usage cases. They may help checking optimizations actual gains, but using them for comparing tools having vastly different implementation principles, drawbacks and benefits, is generally a bit moot.

korchak-aleksandr commented 6 years ago

@fredericDelaporte I'm not ask you to optimize NH perfomance for this benchmark.

I ask you to add in some task plan general optimization in fetching set of entities from database. Do you think it's not important task?

fredericDelaporte commented 6 years ago

Anyway, benchmarks are usually far from reflecting performances in actual usage cases. They may help checking optimizations actual gains, but using them for comparing tools having vastly different implementation principles, drawbacks and benefits, is generally a bit moot.

I mean this benchmark does not prove there is an actual need for optimizing fetching.

Of course you are free and welcome to investigate this subject by yourself and find-out any optimization which could be done in the code base.

We do care about performances, provided evidence of implementation points requiring some work are provided, like in #1391 (or #1446, followed by #1455).

hazzik commented 5 years ago

This is the culprit for the poor performance of a stateless session: https://github.com/nhibernate/nhibernate-core/blob/14c5cdb3fe71a649f2deb1e23d0e9088ebdf3738/src/NHibernate/Impl/StatelessSessionImpl.cs#L73

Stateless session creates proxies for associations. However these proxies will not be able to be resolved into the entities as stateless session does not support proxies. In my current setup creation of these proxies takes around 33% of the execution time (most of the time is taken by constructors thou).

bahusoid commented 5 years ago

However these proxies will not be able to be resolved into the entities as stateless session does not support proxies.

So are you saying that new Entity() vs proxy creation makes so much a difference. But why? Shouldn't static proxy generator be some equivalent of "new EntityProxy()" call?

I see that persister.CreateProxy leads to series of calls that end up in concurrent dicitionary lookup each time. Can't it be the reason for slowleness?

Maybe we should extend IProxyFactory to return proxy activator delegate. Something like: Func<object, ISessionImplementor, object> GetProxyActivatorDelegate;

Then I think we can cache this delegate inside persister.

hazzik commented 5 years ago

No. The entities' constructors have 80% of the impact.

hazzik commented 5 years ago

So, in case of StatelessSession GetProxy takes 305ms our of 1000ms: Most of the time it spends in GC. I believe this is because it needs to clear the heap to put these objects there.

bahusoid commented 5 years ago

But what fix do you have in mind for this? Return null instead? It would be a good option to have. But it's a breaking change and it would not allow to retrieve loaded objects identifiers. So it should be a configurable behavior.

I wonder if it's possible to implement some lightweight proxy class. With only get/set Identifier functionality that doesn't call base ctor at all and throws for all other members.

hazzik commented 5 years ago

But what fix do you have in mind for this?

I have no ideas:(

fredericDelaporte commented 5 years ago

Not really better on my side.

I wonder if it's possible to implement some lightweight proxy class. With only get/set Identifier functionality that doesn't call base ctor at all and throws for all other members.

I do not think it would do any good, if it was possible.

Maybe instead switch the loaded entity to a field interceptor proxy with uninitialized associations, which would create their proxy lazily on first get. But turning all entities with associations to field interceptor proxies is a bit debatable.

hazzik commented 5 years ago

Also, around 40% (~400ms/1000ms) of the time is spent in the GC. The worst offenders are Task<object> and SessionIdLoggingContext (~16mb memory traffic). I tried switching SessionIdLoggingContext to a struct but it did not really help with the time.

fredericDelaporte commented 5 years ago

SessionIdLoggingContext: that thing is mainly useful for an external tool. If it has such an impact, it is time to add an enabling option and disable it by default.

bahusoid commented 5 years ago

The worst offenders are Task<object>

I wonder if we really need to generate all internals/private members as async with many awaits inside. Maybe we should generate as async only public members. And for private members instead of generating

private async Task<IList> ListIgnoreQueryCacheAsync(ISessionImplementor session, QueryParameters queryParameters, CancellationToken cancellationToken)

generate

private IList ListIgnoreQueryCacheForAsync(ISessionImplementor session, QueryParameters queryParameters, CancellationToken cancellationToken)

And for public members generate something like:

public async Task<List<T>> ListAsync<T>(CancellationToken cancellationToken)
{
    return await Task.Run(() =>
    {
        ...
        var results = ListIgnoreQueryCacheForAsync<T>(session, params, cancellationToken));
        ...
    });
}

hazzik commented 5 years ago

I did not get this code snippet.

bahusoid commented 5 years ago

So basically I think that private members can stay synchronous. And public async implementations can call sync version wrapped in Task. So straightforward version of this implementation will do simply:

public async Task<List<T>> ListAsync<T>(CancellationToken cancellationToken)
{
    return await Task.Run(() =>
    {
        return List<T>();
    });
}

What disadvantages do you see except that cancelationToken is ignored? That will significantly minimize numbers of tasks created for sure.

What's left is to add cancelationToken propagation for synchronous methods. And that's what I tried to show with ListIgnoreQueryCacheAsync -> ListIgnoreQueryCacheForAsync example. Make async generator to generate private sync methods but with cancelationToken parameter.

Final method could look something like:

public async Task<List<T>> ListAsync<T>(CancellationToken cancellationToken)
{
    return await Task.Run(() =>
    {
              //This sync method is generated based on `List` with cancellationToken propagation for all private members
        return ListForAsync<T>(cancellationToken);
    });
}

maca88 commented 5 years ago

What disadvantages do you see except that cancelationToken is ignored?

By generating only the root method as async and then call the sync method inside it makes the async method synchronous aswell as there are no true async calls inside it. Your ListAsync can be simplified as:

public async Task<List<T>> ListAsync<T>(CancellationToken cancellationToken)
{
    return Task.FromResult(List<T>());
}

In this case DbDataReader.ExecuteReaderAsync is the first async call, which is called deep inside the ListAsync method. By wrapping List inside ListAsync method, the DbDataReader.ExecuteReader method will be called instead which is not asynchronous and will block the current thread until the I/O operation is done. If you want a true async operation you have to use async all the way up in order to achieve it, where the starting point is the method which does some asynchronous operations. AsyncGenerator uses the "all way up" strategy where the starting point can be defined in the configuration, for this case the following configuration is used: https://github.com/nhibernate/nhibernate-core/blob/da429f76d7ef6dea502f92d9098ebf3b0a4691c9/src/AsyncGenerator.yml#L107-L112 the generator will generate all dependent methods as async if possbile in order to have a true async method like ListAsync, which does not block any thread when called. About the overhead of Task<object>, we could reduce its allocation by avoid calling IType.HydrateAsync and ILoadable.HydrateAsync and call their sync counterparts instead as currently there are no async calls inside HydrateAsync. From the benchmarks in #1969, the "Allocated bytes" for the "Async eager Load fetches" dropped by approx 14%.

bahusoid commented 5 years ago

So it seems "true async operations" avoid any threads from waiting. That's something I don't understand how could be implemented. Will do my research on this. Any useful links would be appreciated.

By generating only the root method as async and then call the sync method inside it makes the async method synchronous aswell as there are no true async calls inside it.

As I said I don't know what is "true async calls" methods - but Task.Run version is executed in separate thread and your FromResult example is not. See my example. You can see that all Task.Run calls are executed in separate threads.

maca88 commented 5 years ago

Any useful links would be appreciated.

Here is an article (search for "Async All the Way") which explains why async should be used all the way up/down.

Task.Run version is executed in separate thread and your FromResult example is not

I was wrong by saying that Task.Run can be replaced with FromResult as it can introduce a deadlock which is explained in the above article. Nonetheless, by using Task.Run you will still have the problem of a thread waiting for an asynchronous operation to complete, which will limit the scalability of the application due to the ThreadPool starvation. Also, here is an article that explains in depth the characteristics of the async methods.

nhibernate / nhibernate-core

Poor perfomance in fetching a set of entities. State/Stateless session mode #1858