iTwin / presentation

Monorepo for iTwin.js Presentation Library
https://www.itwinjs.org/presentation/
MIT License

Accessing ECClass information is slow when they come from large ECSchemas #601

Open grigasp opened 3 months ago

grigasp commented 3 months ago

Our ECClassHierarchyInspector implementation is based on ECSchemaProvider. We create the provider using SchemaContext and ECSchemaRpcLocater, which always downloads the full schema from the backend. This means that simply checking whether class X is of class Y requires fully downloading the schemas of both classes. Some schemas are massive and can take 8 seconds or more to download.

We should come up with a way to improve this. Maybe we could pre-load the schemas. Or, if we don't yet have the schema on the frontend, issue the schema request and an ECDbMeta query request at the same time and use the latter for the immediate response.

Here's an example of the performance of getting the first Models tree branch. Green is our baseline; brown is the run where schemas are preloaded before the test. [image]

grigasp commented 2 months ago

Investigation notes

The problem is not just with class hierarchy inspection, but also with getting classes for class grouping. That also requires us to get the node's class, which requires downloading and parsing the schema. In the specific case of the 9.5 s green column in the issue description, the schema that takes the majority of the time is 32 MB.

I tried the following approach:

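import { mergeMap, race, timer } from "rxjs";

race(
  // wins the race right away if the schema is already loaded
  getClassUsingECSchemaProvider(),
  // subscribed on the next event loop iteration
  timer(0).pipe(mergeMap(() => getClassUsingMetadataQuery())),
)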

Here we attempt to get the class using our schema provider, but at the same time (on next event loop iteration) we send out a metadata query to get the same information. In case we already have the schema, the metadata query is not even sent. In case we don't - both requests are sent and we wait for the first one to complete, with the expectation that metadata query is generally very quick. This did improve the performance slightly (9.5 s was reduced to ~6.5 s) at the cost of complexity. However, I noticed that the large schema request was causing the backend's main thread to block, thus causing the quick query requests to not respond either (@ColinKerr, do you have any backlog item related to this?).
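For readers who want to experiment with the idea outside of RxJS, here is a minimal sketch of the same race using plain promises. Everything in it is an assumption made for illustration: the `ClassInfo` shape, the `createRacingClassLoader` helper, the explicit `isSchemaLoaded` check (the RxJS version gets the same effect from `race` unsubscribing the timer), and the `SchemaName.ClassName` name format. It is not the code used in the investigation.

```typescript
// Hypothetical shape of the class information we need; not a library type.
interface ClassInfo {
  fullName: string;
  baseClassNames: string[];
}

type ClassLoader = (fullClassName: string) => Promise<ClassInfo>;

// Returns a loader that uses the schema directly when it is already loaded,
// and otherwise races the slow schema load against a fast metadata query.
function createRacingClassLoader(props: {
  isSchemaLoaded: (schemaName: string) => boolean; // hypothetical cache check
  loadFromSchema: ClassLoader; // stands in for getClassUsingECSchemaProvider
  loadFromMetadataQuery: ClassLoader; // stands in for getClassUsingMetadataQuery
}): ClassLoader {
  return async (fullClassName) => {
    // Assumes names of the form "SchemaName.ClassName".
    const schemaName = fullClassName.split(".")[0];
    if (props.isSchemaLoaded(schemaName)) {
      // Schema is already on the frontend: the metadata query is never sent.
      return props.loadFromSchema(fullClassName);
    }
    const fromSchema = props.loadFromSchema(fullClassName);
    // Defer the query to the next event loop iteration, similar to timer(0).
    const fromQuery = new Promise<ClassInfo>((resolve, reject) => {
      setTimeout(() => {
        props.loadFromMetadataQuery(fullClassName).then(resolve, reject);
      }, 0);
    });
    // The first request to settle wins; the other keeps running in the background.
    return Promise.race([fromSchema, fromQuery]);
  };
}
```

Unlike an RxJS subscription, a promise cannot be cancelled, so the losing request still runs to completion here. That matters for the backend blocking described above: the race only hides latency on the frontend, it does not reduce the work the backend does.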

IMO, the next step for us should be to investigate the possibility of avoiding SchemaContext altogether and, instead, creating an ECSchemaProvider implementation that loads (and caches) schema information using pure ECSQL queries.
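To make the proposal more concrete, here is a rough sketch of what a query-based, caching class hierarchy check could look like. It is only an illustration under several assumptions: the `QueryExecutor` type and the `QueryBasedClassHierarchyInspector` class are hypothetical, the method name is illustrative, and the ECSQL assumes the ECDbMeta classes `ECClassDef`, `ECSchemaDef` and `ClassHasAllBaseClasses` (queried through the `meta` alias) with the property names shown. The query should be verified against a real iModel before relying on it.

```typescript
// Hypothetical executor: runs an ECSQL query with positional bindings and
// resolves to rows as arrays of strings. Not an iTwin.js API.
type QueryExecutor = (ecsql: string, bindings: string[]) => Promise<string[][]>;

// Assumed ECDbMeta-based query: all base classes of a given class, as
// (schema name, class name) pairs.
const BASE_CLASSES_QUERY = `
  SELECT bs.Name, bc.Name
  FROM meta.ClassHasAllBaseClasses r
  JOIN meta.ECClassDef c ON c.ECInstanceId = r.SourceECInstanceId
  JOIN meta.ECSchemaDef s ON s.ECInstanceId = c.Schema.Id
  JOIN meta.ECClassDef bc ON bc.ECInstanceId = r.TargetECInstanceId
  JOIN meta.ECSchemaDef bs ON bs.ECInstanceId = bc.Schema.Id
  WHERE s.Name = ? AND c.Name = ?
`;

class QueryBasedClassHierarchyInspector {
  // One cached promise per class, so concurrent callers share a single query.
  private _cache = new Map<string, Promise<Set<string>>>();
  private _execute: QueryExecutor;

  constructor(execute: QueryExecutor) {
    this._execute = execute;
  }

  private getAllBaseClasses(fullClassName: string): Promise<Set<string>> {
    let entry = this._cache.get(fullClassName);
    if (!entry) {
      // Assumes names of the form "SchemaName.ClassName".
      const [schemaName, className] = fullClassName.split(".");
      entry = this._execute(BASE_CLASSES_QUERY, [schemaName, className]).then(
        (rows) => new Set(rows.map(([schema, name]) => `${schema}.${name}`)),
      );
      this._cache.set(fullClassName, entry);
    }
    return entry;
  }

  // True when `derived` is `candidateBase` or has it among its base classes.
  public async classDerivesFrom(derived: string, candidateBase: string): Promise<boolean> {
    if (derived === candidateBase) {
      return true;
    }
    return (await this.getAllBaseClasses(derived)).has(candidateBase);
  }
}
```

With this shape, checking whether class X is of class Y costs one small query per distinct class instead of a full schema download. Note that a failed query stays cached in this sketch; a real implementation would likely evict rejected entries.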

ColinKerr commented 2 months ago

Loading the schema and serializing it to JSON for the RPC request does happen on the main thread, so this is blocking at the moment. We had hoped that writing each schema was fast enough not to be a problem, but I guess that does not hold true for these large schemas.

For these performance numbers did you have RPC compression enabled and a proper async schema locater (one that returns schema info before loading the rest of the schema)?

For an ECSchemaProvider are you envisioning that we would dynamically build up the schema object model or that you would build a bespoke cache for presentation?

grigasp commented 2 months ago

> For these performance numbers did you have RPC compression enabled and a proper async schema locater (one that returns schema info before loading the rest of the schema)?

I did have RPC compression enabled for the tests. I just tried this in the browser, where compression is also enabled. Here are the results:

As you can see, the majority of the time is spent on the backend (I assume on the main thread). I'm surprised that downloading 357 kB took nearly 600 ms, even though I'm requesting it from the EUS datacenter from Europe. The Chrome DevTools documentation isn't very clear about what "Content Download" involves exactly, but it seems it may include decompression as well.

As for the schema locater, we're using the ECSchemaRpcLocater, which I believe is "proper" :) Still, I see it blocks the main frontend thread for more than 400 ms: [image]

> For an ECSchemaProvider are you envisioning that we would dynamically build up the schema object model or that you would build a bespoke cache for presentation?

Ideally, I think the problem needs to be solved at the level of ecschema-metadata and related packages, since presentation packages aren't (or at least won't be) the only user of those APIs. However, I think for the time being we could implement that on our end. Our use case is pretty simple at this point, so it would be much easier and quicker to cover it on our end than to implement everything on yours. We already have an abstraction layer over ecschema-metadata, so using another implementation should not be a problem.

ColinKerr commented 2 months ago

So compression shaved maybe a second or two off of the total time?

I suspect the hang on the frontend is because we parse the 32MB json in one go. Schemas really shouldn't be this big.

I agree we have to do better, let me see what we can come up with.

grigasp commented 2 months ago

> So compression shaved maybe a second or two off of the total time?

It depends on the internet connection and how far away the data center is. In this specific case, I'm running the test from Europe against the EUS datacenter with a fairly slow internet connection:

| | time to get schema json from backend (ms) |
|---|---|
| uncompressed | 38829 |
| compressed | 2490 |

So it definitely helps a lot.