Have security check limit file query by fileset ID.

mtbc commented 4 years ago

Aims to improve performance of fetching image data from servers with a very large number of files imported.

joshmoore commented 4 years ago

A copy of the stack trace for the review:


"Blitz-0-Ice.ThreadPool.Server-16" #312 prio=5 os_prio=0 tid=0x00007fe8fc002000 nid=0x50d9 runnable [0x00007fe847dfa000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
        at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
        at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1936)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
        - locked <0x00000001ce471ed8> (a org.postgresql.core.v3.QueryExecutorImpl)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at bitronix.tm.resource.jdbc.proxy.PreparedStatementJavaProxy.executeQuery(PreparedStatementJavaProxy.java:102)
        at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
        at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
        at org.hibernate.loader.Loader.doQuery(Loader.java:802)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
        at org.hibernate.loader.Loader.doList(Loader.java:2542)
        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276)
        at org.hibernate.loader.Loader.list(Loader.java:2271)
        at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:459)
        at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:365)
        at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:196)
        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1268)
        at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102)
        at ome.services.query.Query.doInHibernate(Query.java:253)
        at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:411)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:342)
        at ome.logic.QueryImpl.execute(QueryImpl.java:176)
        at ome.logic.QueryImpl.projection(QueryImpl.java:474)
        at ome.services.blitz.repo.ManagedReaderSecurityCheck.assertUsedFilesReadable(ManagedReaderSecurityCheck.java:131)
        at ome.io.nio.PixelsService$3.setId(PixelsService.java:861)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.ChannelFiller.setId(ChannelFiller.java:223)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:293)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:650)
        at loci.formats.Memoizer.setId(Memoizer.java:789)
        at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
        at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:889)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:646)
        at ome.io.nio.PixelsService.getPixelBuffer(PixelsService.java:564)

joshmoore commented 4 years ago

Just to confirm: you'd assert that this is only called on the first setId during the import process after all the files have been transferred?

Also, did you happen to compare any EXPLAIN ANALYZE output?

mtbc commented 4 years ago

No, it'll be called whenever setId is. This PR's probably the safest piece of the puzzle in not much changing how the existing fix works, just significantly speeding it up, so I'd expect it to be safe to include in the next release without much further thought. It'll still be worth having in place,

CREATE INDEX originalfile_path_name_index ON originalfile ((path || name));

and now you mention it I've not compared EXPLAIN ANALYZE with exactly this version, could be worth comparing that both with and without the index. The fileset limiting should really help with both.

mtbc commented 4 years ago

Post-import setId checks are done because https://github.com/openmicroscopy/omero_private/pull/98#issuecomment-448610889 shows how the set of used files could increase if new files appeared. There is plausibly still scope for improvement in that direction but perhaps less hastily.

joshmoore commented 4 years ago

No, it'll be called whenever setId is.

Hmm....I couldn't get it to execute on RawPixelsStore.setPixelsId.

mtbc commented 4 years ago

Huh, I'd have expected that to hit PixelsService.createBfReader, maybe the setId is being reached otherwise?

mtbc commented 4 years ago

Aha, I've figured it out, you have to delete the image's bfmemo file first, otherwise it calls reopenFile instead of setId. This may explain why performance is not impacted much worse than it is.

mtbc commented 4 years ago

Rather encouraging:

omero=> EXPLAIN ANALYZE SELECT COUNT(DISTINCT o.id) FROM filesetentry AS e LEFT JOIN originalfile AS o ON (e.originalfile = o.id) WHERE e.fileset = 1 AND o.repo = 'foo' AND o.path || o.name IN ('bar', 'baz', 'wibble', 'flong');
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16.61..16.62 rows=1 width=8) (actual time=0.084..0.085 rows=1 loops=1)
   ->  Nested Loop  (cost=0.55..16.60 rows=1 width=8) (actual time=0.034..0.034 rows=0 loops=1)
         Join Filter: (e.originalfile = o.id)
         ->  Index Scan using i_filesetentry_fileset on filesetentry e  (cost=0.27..8.29 rows=1 width=8) (actual time=0.016..0.017 rows=1 loops=1)
               Index Cond: (fileset = 1)
         ->  Index Scan using originalfile_repo_path_index on originalfile o  (cost=0.28..8.30 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: ((repo)::text = 'foo'::text)
               Filter: ((path || name) = ANY ('{bar,baz,wibble,flong}'::text[]))

joshmoore commented 4 years ago

Merging. I assume the next step is to get this onto a server with a significant number of filesets and original files.

mtbc commented 4 years ago

Sounds good. Would it help to tag and release?

joshmoore commented 4 years ago

It certainly can't hurt, at least not until it hits an ome/openmicroscopy build.

ome / omero-blitz

Have security check limit file query by fileset ID. #89