Scalability: Technical - Githubissues

As the number, size and variety of items stored in ERA grow (including full text), we want to be sure that we can support them. We are concerned about performance, storage, internal limits (e.g. number of items in a collection), and external connections (e.g. an external triplestore, Pushmi-Pullyu). We expect problems to emerge first in Fedora response time.

Performance testing should be continuous to discover changes over time (as the stack moves forward and more content goes in). All results should be shared with the community.

What constitutes done?

- acceptable load times for public pages in a system containing comparable content to what we expect to load in ERA in the next three years
  - this assumes we can develop or simulate the more complex models we expect to need, up to Peel newspapers
- acceptable rate of ingest
- acceptable speed of Solr reindexing (partial and full), or strategies for addressing this
- acceptable time to preservation for new/changed objects
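
As one way of making "acceptable" concrete, the criteria above could each be expressed as a percentile target over measured timings. The following is only a minimal sketch: the function names `percentile` and `meets_target`, the sample timings, and the 95th-percentile / 2-second target are all illustrative assumptions, not agreed values.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples to summarise")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_target(samples, pct, limit_seconds):
    """True if the given percentile of the samples is within the limit."""
    return percentile(samples, pct) <= limit_seconds


# Hypothetical page load times in seconds (made-up numbers for illustration).
load_times = [0.2, 0.3, 0.25, 1.9, 0.4]

# Hypothetical target: 95% of public page loads complete within 2 seconds.
print(meets_target(load_times, 95, 2.0))
```

The same check could be applied to ingest rate, reindexing time, or time to preservation once we agree on what to measure and what limits to set.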

Technical Questions:

- discover basic limitations (CPU bound? difference between bare metal and VM?), test alternatives
- are there options for clustering Fedora?
- can we optimize at the application level to reduce calls to Fedora?
- what about the Fedora community's efforts toward better performance?
- how do we coordinate development with the modelling being done by the Metadata Team, to ensure we discover performance costs early?
- what testing framework would we use to run automated tests and report results?
- how far would we go in the direction of CLAW (i.e. remove Fedora from the stack completely)?
- how can we share this work with the community (e.g. make our tests and reports compatible with other people's so that results can be compared)?

Tests

We need appropriate infrastructure to do realistic scripted tests against Fedora to determine limitations of the current environment, and to explore optimization strategies. Questions:

- what physical/logical environment would we need to make these tests meaningful? Do we have the hardware we need?
- who's already done this? (Princeton...)
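
To give a sense of what a scripted test might look like, here is a rough sketch of a timing harness that runs an action repeatedly and produces a small report that could be shared. Everything in it is an assumption for illustration: the names `time_calls`, `build_report` and `fake_fetch` are hypothetical, and `fake_fetch` only sleeps briefly as a stand-in for a real request to Fedora, which is not shown here.

```python
import json
import statistics
import time


def time_calls(action, repeats):
    """Call action() `repeats` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def build_report(label, durations):
    """Summarise timings as a plain dict so it can be dumped to JSON and compared."""
    return {
        "label": label,
        "count": len(durations),
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


def fake_fetch():
    # Stand-in for a real request against the repository (assumption: a real
    # test would replace this with an HTTP call to the system under test).
    time.sleep(0.001)


report = build_report("fake-fetch", time_calls(fake_fetch, 5))
print(json.dumps(report, indent=2))
```

Keeping the report as plain JSON is one possible way to make results comparable with other people's, but the actual format would depend on what the community already uses.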

Resources

a Google Doc where we can gather sources of information, tools, etc. that are relevant

ualbertalib / Hydranorth2

Scalability: Technical #134