This issue covers effort/tasks required to update (replace) the existing Fragment Graph network used by neo4j for Fragalysis, and may consist of work relating to: -
[ ] Resurrecting the fragmentation process
[ ] Definition of the new molecule set
We anticipate that the new database may be up to twice the size of the existing database (measured in molecules).
Resurrecting the fragmentation process
The prior fragmentation process relied on: -
The existence of a Galaxy/slurm cluster
A VM running a PostgreSQL database with significant resources and attached volumes. The existing fragmentation data used a 4TB database volume and similarly sized volumes for storage of the intermediate fragmentation files.
A control node where the Ansible playbooks that drive the fragmentation process are executed
An S3-compliant (echo/AWX) object store to hold input and output data
The playbooks and related documentation can be found in the Informatics Matters fragmentor repository.
As the original cluster processing hardware (Galaxy) has been lost, resurrecting the fragmentation process comes with some risk: -
We need very large volumes shared between the cluster processing nodes and the database
We should expect to need 6-8TB of database volume, accessible from the cluster processing nodes (i.e. an NFS-like volume)
A suitable volume shared between the cluster processing nodes
A configured database externally available to the cluster
Sufficient bucket (echo) storage
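As a rough sanity check on the 6-8TB figure, the sketch below scales the existing 4TB database volume by the expected growth in molecule count. It assumes storage grows linearly with the number of molecules, which has not been verified for the fragmentation data, and the function name and the 1.5x lower bound are illustrative choices rather than anything taken from the fragmentor playbooks.

```python
def estimate_volume_tb(current_tb: float, growth_factor: float) -> float:
    """Estimate a new volume size, assuming it scales linearly with molecule count."""
    if current_tb <= 0 or growth_factor <= 0:
        raise ValueError("current size and growth factor must be positive")
    return current_tb * growth_factor


# Existing database volume size, as quoted in the notes above.
CURRENT_DB_TB = 4.0

# Assumed growth range: 1.5x is an illustrative lower bound, 2x is the stated upper bound.
for factor in (1.5, 2.0):
    estimate = estimate_volume_tb(CURRENT_DB_TB, factor)
    print(f"{factor}x molecules -> ~{estimate:.0f} TB database volume")
```

The same calculation can be applied to the intermediate file volumes, which were of a similar size to the database volume in the prior run.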
Definition of the new molecule set
The fragmentation process's prior database may be available; if not, it can be recreated by rerunning the appropriate playbooks. Either way, we need a complete definition of the molecules required in the new database.