knowledgesystems / cmo-pipelines

Reenable GENIE Imports in V2 Node #1124

Closed: averyniceday closed this issue 4 months ago

averyniceday commented 7 months ago

Redistribute genie import tools + ensure import node is in same subnet

sheridancbio commented 7 months ago

Connect with @averyniceday when working on the import-tool updates. Initial examination indicates that the import-tool binary executable is/was running on the genie importer (knowledgesystems-importer) node. The shell script tool used by the curators presumably connects with the genie importer node in order to create the trigger files for starting or killing an import run.

sheridancbio commented 4 months ago

The EC2 node i-0a8c3a8a243d16d10 (knowledgesystems-cbioportal-importer) in the V2 account is set up for running imports into the genie database RDS node cbioportal-genie-db-green.caakrwnbyjl6.us-east-1.rds.amazonaws.com. Both nodes are in availability zone us-east-1c to reduce latency during import.

We have chosen not to have users log in to this node under individual accounts, and to use a shared importer account (with the standard username cbioportal_importer) instead. The import-tool, however, requires each import-tool user to have a user account on the node in order to provide the ssh authentication key which the tool uses to execute the import-tool command on the importer node. So individual user accounts were created for all import-tool users, but since we do not wish to encourage individual user login and use of the filesystem, no passwords for these accounts will be distributed. In each of these users' home directories, an .ssh directory was created containing an authorized_keys file, and the public keys of the import-tool ssh key pairs were copied from the old genie importer node to the new genie importer node, with file modes set appropriately.
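For reference, the per-user key setup described above can be sketched roughly as follows. This is a minimal illustration and not the commands that were actually run on the node: the function name `install_authorized_key` is hypothetical, the modes 0700 and 0600 are the conventional OpenSSH values rather than something stated in this thread, and ownership changes (which need root) are left out.

```python
from pathlib import Path


def install_authorized_key(home: Path, public_key: str) -> Path:
    """Add public_key to <home>/.ssh/authorized_keys with restrictive modes.

    Hypothetical helper for illustration only; it does not change file ownership.
    """
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(exist_ok=True)
    # Set the mode explicitly, since mkdir's default is filtered by the umask.
    ssh_dir.chmod(0o700)

    auth_file = ssh_dir / "authorized_keys"
    lines = auth_file.read_text().splitlines() if auth_file.exists() else []
    key = public_key.strip()
    if key not in lines:
        # Skip keys that are already present so reruns do not duplicate them.
        lines.append(key)
    auth_file.write_text("\n".join(lines) + "\n")
    auth_file.chmod(0o600)
    return auth_file
```

It would be called once per import-tool user, for example `install_authorized_key(Path("/home/someuser"), key_text)`, where the home directory path and key text are placeholders.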

To allow previous distributions of the tool to continue to function correctly, the Google Domains DNS settings were adjusted so that the DNS name "knowledgesystems-importer.cbioportal.org" resolves as a CNAME alias to "knowledgesystems-importer.cbioportal.aws.mskcc.org". In Route 53, that name was in turn set to resolve as a CNAME to the AWS instance DNS name provided by the AWS console for the V2 genie importer node. Users will need to delete the old import server's record from the .ssh/known_hosts file on their local machine, because the server fingerprint has changed.
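On the known_hosts cleanup: `ssh-keygen -R knowledgesystems-importer.cbioportal.org` is the usual OpenSSH way to drop the stale record, and it also handles hashed entries. For anyone who wants to see what that amounts to, here is a rough sketch that handles plain (unhashed) entries only. The function name `drop_known_host` is hypothetical, and the sample key material and IP address in the usage lines are made up.

```python
def drop_known_host(known_hosts_text: str, hostname: str) -> str:
    """Return known_hosts content without the entries that list hostname.

    Sketch only: hashed entries (lines starting with "|1|") are left untouched.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept.append(line)  # keep blank lines and comments as they are
            continue
        # An optional marker (@cert-authority, @revoked) precedes the host list.
        has_marker = fields[0].startswith("@") and len(fields) > 1
        hosts_field = fields[1] if has_marker else fields[0]
        if hostname in hosts_field.split(","):
            continue  # stale record for the old server: drop it
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up sample content for illustration.
sample = (
    "knowledgesystems-importer.cbioportal.org,203.0.113.7 ssh-ed25519 AAAAold\n"
    "github.com ssh-rsa AAAAother\n"
)
cleaned = drop_known_host(sample, "knowledgesystems-importer.cbioportal.org")
```

After the record is removed, the next connection prompts the user to accept the new server's fingerprint.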