opencb / cellbase

High-Performance NoSQL database and RESTful web services to access the most relevant biological data
Apache License 2.0

Move index creation to Java code #454

Open imedina opened 5 years ago

imedina commented 5 years ago

Currently, MongoDB indexes are created using the mongo client and these index files: https://github.com/opencb/cellbase/tree/develop/cellbase-app/app/mongodb-scripts

This requires the mongo client to be installed in order to index data, which can be problematic in some environments such as Docker. Index creation should be done programmatically, as in OpenCGA; @pfurio can help here.

A possible implementation for this could be:

  1. Create a new configuration file listing all the indexes to be created, as in https://github.com/opencb/opencga/blob/develop/opencga-catalog/src/main/resources/catalog-indexes.txt
  2. A Java method reads this file and creates the indexes.

A possible improvement would be to define these indexes in a YAML file instead of TXT.
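
As a rough illustration of the "read a file, then create the indexes" idea, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: the class names `IndexFileParser` and `IndexSpec` are hypothetical, and the one-line-per-index format (`collection field:direction[,field:direction]`) is a simplified stand-in, not the format used by OpenCGA's catalog-indexes.txt. It only parses the definitions and derives the name MongoDB would assign by default; the actual call to the MongoDB driver is left as a comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: parses index definitions from text lines.
final class IndexFileParser {

    /** One index to create: target collection plus ordered field -> direction keys. */
    static final class IndexSpec {
        final String collection;
        final Map<String, Integer> keys;

        IndexSpec(String collection, Map<String, Integer> keys) {
            this.collection = collection;
            this.keys = keys;
        }

        /** Default MongoDB index name: each field and direction joined by underscores. */
        String defaultName() {
            StringJoiner name = new StringJoiner("_");
            for (Map.Entry<String, Integer> e : keys.entrySet()) {
                name.add(e.getKey() + "_" + e.getValue());
            }
            return name.toString();
        }
    }

    /** Parses lines of the assumed form "collection field:dir[,field:dir...]". */
    static List<IndexSpec> parse(List<String> lines) {
        List<IndexSpec> specs = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed index line: " + raw);
            }
            // LinkedHashMap keeps key order, which matters for compound indexes.
            Map<String, Integer> keys = new LinkedHashMap<>();
            for (String key : parts[1].split(",")) {
                String[] fieldAndDir = key.trim().split(":");
                if (fieldAndDir.length != 2) {
                    throw new IllegalArgumentException("Malformed key in line: " + raw);
                }
                keys.put(fieldAndDir[0].trim(), Integer.parseInt(fieldAndDir[1].trim()));
            }
            specs.add(new IndexSpec(parts[0], keys));
        }
        return specs;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# collection keys",
                "gene transcripts.xrefs.dbName:1",
                "gene transcripts.exons.id:1");
        for (IndexSpec spec : parse(lines)) {
            // A real implementation would ask the MongoDB driver to create the index here.
            System.out.println(spec.collection + " -> " + spec.defaultName());
        }
    }
}
```

Keeping the parsing separate from the database call would also make it easier to swap the TXT file for YAML later, since only `parse` would change.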

julie-sullivan commented 4 years ago

Here is the class in commons that we can use to create indexes in cellbase:

https://github.com/opencb/java-common-libs/commit/ff9480a1969bc5bbe761616de30a39cdb0ee4aeb

julie-sullivan commented 4 years ago

After discussion today with @imedina:

  1. Remove index creation from Load and remove the associated JS files.
  2. Add a parameter to the load command line to index, e.g. load --data gene,genome --index
  3. Add a command line option, index, e.g. index --data gene,genome
  4. Add a flag to drop all indexes first. Default action: create an index ONLY if it is not there already.
  5. Move the JSON file to cellbase-app/indexes.
  6. Copy it to the install directory, in a (new) /conf directory.
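
The behaviour in point 4 (create only what is missing by default, or drop everything first when asked) could be sketched roughly as below. This is an illustration only: `IndexPlanner`, `Plan` and `plan` are hypothetical names, not part of CellBase or java-common-libs, and the sketch works on index names alone, leaving the actual drop and create calls to whatever MongoDB wrapper is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which indexes to drop and which to create.
final class IndexPlanner {

    /** Index names to drop first, followed by index names to create. */
    static final class Plan {
        final List<String> toDrop;
        final List<String> toCreate;

        Plan(List<String> toDrop, List<String> toCreate) {
            this.toDrop = toDrop;
            this.toCreate = toCreate;
        }
    }

    /**
     * @param existing  names of indexes already present on the collection
     * @param wanted    names of indexes defined in the index file
     * @param dropFirst true to drop all existing indexes before creating
     */
    static Plan plan(Set<String> existing, List<String> wanted, boolean dropFirst) {
        List<String> toDrop = new ArrayList<>();
        List<String> toCreate = new ArrayList<>();
        if (dropFirst) {
            for (String name : existing) {
                // MongoDB does not allow the default _id_ index to be dropped.
                if (!"_id_".equals(name)) {
                    toDrop.add(name);
                }
            }
            toCreate.addAll(wanted);
        } else {
            // Default action: create an index only if it is not there already.
            for (String name : wanted) {
                if (!existing.contains(name)) {
                    toCreate.add(name);
                }
            }
        }
        return new Plan(toDrop, toCreate);
    }

    public static void main(String[] args) {
        Set<String> existing = new LinkedHashSet<>(List.of("_id_", "transcripts.exons.id_1"));
        List<String> wanted = List.of("transcripts.exons.id_1", "transcripts.xrefs.dbName_1");
        Plan byDefault = plan(existing, wanted, false);
        System.out.println("drop=" + byDefault.toDrop + " create=" + byDefault.toCreate);
    }
}
```

Computing the plan before touching the database keeps the default path idempotent: running the index command twice in a row creates nothing the second time.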

julie-sullivan commented 4 years ago

Looks good so far:

2019-11-07T13:03:17.076+0000 I  INDEX    [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.076+0000 I  INDEX    [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.076+0000 I  INDEX    [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.079+0000 I  INDEX    [conn11] index build: done building index transcripts.xrefs.dbName_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.091+0000 I  INDEX    [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.xrefs.dbDisplayName: 1 }, name: "transcripts.xrefs.dbDisplayName_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid
2019-11-07T13:03:17.091+0000 I  INDEX    [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.091+0000 I  INDEX    [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.091+0000 I  INDEX    [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.093+0000 I  INDEX    [conn11] index build: done building index transcripts.xrefs.dbDisplayName_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.112+0000 I  INDEX    [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.exons.id: 1 }, name: "transcripts.exons.id_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid
2019-11-07T13:03:17.112+0000 I  INDEX    [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.112+0000 I  INDEX    [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.113+0000 I  INDEX    [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.117+0000 I  INDEX    [conn11] index build: done building index transcripts.exons.id_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.139+0000 I  INDEX    [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.exons.chromosome: 1 }, name: "transcripts.exons.chromosome_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid

And looking in mongo, the indexes look correct.

julie-sullivan commented 4 years ago

After discussion today with @imedina