Open imedina opened 5 years ago
Here is the class in commons that we can use to create indexes in cellbase:
https://github.com/opencb/java-common-libs/commit/ff9480a1969bc5bbe761616de30a39cdb0ee4aeb
After discussion today with @imedina
load
command line to index, e.g. load --data gene,genome --index
index --data gene,genome
cellbase-app/indexes
/conf
directory.
Looks good so far:
2019-11-07T13:03:17.076+0000 I INDEX [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.076+0000 I INDEX [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.076+0000 I INDEX [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.079+0000 I INDEX [conn11] index build: done building index transcripts.xrefs.dbName_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.091+0000 I INDEX [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.xrefs.dbDisplayName: 1 }, name: "transcripts.xrefs.dbDisplayName_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid
2019-11-07T13:03:17.091+0000 I INDEX [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.091+0000 I INDEX [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.091+0000 I INDEX [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.093+0000 I INDEX [conn11] index build: done building index transcripts.xrefs.dbDisplayName_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.112+0000 I INDEX [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.exons.id: 1 }, name: "transcripts.exons.id_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid
2019-11-07T13:03:17.112+0000 I INDEX [conn11] build may temporarily use up to 500 megabytes of RAM
2019-11-07T13:03:17.112+0000 I INDEX [conn11] index build: collection scan done. scanned 0 total records in 0 seconds
2019-11-07T13:03:17.113+0000 I INDEX [conn11] index build: inserted 0 keys from external sorter into index in 0 seconds
2019-11-07T13:03:17.117+0000 I INDEX [conn11] index build: done building index transcripts.exons.id_1 on ns cellbase_hsapiens_grch37_v5.gene
2019-11-07T13:03:17.139+0000 I INDEX [conn11] index build: starting on cellbase_hsapiens_grch37_v5.gene properties: { v: 2, key: { transcripts.exons.chromosome: 1 }, name: "transcripts.exons.chromosome_1", ns: "cellbase_hsapiens_grch37_v5.gene", background: true } using method: Hybrid
And looking in mongo, the indexes look correct.
After discussion today with @imedina
IndexManager
in cellbase-lib
getDatastore()
Currently, MongoDB indexes are created using the mongo client and these index files: https://github.com/opencb/cellbase/tree/develop/cellbase-app/app/mongodb-scripts
This requires the mongo client to be installed in order to index data, which can be problematic in some environments such as Docker. Index creation should be done programmatically, as in OpenCGA; @pfurio can help here.
A possible implementation for this could be:
A possible improvement would be to define these indexes in a YAML file instead of TXT.
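To make the idea of programmatic index creation a bit more concrete, here is a minimal sketch of how one index definition might be held in code before it is handed to the MongoDB driver. `IndexSpec` and the one-line `collection: field=direction` input format are hypothetical, made up for illustration; they are not the class from java-common-libs or the format of the current index files. The only MongoDB behaviour relied on is the default index naming rule (field names and directions joined with underscores).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical holder for one index definition; not an existing CellBase or commons class. */
final class IndexSpec {
    final String collection;
    // Field name -> direction (1 ascending, -1 descending); order matters for compound indexes.
    final Map<String, Integer> keys;

    IndexSpec(String collection, Map<String, Integer> keys) {
        this.collection = collection;
        this.keys = keys;
    }

    /** Parses a line in an assumed format: "collection: field=1, otherField=-1". */
    static IndexSpec parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected 'collection: field=direction': " + line);
        }
        String collection = line.substring(0, colon).trim();
        Map<String, Integer> keys = new LinkedHashMap<>();
        for (String part : line.substring(colon + 1).split(",")) {
            String[] fieldAndDirection = part.trim().split("=");
            if (fieldAndDirection.length != 2) {
                throw new IllegalArgumentException("Expected 'field=direction': " + part);
            }
            keys.put(fieldAndDirection[0].trim(), Integer.parseInt(fieldAndDirection[1].trim()));
        }
        return new IndexSpec(collection, keys);
    }

    /** Name MongoDB assigns by default: fields and directions joined by underscores. */
    String defaultName() {
        StringJoiner name = new StringJoiner("_");
        for (Map.Entry<String, Integer> key : keys.entrySet()) {
            name.add(key.getKey()).add(String.valueOf(key.getValue()));
        }
        return name.toString();
    }
}
```

For example, `IndexSpec.parse("gene: transcripts.exons.id=1").defaultName()` returns `transcripts.exons.id_1`, the same name that appears in the log above. With the MongoDB Java driver on the classpath, each spec could then be passed to `MongoCollection.createIndex(Bson keys, IndexOptions options)`; that call is left out here so the sketch runs without extra dependencies.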