x-atlas-consortia / ubkg-neo4j

A container implementation to serve the Unified Biomedical Knowledge Graph in Neo4j
MIT License

Remove index on DEF property of Definition nodes #46

Closed AlanSimmons closed 9 months ago

AlanSimmons commented 9 months ago

The v4 Docker image executes this command:

CREATE INDEX FOR (n:Definition) ON (n.DEF);

This command fails when the value of the DEF property is too large to index. This is the case, for example, for the definition associated with UNIPROTKB:Q08211, which corresponds to the text in the "Function" section here: https://www.uniprot.org/uniprotkb/Q08211/entry

If any index is created on the definition text, it should be a full-text index. For now, we will remove the index.
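
If a full-text index is added later, it might look something like the sketch below. This is only an illustration, not what the image currently does: the index name definition_def_fulltext and the search term are hypothetical, and the procedure form is used on the assumption that the target is a Neo4j 4.2.x server.

```cypher
// Sketch only: "definition_def_fulltext" is a hypothetical index name.
// Full-text indexes tokenize the text, so long DEF values are not
// rejected the way they are by the BTREE index.
CALL db.index.fulltext.createNodeIndex(
  'definition_def_fulltext',  // index name (assumption)
  ['Definition'],             // node labels to index
  ['DEF']                     // properties to index
);

// Example lookup against that index; the search term is arbitrary.
CALL db.index.fulltext.queryNodes('definition_def_fulltext', 'helicase')
YIELD node, score
RETURN node.DEF AS definition, score
ORDER BY score DESC
LIMIT 5;
```

On Neo4j 4.3 and later, the same index can be declared with CREATE FULLTEXT INDEX ... FOR (n:Definition) ON EACH [n.DEF] instead of the procedure call.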

yuanzhou commented 9 months ago

This is the build error encountered on the DEV deployment:

ERROR [o.n.k.i.a.i.IndexPopulationJob] [ontology/213d5460] Failed to populate index: [Index( id=17, name='index_279fd474', type='GENERAL BTREE', schema=(:Definition {DEF}), indexProvider='native-btree-1.0' )]
java.lang.IllegalArgumentException: Property value is too large to index, please see index documentation for limitations. Index: Index( id=17, name='index_279fd474', type='GENERAL BTREE', schema=(:Definition {DEF}), indexProvider='native-btree-1.0' ), entity id: 4060635, property size: 8539, value: [String("FUNCTION: Multifunctional ATP-dependent RNA helicase (PubMed:17357160, PubMed:21589879, Pub....
    at org.neo4j.kernel.api.index.IndexValueValidator.throwSizeViolationException(IndexValueValidator.java:42) ~[neo4j-kernel-api-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.index.schema.GenericIndexKeyValidator.validate(GenericIndexKeyValidator.java:65) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.index.schema.BlockBasedIndexPopulator.storeUpdate(BlockBasedIndexPopulator.java:212) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.index.schema.BlockBasedIndexPopulator.storeUpdate(BlockBasedIndexPopulator.java:227) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.index.schema.BlockBasedIndexPopulator.add(BlockBasedIndexPopulator.java:203) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.api.index.MultipleIndexPopulator.lambda$flush$5(MultipleIndexPopulator.java:490) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$asCallable$1(ThreadPool.java:151) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$submit$0(ThreadPool.java:115) ~[neo4j-kernel-4.2.5.jar:4.2.5]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
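
To see which Definition nodes would trigger this error, a query along these lines could help. It is a rough sketch: the threshold of 8000 is an assumption chosen to sit below the 8539 reported in the log, and size() counts characters rather than bytes, so it only approximates the limit the index enforces.

```cypher
// Sketch only: list Definition nodes whose DEF text is long enough
// to be a likely problem for a BTREE index.
// The 8000 threshold is an assumption, not the documented limit.
MATCH (n:Definition)
WHERE size(n.DEF) > 8000
RETURN id(n) AS nodeId, size(n.DEF) AS defLength
ORDER BY defLength DESC;
```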

I'll also remove CREATE INDEX FOR (n:Definition) ON (n.DEF); from the deployment codebase: https://github.com/x-atlas-consortia/ubkg-neo4j-v4/blob/main/neo4j/set_constraints.cypher
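
On a database where the index was already created, removing the statement from the script does not drop the existing index. A possible cleanup is sketched below, assuming a Neo4j 4.x server. The name index_279fd474 is taken from the log above; it is auto-generated and may differ on other deployments, so it is worth checking first.

```cypher
// Sketch only: find the index on (:Definition {DEF}) and its state.
CALL db.indexes()
YIELD name, state, type, labelsOrTypes, properties
WHERE labelsOrTypes = ['Definition'] AND properties = ['DEF']
RETURN name, state, type;

// Drop it by name. "index_279fd474" comes from the DEV log and is
// an assumption for any other environment.
DROP INDEX index_279fd474;
```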