neo4j / neo4j

Graphs for Everyone
http://neo4j.com
GNU General Public License v3.0
13.15k stars 2.37k forks source link

NOT PART OF CHAIN! #12639

Closed flpgrz closed 3 years ago

flpgrz commented 3 years ago

Neo4j version: 4.1.1 (community edition) Operating system: DEBIAN 7 API/Driver: Cypher/Python API/Browser/Desktop (I get the same error everywhere)

Steps to reproduce: I created several csv files to run neo4j-admin import. I can't share the files unfortunately. The import runs successfully without errors. In the graph there are: 8 different node labels, 3 relationship types, 700k nodes, 400M relationships. There's a specific relationship csv file (7 GB), containing 399M relationships out of the total ~400M, which leads to this issue. If I don't include that file, everything works fine.

When I operate a simple query e.g. match path = (h)-[r]->(t) where (('Gene' in labels(h)) and ('Gene' in labels(t))) or (('Gene' in labels(h)) and ('Variant' in labels(t))) return id(h), id(t)

I obtain the following error:

NOT PART OF CHAIN! RelationshipTraversalCursor[id=292159681, open state with: denseNode=true, next=292159681, , underlying record=Relationship[292159681,used=true,source=137047,target=271310,type=0,sPrev=292159682,sNext=292159680,tPrev=292196972,tNext=292152117,prop=295162289,!sFirst,!tFirst]]

I also run a consistency check and obtained: Consistency check 2020-12-06 16:35:51.426+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[1756984,used=true,prev=-1,next=1703993, (blocks not loaded)] Inconsistent with: Property[1703993,used=true,prev=1703992,next=1703994, (blocks not loaded)] 2020-12-06 16:35:51.446+0000 ERROR [o.n.c.ConsistencyCheckService] The referenced property record is not the first in its property chain. RecordNodeCursor[id=201948, open state with: highMark=-1, next=-1, underlying record=Node[201948,used=true,created=false,group=202324,prop=1757084,labels=Inline(0x2000180001:[1, 6]),light]] Inconsistent with: Property[1757084,used=true,prev=4294902015,next=1757085, (blocks not loaded)] 2020-12-06 16:35:51.465+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[2300852,used=true,prev=2300851,next=2300853, (blocks not loaded)] Inconsistent with: Property[2300853,used=true,prev=7092,next=-1, (blocks not loaded)] 2020-12-06 16:35:51.764+0000 ERROR [o.n.c.ConsistencyCheckService] The property chain contains multiple properties that have the same property key id, which means that the entity has at least one duplicate property. RecordNodeCursor[id=311733, open state with: highMark=-1, next=-1, underlying record=Node[311733,used=true,created=false,rel=399531100,prop=1286730,labels=Inline(0x2000100002:[2, 4]),light]] 2020-12-06 16:35:51.765+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[1290137,used=true,prev=1290136,next=1290140, (blocks not loaded)] Inconsistent with: Property[1290140,used=true,prev=1245337,next=1290141, (blocks not loaded)] Consistency checking failed.Full consistency check did not complete

Can someone please explain what this means? Does it imply something goes wrong in the import? If yes, I'd expect an error message in the import process.

tinwelint commented 3 years ago

To me it looks like some issue with the importer, no idea of what yet. Could you share the whole command line, i.e. all the arguments to the import command?

flpgrz commented 3 years ago

Hello @tinwelint ,

Command:

neo4j-admin import --database neo4j \
--nodes=/var/lib/neo4j/import/folder/pre_processed/genes.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/variants.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/cells.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/patients.csv \
--relationships=GENE_GENE=/var/lib/neo4j/import/folder/pre_processed/gene_gene.csv \
--relationships=GENE_VARIANT=/var/lib/neo4j/import/folder/pre_processed/gene_variant.csv \
--relationships=CELL_GENE=/var/lib/neo4j/import/folder/pre_processed/cell_gene.csv \
--relationships=CELL_GENE=/var/lib/neo4j/import/folder/pre_processed/pre_post_cell_gene.csv \
--relationships=PATIENT_CELL=/var/lib/neo4j/import/folder/pre_processed/patient_cell.csv \
--skip-duplicate-nodes

I think I can actually share the files, if you're interested. Currently clarifying this point.

tinwelint commented 3 years ago

Thanks. Yes since I haven't seen this before it would be very useful to try to import it in a debugger locally!

tinwelint commented 3 years ago

@flpgrz Not sure, but you may have run into an issue, which was fixed in 4.1.2 with the description:

Fixes an ID allocation issue when importing a large amount of relationships so that double-record-unit allocation got enabled. This could result in some relationships getting one of their record units overwritten

Could you try on latest 4.1 instead (4.1.5 at the time of writing this)?

flpgrz commented 3 years ago

@tinwelint, I had some delays with sharing the files, sorry.

I installed the latest release (4.2 community edition). The problem is not occurring anymore. Also, I installed neo4j this time in a machine with a much bigger RAM. So there might be 2 possible causes that lead to the resolution:

tinwelint commented 3 years ago

Allright, glad to hear. Cheers!