Closed flpgrz closed 3 years ago
To me it looks like some issue with the importer, no idea of what yet. Could you share the whole command line, i.e. all the arguments to the import command?
Hello @tinwelint ,
Command:
neo4j-admin import --database neo4j \
--nodes=/var/lib/neo4j/import/folder/pre_processed/genes.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/variants.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/cells.csv \
--nodes=/var/lib/neo4j/import/folder/pre_processed/patients.csv \
--relationships=GENE_GENE=/var/lib/neo4j/import/folder/pre_processed/gene_gene.csv \
--relationships=GENE_VARIANT=/var/lib/neo4j/import/folder/pre_processed/gene_variant.csv \
--relationships=CELL_GENE=/var/lib/neo4j/import/folder/pre_processed/cell_gene.csv \
--relationships=CELL_GENE=/var/lib/neo4j/import/folder/pre_processed/pre_post_cell_gene.csv \
--relationships=PATIENT_CELL=/var/lib/neo4j/import/folder/pre_processed/patient_cell.csv \
--skip-duplicate-nodes
I think I can actually share the files, if you're interested. Currently clarifying this point.
Thanks. Yes since I haven't seen this before it would be very useful to try to import it in a debugger locally!
@flpgrz Not sure, but you may have run into an issue, which was fixed in 4.1.2 with the description:
Fixes an ID allocation issue when importing a large amount of relationships so that double-record-unit allocation got enabled. This could result in some relationships getting one of their record units overwritten
Could you try on latest 4.1 instead (4.1.5 at the time of writing this)?
@tinwelint, I had some delays with sharing the files, sorry.
I installed the latest release (4.2 community edition). The problem is not occurring anymore. Also, I installed neo4j this time in a machine with a much bigger RAM. So there might be 2 possible causes that lead to the resolution:
Allright, glad to hear. Cheers!
Neo4j version: 4.1.1 (community edition) Operating system: DEBIAN 7 API/Driver: Cypher/Python API/Browser/Desktop (I get the same error everywhere)
Steps to reproduce: I created several csv files to run
neo4j-admin import
. I can't share the files unfortunately. The import runs successfully without errors. In the graph there are: 8 different node labels, 3 relationship types, 700k nodes, 400M relationships. There's a specific relationship csv file (7 GB), containing 399M relationships out of the total ~400M, which leads to this issue. If I don't include that file, everything works fine.When I operate a simple query e.g.
match path = (h)-[r]->(t) where (('Gene' in labels(h)) and ('Gene' in labels(t))) or (('Gene' in labels(h)) and ('Variant' in labels(t))) return id(h), id(t)
I obtain the following error:
NOT PART OF CHAIN! RelationshipTraversalCursor[id=292159681, open state with: denseNode=true, next=292159681, , underlying record=Relationship[292159681,used=true,source=137047,target=271310,type=0,sPrev=292159682,sNext=292159680,tPrev=292196972,tNext=292152117,prop=295162289,!sFirst,!tFirst]]
I also run a consistency check and obtained:
Consistency check 2020-12-06 16:35:51.426+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[1756984,used=true,prev=-1,next=1703993, (blocks not loaded)] Inconsistent with: Property[1703993,used=true,prev=1703992,next=1703994, (blocks not loaded)] 2020-12-06 16:35:51.446+0000 ERROR [o.n.c.ConsistencyCheckService] The referenced property record is not the first in its property chain. RecordNodeCursor[id=201948, open state with: highMark=-1, next=-1, underlying record=Node[201948,used=true,created=false,group=202324,prop=1757084,labels=Inline(0x2000180001:[1, 6]),light]] Inconsistent with: Property[1757084,used=true,prev=4294902015,next=1757085, (blocks not loaded)] 2020-12-06 16:35:51.465+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[2300852,used=true,prev=2300851,next=2300853, (blocks not loaded)] Inconsistent with: Property[2300853,used=true,prev=7092,next=-1, (blocks not loaded)] 2020-12-06 16:35:51.764+0000 ERROR [o.n.c.ConsistencyCheckService] The property chain contains multiple properties that have the same property key id, which means that the entity has at least one duplicate property. RecordNodeCursor[id=311733, open state with: highMark=-1, next=-1, underlying record=Node[311733,used=true,created=false,rel=399531100,prop=1286730,labels=Inline(0x2000100002:[2, 4]),light]] 2020-12-06 16:35:51.765+0000 ERROR [o.n.c.ConsistencyCheckService] The next property record does not have this record as its previous record. Property[1290137,used=true,prev=1290136,next=1290140, (blocks not loaded)] Inconsistent with: Property[1290140,used=true,prev=1245337,next=1290141, (blocks not loaded)] Consistency checking failed.Full consistency check did not complete
Can someone please explain what this means? Does it imply something goes wrong in the import? If yes, I'd expect an error message in the import process.