orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Unable to repair corrupted class #9727

Open mmariuzzo opened 2 years ago

mmariuzzo commented 2 years ago

OrientDB Version: 3.0.37

Java Version: Oracle JDK 1.8.0_251

OS: Linux CentOS 7

I'm using OrientDB to store IoT metrics. Because I have a high ingestion rate, I've configured it to switch to read-only mode when free storage space drops below 4GB, via the "storage.diskCache.diskFreeSpaceLimit" property. Despite this, when the DB makes the switch, a table (a class) becomes corrupted. After adding more space and restarting the server, trying to read the last 10 inserted records logs an exception (the full stack trace is attached at the end):

[OLocalPaginatedStorage]Exception `21D9DF3B` in storage `plocal:/wos1/orientdb/databases/iot_exp_26a`: 3.0.37 - Veloce (build 6a0e4724c10d51a0b19700fca46da8e41ae006f5, branch 3.0.x)
java.lang.NullPointerException
       at com.orientechnologies.orient.core.storage.cache.chm.AsyncReadCache.releaseFromRead(AsyncReadCache.java:290)
       at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.releasePageFromRead(ODurableComponent.java:167)
       at com.orientechnologies.orient.core.storage.cluster.v1.OPaginatedClusterV1.internalReadRecord(OPaginatedClusterV1.java:562)
       at com.orientechnologies.orient.core.storage.cluster.v1.OPaginatedClusterV1.readRecord(OPaginatedClusterV1.java:534)
       at com.orientechnologies.orient.core.storage.cluster.v1.OPaginatedClusterV1.readRecord(OPaginatedClusterV1.java:517)
       at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:5472)
       at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:2179)

While the broken record is skipped on select, an NPE is thrown when I try to insert a new record into the table.

The CHECK DATABASE command reported no problems. I also ran the REPAIR DATABASE command, with no luck.

The corrupted table is a simple class (it doesn't extend vertex or edge) and has mainly simple properties: string, long, embeddedMap of string.

I've also tried attaching the database to the latest OrientDB 3.1.x and 3.2.x, but neither is able to fix the table.

The table uses a single cluster.

The only way to work around the problem was a full database export/import into a new one. This takes hours because the database contains millions of records.

Expected behavior

The table corruption should be detected and repaired by the CHECK DATABASE and REPAIR DATABASE commands.

The ideal solution would be CHECK CLASS and REPAIR CLASS commands that focus on the problematic class and save time (other classes could be bigger than the corrupted one).

Actual behavior

The CHECK and REPAIR commands don't identify the problem, and the table stays corrupted.

Steps to reproduce

I'm unable to programmatically reproduce the problem in a newly created database. I have a full copy of the corrupted one that I can use to test a fix against.

orientdb-error.txt
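
For reference, the free-space rule described at the top of this report can be sketched as a plain threshold check. This is only an illustration, not OrientDB's actual implementation: the class `FreeSpaceGuard`, the method `shouldBeReadOnly` and the constant `LIMIT_MB` are hypothetical names, and the limit is assumed to be expressed in megabytes (4096 MB for the 4GB mentioned above).

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FreeSpaceGuard {
    // Hypothetical threshold mirroring the 4GB limit from the report,
    // assumed here to be expressed in megabytes.
    static final long LIMIT_MB = 4096;

    /** Returns true when the usable space (in bytes) is below the limit (in MB). */
    static boolean shouldBeReadOnly(long usableBytes, long limitMb) {
        return usableBytes < limitMb * 1024L * 1024L;
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        FileStore store = Files.getFileStore(dir);
        long usable = store.getUsableSpace();
        System.out.printf("usable=%d MB, read-only=%b%n",
            usable / (1024L * 1024L), shouldBeReadOnly(usable, LIMIT_MB));
    }
}
```

Running it against the volume that holds the database directory shows how close the disk is to the threshold at which the switch to read-only mode would be expected.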

suneelkumarch commented 2 months ago

I have a similar problem. On a restart of the system (k8s cluster), OrientDB crashed while loading the OSystem DB. To recover, I deleted the "OSystem" database and restarted OrientDB. OrientDB started, but then it failed to load my database, with the following error:

 SEVER Exception `<ID>` in storage `plocal:/orientdb/databases/OSystem`: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]
com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of atomic operation inside of storage OSystem
    at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:146)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:531)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:590)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:517)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:87)
    at com.orientechnologies.orient.core.db.OSystemDatabase.openSystemDatabase(OSystemDatabase.java:86)
    at com.orientechnologies.orient.core.db.OSystemDatabase.checkServerId(OSystemDatabase.java:165)
    at com.orientechnologies.orient.core.db.OSystemDatabase.init(OSystemDatabase.java:153)
    at com.orientechnologies.orient.server.OServer.initSystemDatabase(OServer.java:1147)
    at com.orientechnologies.orient.server.OServer.activate(OServer.java:430)
    at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:49)
Caused by: java.lang.NullPointerException

I tried CHECK DATABASE, but it did not detect any problems. REPAIR DATABASE --fix-graph -v failed with the following error:

orientdb {db=mydb}> repair database --fix-graph -v
Repair of graph 'plocal:/orientdb/bin/../databases/mydb' is started ...
Scanning 19950 edges (skipEdges=0)...

Error: java.lang.NullPointerException

orientdb {db=mydb}>

Any hint on how to recover the database?