orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Upgrade from Orient DB v3.0.35 to v3.2.2: Id of WAL operation cannot be duplicated #9542

Closed jamieb22 closed 2 years ago

jamieb22 commented 3 years ago

OrientDB Version: 3.1.8

Java Version: v8

OS: Microsoft Windows [Version 10.0.19041.804]

What was done

Created a database using OrientDB v3.0.35 and inserted records. After upgrading OrientDB to v3.1.8, the error message "Id of WAL operation can not be duplicated" is printed to the console when the database is opened.

Actual behaviour

com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database 'archiva'
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:490)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:928)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled.internalOpen(OPartitionedDatabasePool.java:437)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.openDatabase(OPartitionedDatabasePool.java:304)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.acquire(OPartitionedDatabasePool.java:259)
    at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.<init>(OrientBaseGraph.java:178)
    at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.<init>(OrientTransactionalGraph.java:82)
    at com.tinkerpop.blueprints.impls.orient.OrientGraph.<init>(OrientGraph.java:125)
    at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory$1.getGraph(OrientGraphFactory.java:92)
    at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getTx(OrientGraphFactory.java:242)
    at ..
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Id of WAL operation can not be duplicated
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.fetchNextOperationId(OAbstractPaginatedStorage.java:6502)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromBeginning(OAbstractPaginatedStorage.java:6221)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromWAL(OAbstractPaginatedStorage.java:6161)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.recoverIfNeeded(OAbstractPaginatedStorage.java:5320)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:401)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:500)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:479)
    ... 19 common frames omitted

Expected behavior

Expected WAL to be restored on DB start

Steps to reproduce

Attempt to start the attached DB in OrientDB v3.1.8. The DB was originally created using OrientDB 3.0.35.

archiva.zip
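The stack trace above shows the exception being thrown from `fetchNextOperationId` while `restoreFromWAL` replays the log. As a rough illustration of the kind of invariant such a check enforces, here is a minimal sketch that rejects any operation id seen twice during a replay. The class and method names are hypothetical and this is not OrientDB's actual recovery code; it only assumes that each logged operation carries a numeric id that must be unique.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: not OrientDB's recovery implementation.
final class WalReplaySketch {

    /**
     * Walks the operation ids in log order and fails on the first id
     * that has already been seen. Returns the number of distinct ids.
     */
    static int replay(long[] operationIds) {
        Set<Long> seen = new HashSet<>();
        for (long id : operationIds) {
            // Set.add returns false when the id is already present.
            if (!seen.add(id)) {
                throw new IllegalStateException("duplicate WAL operation id: " + id);
            }
        }
        return seen.size();
    }
}
```

Under this assumption, a log written by one version and replayed by another could trip the check if the two versions assign or read operation ids differently, even when the data pages themselves are intact.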

fpavlov-sap commented 3 years ago

The same problem happened to me today. The DB was initially created on 3.0.37; now 3.1.8 produces this error.

smart360 commented 3 years ago

Same here. Any plan to fix this?

ghost commented 3 years ago

I have the same issue with 2 different databases, which I have been running on 3.1.x since 3.1.4. I updated from 3.1.7 to 3.1.9 and am seeing the following:

Exception 05F8ADEF in storage plocal:/data/databases/db_name: 3.1.9 - Veloce (build a89e19fbd64a74f9166e889a4b5fb8017f4c64b7, branch 3.1.x)
java.lang.IllegalStateException: Id of WAL operation can not be duplicated
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.fetchNextOperationId(OAbstractPaginatedStorage.java:6502)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromBeginning(OAbstractPaginatedStorage.java:6221)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromWAL(OAbstractPaginatedStorage.java:6161)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.recoverIfNeeded(OAbstractPaginatedStorage.java:5320)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:401)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:500)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:479)
    at com.orientechnologies.orient.core.db.OrientDBDistributed.open(OrientDBDistributed.java:207)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:419)
    at com.orientechnologies.orient.core.db.OrientDBDistributed.open(OrientDBDistributed.java:200)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:979)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:942)
    at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.authenticate(OServerCommandAuthenticatedDbAbstract.java:193)
    at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.beforeExecute(OServerCommandAuthenticatedDbAbstract.java:135)
    at com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetConnect.beforeExecute(OServerCommandGetConnect.java:55)
    at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:250)
    at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:811)
    at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:67)

ghost commented 3 years ago

Hello, just an update on this issue. I rolled back to 3.1.8 and still had the same issue. I then rolled back to 3.1.7 and am now able to connect and run. On my first connection with 3.1.7, I got the following output:

2021-03-14 17:07:30:263 WARNI Storage 'db_name' was not closed properly. Will try to recover from write ahead log [OLocalPaginatedStorage]
2021-03-14 17:07:30:263 INFO Looking for last checkpoint... [OLocalPaginatedStorage]
2021-03-14 17:07:30:264 INFO Data restore procedure is started. [OLocalPaginatedStorage]
Data restore was paused because of exception. The rest of changes will be rolled back.
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.checkPageIsBrokenAndDecrypt(CASDiskWriteAheadLog.java:873)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.readFromDisk(CASDiskWriteAheadLog.java:616)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.read(CASDiskWriteAheadLog.java:528)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFrom(OAbstractPaginatedStorage.java:6211)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromBeginning(OAbstractPaginatedStorage.java:6185)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromWAL(OAbstractPaginatedStorage.java:6141)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.recoverIfNeeded(OAbstractPaginatedStorage.java:5297)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:404)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:500)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthenticate(OrientDBEmbedded.java:429)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthenticate(OrientDBEmbedded.java:79)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:976)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:942)
    at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.authenticate(OServerCommandAuthenticatedDbAbstract.java:193)
    at com.orientechnologies.orient.server.network.protocol.http.command.OServerCommandAuthenticatedDbAbstract.beforeExecute(OServerCommandAuthenticatedDbAbstract.java:135)
    at com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetConnect.beforeExecute(OServerCommandGetConnect.java:55)
    at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:250)
    at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:811)
    at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:67)

I was running with 3.1.9 for a few days before I ran into any issues.

For my other databases that hit the WAL operation exception in 3.1.9, there were no log entries when I connected with 3.1.7. Everything just worked.

holocentric-bmsnext commented 3 years ago

We've got the same issue when upgrading from 3.1.7 to 3.1.10 (we also tried 3.1.8, with the same result).

2021-04-21 05:03:35:498 INFO Page size for WAL located in /orientdb/orientdb-3.1.10/databases/7815fd8e-6009-45e8-93be-58a311d73f16 is set to 4096 bytes. [CASDiskWriteAheadLog]
Exception 69CE9B6C in storage plocal:/orientdb/orientdb-3.1.10/databases/7815fd8e-6009-45e8-93be-58a311d73f16: 3.1.10 - Veloce (build 8921c4c075030f2c351cca85a3047aaed44f464c, branch 3.1.x)
java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:244)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.checkPageIsBrokenAndDecrypt(CASDiskWriteAheadLog.java:957)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.extractLastOperationId(CASDiskWriteAheadLog.java:452)
    at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.cas.CASDiskWriteAheadLog.<init>(CASDiskWriteAheadLog.java:318)
    at com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage.initWalAndDiskCache(OLocalPaginatedStorage.java:788)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:374)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$loadAllDatabases$4(OrientDBEmbedded.java:856)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$scanDatabaseDirectory$6(OrientDBEmbedded.java:978)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.scanDatabaseDirectory(OrientDBEmbedded.java:967)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.loadAllDatabases(OrientDBEmbedded.java:850)
    at com.orientechnologies.orient.server.OServer.loadDatabases(OServer.java:662)
    at com.orientechnologies.orient.server.OServer.activate(OServer.java:492)
    at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:49)

jamieb22 commented 3 years ago

Andrey, I am not sure if you noticed yet, but there are three bug reports on the same issue. This issue appears bigger than just a corrupted database. Would you consider investigating this?

kdima001 commented 3 years ago

OS: Windows Server 2016 and Windows 10. I get this error when upgrading from 3.1.6 to 3.1.10. After returning to 3.1.6, the database starts without an error and works fine.

andrii0lomakin commented 3 years ago

Hi @kdima001, could you provide your database for testing? I plan to finish work on my current issue in 2 weeks; after that I can help with your issue if you provide me with the database. It is really hard to understand the cause without a test case.

kdima001 commented 3 years ago

Yes, I can provide the DB. It is a test version and does not contain any confidential data. The database is used by the CMS Gentics Mesh. odb.zip

andrii0lomakin commented 3 years ago

Hi @kdima001. I have checked your database: it was created with another version and then closed incorrectly. You then started using it with a newer version and got this error. Opening a crashed database with a version different from the one that created it has never been supported. Unfortunately, I have never seen any other cause of this problem; if one is provided, I will be happy to fix it. For the case where a database crashes under one version and is then used under another, I will add a safety mechanism that prevents this situation in the future.
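The unsupported combination described in this comment (a storage that was not closed cleanly, opened by a different version) can be expressed as a simple pre-open check. The sketch below is only an illustration of that rule under stated assumptions: `StorageState`, its fields, and `checkCanOpen` are hypothetical names, not part of OrientDB, and it assumes the storage records both the version that last wrote it and whether it was closed cleanly.

```java
// Hypothetical illustration only: these types do not exist in OrientDB.
final class UpgradeGuardSketch {

    /** Stand-in for metadata a storage might keep on disk (assumed, not OrientDB's format). */
    static final class StorageState {
        final String lastWriterVersion;
        final boolean closedCleanly;

        StorageState(String lastWriterVersion, boolean closedCleanly) {
            this.lastWriterVersion = lastWriterVersion;
            this.closedCleanly = closedCleanly;
        }
    }

    /**
     * Refuses to open a storage that needs WAL recovery when the running
     * version differs from the version that last wrote the storage.
     */
    static void checkCanOpen(StorageState state, String runningVersion) {
        boolean needsRecovery = !state.closedCleanly;
        boolean versionChanged = !state.lastWriterVersion.equals(runningVersion);
        if (needsRecovery && versionChanged) {
            throw new IllegalStateException(
                "Storage was not closed cleanly under " + state.lastWriterVersion
                    + "; open and close it with that version before upgrading to "
                    + runningVersion);
        }
    }
}
```

With a check of this shape, a cleanly closed database passes regardless of version, and a crashed database is still recovered by the version that wrote it; only the mixed case fails early with an actionable message instead of an exception deep inside WAL replay.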

kdima001 commented 3 years ago

Hello, Andrey. The database was indeed created with an old version. However, it opens in that version, and I do not see any messages about a crash in the logs. The new version reports a problem. That seemed strange to me.


jamieb22 commented 3 years ago

Andrey, I also find it hard to accept that the issue is due to an incorrectly closed database. I've seen this issue occur on many occasions. Are there any tests that cover upgrading OrientDB? If not, perhaps we could incorporate one into the build process, since upgrading the database is frequently fraught with problems and sometimes results in catastrophic data loss.

andrii0lomakin commented 3 years ago

@jamieb22 I opened your database under the older version and then under the newest one, and it opened without problems. We have never supported upgrading crashed databases, and we have never seen a single database that was properly closed and then upgraded that shows the listed problems. If you provide me with any database that does not fit those conditions, I will be glad to fix the reported problem. So far I have not seen evidence of other problems.

jamieb22 commented 2 years ago

I can confirm that this bug still exists when attempting to open a database created by 3.0.39 using OrientDB 3.2.2. The stack trace and a sample database are below. I've verified that I am indeed running the OrientDB v3.2.2 libs.


com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database 'archiva'
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:505)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:928)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled.internalOpen(OPartitionedDatabasePool.java:437)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.openDatabase(OPartitionedDatabasePool.java:304)
    at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.acquire(OPartitionedDatabasePool.java:259)
    at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.<init>(OrientBaseGraph.java:178)
    at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.<init>(OrientTransactionalGraph.java:82)
    at com.tinkerpop.blueprints.impls.orient.OrientGraph.<init>(OrientGraph.java:125)
    at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory$1.getGraph(OrientGraphFactory.java:92)
    at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getTx(OrientGraphFactory.java:242)
    ..
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Id of WAL operation can not be duplicated
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.fetchNextOperationId(OAbstractPaginatedStorage.java:6442)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromBeginning(OAbstractPaginatedStorage.java:6161)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.restoreFromWAL(OAbstractPaginatedStorage.java:6101)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.recoverIfNeeded(OAbstractPaginatedStorage.java:5269)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:392)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:543)
    at com.orientechnologies.orient.core.db.OrientDBEmbedded.open(OrientDBEmbedded.java:494)
    ... 19 common frames omitted

archiva.zip

andrii0lomakin commented 2 years ago

Hi @jamieb22, this bug is not fixed yet (the last fix was reverted). I am working on a fix right now and will publish an update once it is done.

andrii0lomakin commented 2 years ago

Fixed both in 3.1 and 3.2