orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Facing SEVER Magic number verification failed for page `0` of `xxxxx.pcl`. [OWOWCache] #8560

Closed ssenapat closed 5 years ago

ssenapat commented 6 years ago

OrientDB Version: 3.0.7

Java Version: 1.8

OS: Centos

RAM : 512 MB

Expected behavior

It should work without losing vertexes, edges, and data.

Actual behavior

I was creating a new database model, creating vertexes, edges and inserting data. Suddenly OrientDB crashes and restarts. After restart I see the vertexes and the edges are lost.

The log shows many verification-failed messages, for example: SEVER Magic number verification failed for page 0 of cvcontactprops.pcl. [OWOWCache]

Steps to reproduce

It has happened a couple of times.

I am sure something is wrong. My guess is that the 512 MB of RAM is causing the crash, but I am not sure. I appreciate your help with this. Attaching the log file.

andrii0lomakin commented 6 years ago

Hi @ssenapat, sure, I will try this case. A small question: do you create new classes while adding edges/vertexes, or do you create the whole schema beforehand?

ssenapat commented 6 years ago

Thanks Andrew,

The steps I followed are as follows. Using the GUI tool,

  1. Created a vertex class, specifically by clicking the new vertex button. I guess this creates a class derived from V.

  2. Added properties to the newly created vertex class.

  3. Added records to the vertex class.

  4. Created an edge class, specifically by clicking the new edge button. I guess this creates a new class derived from E.

  5. Added properties to the newly created edge class.

  6. Using a query, created edges between the vertexes and set the property values on the edge. Ex: create edge e1 from ....

Regards, Srinivasa Senapathi.

ssenapat commented 6 years ago

Adding more info. Steps 1 to 9 were executed using the GUI options.

  1. Created vertex class CV extending V
  2. Added properties to CV
  3. Created vertex class Props extending V
  4. Added properties to Props
  5. Created vertex class ContactProps extending V
  6. Added properties to ContactProps
  7. Added records to CV and to Props
  8. Created edge class cvHasProps extending E
  9. Added properties to cvHasProps

Step 10 was executed using a query.

  10. Create Edge cvHasProps From (select from CV where @rid = #22:0) To (select from Props) Set LanguageCode = "en"

Using the GUI options:

  11. Created edge class cvHasContactProps extending E
  12. Added properties to cvHasContactProps

Step 13 was executed using a query.

  13. Create Edge cvHasContactProps From (select from CV where @rid = #22:0) To (Select from ContactProps) Set LanguageCode = "en"

Using the GUI options:

  14. Created vertex class Organization extending V
  15. Added properties to Organization
  16. Added records to Organization
  17. Created edge class Worked extending E
  18. Added properties to Worked

Using a query:

  19. Create Edge From (select from CV where @rid = #22:0) to (Select from Organization where @rid= #xx:0) Set LanguageCode = "en"

I followed a similar pattern to create a few more edges and vertexes.

ssenapat commented 6 years ago

A few more points. I have some edges with many properties. Does that cause any issue? For example:

The Organization vertex class has these properties: Name, City, State, StateCode, Country, CountryCode.

The Worked edge class has these properties: FromDate, ToDate, Title, Description, LanguageCode, Order, Complete, City, State, StateCode, Country, CountryCode. Worked edges are created from CV to Organization.

The Woks edge class has these properties: FromDate, Title, Description, LanguageCode, Order, Complete, City, State, StateCode, Country, CountryCode. Woks edges are created from CV to Organization.

The Institute vertex class has these properties: Name, City, State, StateCode, Country, CountryCode.

The Studied edge class has these properties: FromDate, ToDate, Degree, Description, City, State, StateCode, Country, CountryCode, LanguageCode, Complete, Order. Studied edges are created from CV to Institute.

Similarly, the Persuing edge class has these properties: FromDate, Degree, Description, City, State, StateCode, Country, CountryCode, LanguageCode, Complete, Order. Persuing edges are created from CV to Institute.

andrii0lomakin commented 6 years ago

That certainly cannot cause a problem. @ssenapat I will look into this issue in a couple of days; I need to fix another issue first.

ssenapat commented 6 years ago

I would like to add one more point.

The console is not starting up. It throws the exception below. Java 1.8 is installed on the system. Is there an incompatibility issue with this version?

Exception in thread "main" java.lang.UnsupportedClassVersionError: com/orientechnologies/orient/console/OConsoleDatabaseApp : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
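For context on the exception above: class-file major version 52 corresponds to Java 8, and a JVM raises `UnsupportedClassVersionError` when it is asked to load a class newer than it supports. That suggests the `java` actually launching the console may be older than 1.8, even if 1.8 is also installed. A minimal sketch of the mapping follows (major version minus 44 gives the release); the function names are illustrative and not part of OrientDB.

```python
import re


def java_release_for_class_major(major: int) -> str:
    """Map a class-file major version to the Java release that introduced it."""
    if major < 45:
        raise ValueError(f"not a valid class-file major version: {major}")
    release = major - 44
    # Releases before Java 9 were versioned as 1.x (1.7, 1.8, ...).
    return f"1.{release}" if release < 9 else str(release)


def required_java_release(error_message: str) -> str:
    """Extract the class-file version from an UnsupportedClassVersionError message."""
    match = re.search(r"Unsupported major\.minor version (\d+)\.\d+", error_message)
    if match is None:
        raise ValueError("no class-file version found in message")
    return java_release_for_class_major(int(match.group(1)))


if __name__ == "__main__":
    message = (
        "java.lang.UnsupportedClassVersionError: "
        "com/orientechnologies/orient/console/OConsoleDatabaseApp : "
        "Unsupported major.minor version 52.0"
    )
    print(required_java_release(message))
```

Comparing this result with the output of `java -version` in the shell that starts the console would show whether an older JVM is being picked up first.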

gtadudeps commented 6 years ago

@laa any update on this? We are also facing this issue.

ssenapat commented 6 years ago

They could not reproduce this issue. I tried two times to upload new data and it crashed. I lost all the data that was uploaded. I am not worried because it was POC data. But it gave me a clear picture that it is too risky to use this product.

It would be a disaster for the business to lose production data. So I am evaluating Amazon Neptune and Microsoft Cosmos DB.

gtadudeps commented 6 years ago

@laa is there anything we can do to avoid this? It is mainly observed when OrientDB crashes/restarts. @ssenapat unfortunately we are too close to going to production with OrientDB, and this issue is putting the project at risk.

andrii0lomakin commented 6 years ago

@gtadudeps if you provide me with a test that reproduces the issue, I will fix it quickly. As I wrote to @ssenapat, right now we are busy with commercial-support issues. @ssenapat was not right that we cannot reproduce the issue: we are currently working on other issues, not on this one. But if you provide a test that is likely to reproduce the issue, it will speed up the fix for your problem.

nicolasembleton commented 6 years ago

Confirmed that we are also seeing this in 3.0.8 and it's quite unfortunate.

I can't provide much to test, as I'm not sure why it happens, but we had to renew a lot of our servers because we had a few issues, and we had them synced over time. We have also changed the writeQuorum from 3 to 1, quite painfully.

2018-10-11 13:39:53:076 SEVER Magic number verification failed for page 37842 of XXX.pcl

That's about all we can see. This is pretty serious. Everything syncs back in distributed mode then this happens randomly.

We have 12 GB of RAM and a distributed setup of 3 servers, all synced and started. It works for a couple of hours (after a long downtime), then the error starts to happen and the server does not recover from it.

Note that aside from the previous resync, no downtime and no crash happened; we always use shutdown.sh to stop a server, etc. We are using it pretty properly.
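Since the log line is about all there is to go on, one way to get a bit more out of it is to tally which `.pcl` files and pages are reported. The following is only a sketch that assumes the message format quoted in this thread; the function name and regular expression are illustrative, and the log path would be whatever your installation uses.

```python
import re

# Matches the message format quoted in this thread, e.g.
# "... SEVER Magic number verification failed for page 37842 of XXX.pcl"
MAGIC_FAILURE = re.compile(
    r"Magic number verification failed for page (\d+) of (\S+?\.pcl)"
)


def summarize_magic_failures(lines):
    """Return a dict mapping each reported .pcl file to the set of failing pages."""
    pages_by_file = {}
    for line in lines:
        match = MAGIC_FAILURE.search(line)
        if match:
            page, file_name = int(match.group(1)), match.group(2)
            pages_by_file.setdefault(file_name, set()).add(page)
    return pages_by_file


if __name__ == "__main__":
    sample = [
        "2018-10-11 13:39:53:076 SEVER Magic number verification failed for page 37842 of XXX.pcl",
        "SEVER Magic number verification failed for page 0 of cvcontactprops.pcl. [OWOWCache]",
    ]
    for file_name, pages in summarize_magic_failures(sample).items():
        print(file_name, sorted(pages))
```

Knowing whether the failures cluster in one file or are spread across many could help narrow down a reproducing case.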

nicolasembleton commented 6 years ago

Switching off the WAL (as the doc recommends, now that we use SSDs) brings a `WAL is unavailable, unable to restore` error. It's a bit of a bummer.

gtadudeps commented 6 years ago

@nicolasembleton wouldn't switching off the WAL cause reliability issues? It seems too risky in itself for data consistency. Moreover, does it solve this issue?

gtadudeps commented 6 years ago

@laa this issue is quite random, so there are no fixed steps to reproduce it, but one way could be to perform write operations continuously and restart OrientDB intermittently. Please treat it as high priority, as we need to clear this to go beta. We were also pushing management to procure the Enterprise Edition of OrientDB, but this severely weakens our case.
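The approach described above (continuous writes with intermittent restarts) could be structured roughly as follows. This is only a sketch: `write_batch`, `restart_server`, and `verify` are hypothetical callables that you would have to implement against your own OrientDB setup, and nothing here is an OrientDB API.

```python
import random


def write_restart_cycle(write_batch, restart_server, verify,
                        cycles=10, batches_per_cycle=(1, 5), seed=0):
    """Alternate bursts of writes with restarts, checking the data after each restart.

    write_batch()    -- hypothetical: performs some writes, returns records written
    restart_server() -- hypothetical: stops and starts the database server
    verify(expected) -- hypothetical: True if `expected` records are present

    Returns the index of the first cycle whose verification failed, or None.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    expected = 0
    for cycle in range(cycles):
        # Write a random number of batches before each restart.
        for _ in range(rng.randint(*batches_per_cycle)):
            expected += write_batch()
        restart_server()
        if not verify(expected):
            return cycle
    return None


if __name__ == "__main__":
    # Stand-in for a real server: an in-memory counter that survives "restarts".
    state = {"records": 0}

    def fake_write():
        state["records"] += 100
        return 100

    print(write_restart_cycle(fake_write, lambda: None,
                              lambda expected: state["records"] == expected))
```

A fixed seed keeps the write/restart pattern repeatable, which matters when the goal is to hand over a test that fails the same way twice.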

andrii0lomakin commented 6 years ago

Hi, we have made several changes to improve the durability system. Could you try 3.0.9?

nicolasembleton commented 6 years ago

@gtadudeps yes, it solves the magic number issue when using an SSD. It is the recommended setting when using an SSD, so I'd assume it's still OK (although we are going to switch back to HDD, as the WAL is indeed quite important).

@laa that's great. Thanks for the heads up. We are going to test that out.

andrii0lomakin commented 6 years ago

@nicolasembleton switching off the WAL is strongly discouraged. Looking forward to your feedback.

nicolasembleton commented 6 years ago

@laa I think the doc should be updated, because this page: https://orientdb.com/docs/last/Write-Ahead-Log.html says "If you have a SSD we suggest to use for database files only, not WAL."

Update: reading it back, I think I understand the ambiguity that I may have missed. This sentence means "Don't use the SSD for the WAL, use a normal HDD for the WAL" rather than "If you have an SSD, don't use the WAL". It's written in an ambiguous way.

209 commented 5 years ago

We have this problem now. VPS with a 1600 GB SSD (SSD only), 60 GB RAM, 10 cores. We run OrientDB v3.0.15 in a Docker container.

In the log: "Magic number verification failed for page 9198". So far we have only added many records (with duplicates, which are treated as insert errors), and we often start/stop the container. Current size: 6 million vertexes and 30 million edges. We want to increase the vertex count 20x, and the edge count even more.

But problems like this seriously hinder the process.

andrii0lomakin commented 5 years ago

@209 could you send me the stack trace that is printed when you see this exception?

andrii0lomakin commented 5 years ago

But please run the load starting from an empty database.

andrii0lomakin commented 5 years ago

@209 I have made a small change, just to be sure that your problem is fixed in 3.0.16, which we will release next week. Please try that distribution and send me feedback with a stack trace if it happens again (hopefully not).

209 commented 5 years ago

@laa I see that OrientDB was updated, but the Docker image wasn't, so I can't try the new version.

luigidellaquila commented 5 years ago

Hi @209

The pull request was submitted to Docker a few hours ago, but it takes some time to be approved. I'd suggest checking again in the next 24-48 hours.

Thanks

Luigi