parseablehq / parseable

Open Source ElasticSearch Alternative. Parseable helps you search and get insights from your logs in the most simple way possible.
https://parseable.com
GNU Affero General Public License v3.0

called `Option::unwrap()` on a `None` value since 1.5.0 #928

Open ihiverlet opened 2 months ago

ihiverlet commented 2 months ago

Hi, since upgrading to 1.5.0, we get the following error message:

parseable logs:

    thread 'actix-rt|system:0|arbiter:2' panicked at server/src/metadata.rs:368:52:
    called `Option::unwrap()` on a `None` value
    thread 'actix-rt|system:0|arbiter:3' panicked at server/src/metadata.rs:368:52:
    called `Option::unwrap()` on a `None` value

It seems to be related to https://github.com/parseablehq/parseable/pull/892. It was working fine in v1.4.0. Here is an extract of the fluent-bit config we're using:

    [OUTPUT]
        Name              http
        Match             ingress.*
        host              parseable.parseable.svc.cluster.local
        http_User         user
        http_Passwd       password
        format            json
        compress          gzip
        port              80
        header            Content-Type application/json
        header            X-P-Stream ingress
        uri               /api/v1/ingest
        json_date_key     timestamp
        json_date_format  iso8601
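
For context: this panic is Rust's standard failure when `Option::unwrap()` is called on a `None` value. Below is a minimal sketch of the pattern, assuming metadata written by an older release lacks a field the newer release expects; the `first_event_at` field name is hypothetical and this is not Parseable's actual code at metadata.rs:368.

    use std::collections::HashMap;

    fn main() {
        // Metadata as an older release might have written it: the field the
        // newer release expects was never stored, so the lookup yields None.
        let metadata: HashMap<&str, &str> = HashMap::new();

        // The failing pattern -- panics with:
        // "called `Option::unwrap()` on a `None` value"
        // let first_event_at = metadata.get("first_event_at").unwrap();

        // A defensive alternative handles the missing value explicitly:
        match metadata.get("first_event_at") {
            Some(v) => println!("first_event_at = {v}"),
            None => eprintln!("first_event_at missing; falling back to a default"),
        }
    }

Handling the `None` case explicitly, as in the `match` above, turns a thread-killing panic into a recoverable condition.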
nitisht commented 2 months ago

Thanks for reporting @ihiverlet we'll fix this asap.

parmesant commented 2 months ago

Hey @ihiverlet Could you please provide us with some more information: the server mode, the version, and the commit hash?

You'll find these three things in the initial banner printed when the Parseable process starts. It should look something like this:

(screenshot of the Parseable startup banner showing Server Mode, Version, and Commit)

Also, did you migrate both the Ingest and the Query servers from v1.4.0 to v1.5.0?

ihiverlet commented 2 months ago

Hello,

        Server Mode:        "Standalone"
        Version:            "v1.5.0"
        Commit:             "091377b"

Regarding your last question, I use the helm chart with the following configuration:

parseable:
  parseable:
    local: false
    env:
      P_S3_TLS_SKIP_VERIFY: "true"
      P_PARQUET_COMPRESSION_ALGO: snappy
      P_OIDC_ISSUER: issuer
      P_OIDC_CLIENT_ID: id
      P_OIDC_CLIENT_SECRET: secret
      P_ORIGIN_URI: uri
    resources:
      limits:
        cpu: 4
        memory: 20Gi
      requests:
        cpu: 1
        memory: 4Gi
parmesant commented 2 months ago

Hey @ihiverlet The issue is due to the sequence in which the ingest and query nodes were upgraded.

Our suggestion is that you always upgrade the query node first and then upgrade the ingest nodes.

Thanks for bringing this issue forward, we will make sure to include more meaningful error messages wherever possible.
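
As a minimal sketch of what such a meaningful error message could look like at a call site like this, assuming the `anyhow` crate; the function and field name are hypothetical, not Parseable's actual code:

    use std::collections::HashMap;

    use anyhow::{Context, Result};

    // Instead of a bare .unwrap(), attach context so the operator sees why
    // the value is missing rather than a raw panic.
    fn first_event_at(metadata: &HashMap<String, String>) -> Result<String> {
        metadata
            .get("first_event_at")
            .cloned()
            .context(
                "stream metadata has no `first_event_at`; this can happen when \
                 an ingest node runs a newer version than the query node",
            )
    }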

ihiverlet commented 2 months ago

Hey @parmesant,

I am using the helm chart with basic configuration, so I am in standalone mode. Hence, I do not understand what you mean by upgrading the query node first and the ingest node later.

parmesant commented 2 months ago

I was able to reproduce this error in distributed mode but not in standalone mode. Could you please tell me how to reproduce this error? If you wish, you could join our Slack and continue the conversation there.

smparekh commented 1 month ago

Facing the same issue. Upgraded from 0.9.0 -> 1.5.3 in standalone mode on ECS; reverting to 0.9.0 also causes an error. Is there a specific version upgrade process we should follow? 0.9.0 -> 1.0.0 -> ...

smparekh commented 1 month ago

Reverting to v0.9.0 results in:

    Error: Could not start the server because bucket 'https://s3.xxx.amazonaws.com/xxx' contains stale data, please use an empty bucket and restart the server.
nitisht commented 1 month ago

hi @smparekh,

> Reverting to v0.9.0 results in:
>
> Error: Could not start the server because bucket 'https://s3.xxx.amazonaws.com/xxx' contains stale data, please use an empty bucket and restart the server.

Reverting will not work for sure. There are metadata migrations happening between these versions. We found that in @ihiverlet's case there was active data being ingested while the upgrade was happening - is this the case here too?

smparekh commented 1 month ago

That may have been the case. I use ECS to deploy, so the old task isn't fully shut down until the new task is in place. For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?

nitisht commented 1 month ago

> For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?

Yes, we'll also ensure the server doesn't accept events while performing migrations. This will be added in upcoming releases. Meanwhile, are you able to use Parseable right now?
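
A minimal sketch of one way such a guard could work, assuming an atomic flag checked on the ingest path; this is an illustration, not Parseable's actual implementation:

    use std::sync::atomic::{AtomicBool, Ordering};

    // Set while the metadata migration runs; the ingest path checks it so
    // incoming events are rejected instead of racing the migration.
    static MIGRATING: AtomicBool = AtomicBool::new(false);

    fn run_migrations() {
        MIGRATING.store(true, Ordering::SeqCst);
        // ... migrate storage metadata from the old layout to the new one ...
        MIGRATING.store(false, Ordering::SeqCst);
    }

    fn handle_ingest() -> Result<(), &'static str> {
        if MIGRATING.load(Ordering::SeqCst) {
            return Err("server is migrating metadata; retry shortly");
        }
        // ... normal ingest path ...
        Ok(())
    }

    fn main() {
        run_migrations();
        assert!(handle_ingest().is_ok());
    }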

nikhilsinhaparseable commented 1 month ago

@smparekh would it be possible for you to have a call today? We can sort the issue out quickly on the call.

smparekh commented 1 month ago

> For future upgrades we will make sure the old task is shut down before the new task is spun up. Would that be sufficient?
>
> Yes, we'll also ensure the server doesn't accept events while performing migrations. This will be added in upcoming releases. Meanwhile, are you able to use Parseable right now?

We tested the update in a test environment. Unfortunately we weren't able to recover the data after the downgrade, so we went ahead and deleted the bucket so we could revert to v0.9.0.

nitisht commented 1 month ago

In that case let's have a short call @smparekh to ensure data is properly recovered and you're able to run the latest version. Would you please schedule something here: https://logg.ing/quick-chat