grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Migrating logs from one storage type to another #5903

Open piostelmach opened 2 years ago

piostelmach commented 2 years ago

Hi,

Is it possible to migrate old Loki logs from one storage to another, for example from local VM storage to Azure Blob? I can't find any information in the official documentation on the Loki site :/

Best regards, Piotr

slim-bean commented 2 years ago

It is possible; there is a post here that talks specifically about moving from filesystem to object storage.

I can tell you, though, that it was recently brought to my attention that the delimiter in the blob storage wasn't a : and was instead a -, only because someone opened a PR to make this configurable. This isn't in a release yet, so you'll either need to run from main or map the : to - when you move the files.
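
Roughly, that mapping is just a character swap in the chunk object name. A minimal sketch (assuming keys of the form <tenant>/<fingerprint>:<from>:<through>:<checksum>, with the tenant prefix left untouched):

def to_blob_key(chunk_key: str) -> str:
    # "fake/<fingerprint>:<from>:<through>:<checksum>" -> "fake/<fingerprint>-<from>-<through>-<checksum>"
    tenant, chunk = chunk_key.split("/", 1)
    return f"{tenant}/{chunk.replace(':', '-')}"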

One additional consideration: that original article was written prior to the v12 schema, which changes the storage layout a little, so if you are using v12 you would need to do things a little differently than that post describes.

Another option is the migrate tool; however, as the big warning on that page says, we don't use this much, or at all. You can give it a try and it may work just fine; given that it won't modify the source data, there is really no risk in trying it. (If you do, please report back with your results.)

piostelmach commented 2 years ago

@slim-bean thank you for the tips. I'll check it and give some feedback :)

Do you know why, on the local filesystem, some of the chunks have a single quotation mark ' in the name? For example:

'ZmFrZS9jODU1NTc2YzUxMGM1NDM6MTgwNDViMDJiNTQ6MTgwNDViMmU1N2U6YTZkMTAwOTg=' ZmFrZS8zODAwZGRiNDMwZmFmMDBhOjE4MDQ1YjI5ZjU3OjE4MDQ1YjI5ZjU4OmFkMjNkYjlk ZmFrZS9lYjM5OGQ0NjFmMjY0ZjE6MTgwNDViMmIxMzQ6MTgwNDViMmM1YTA6YTIyZTA0 ZmFrZS9lYjg3ZjkwNTU3Njg0Mjk1OjE4MDQ1YjI5Yjg1OjE4MDQ1YjI5Yjg2OmQ1N2FlZjUz ZmFrZS9jMzk5YmZkNDQwYjEwNjEzOjE4MDQ1YjI5YjZmOjE4MDQ1YjI5YjcwOmI3NjcwMWRl ZmFrZS9mMWMwYjk0MWI2N2I2NTcxOjE4MDQ1YjJiZjZhOjE4MDQ1YjJiZjZiOjNkMTZmMmE2 ZmFrZS8xOWI4MjQ0MWZmNWEzNWY4OjE4MDQ1YjMwNmFlOjE4MDQ1YjMwNmFmOmI4MDc3MTI4 ZmFrZS83ZjdlZmZjMzY1NDIyNTA5OjE4MDQ1YjFiYWQyOjE4MDQ1YjM3MTczOmVjYjFjN2U5 ZmFrZS9lYzQzN2JkOGRjMmZkMTA0OjE4MDQ1NGIzNDg1OjE4MDQ1YjkxMThmOmNlZmM3MWE0 ZmFrZS9iNmYzN2E4ODliOTVkZjI0OjE4MDQ1YjM2NWUwOjE4MDQ1YjRjZDI3OjVmOTJhMzEy ZmFrZS9iNjdiMzFlZmRiYTUwOTE5OjE4MDQ1YjQ1NjY4OjE4MDQ1YjUzNmQ3OjNkMGUyNWU3 'ZmFrZS85NGY2NmRmMmRmMjYxODJhOjE4MDQ1YjUzOTNlOjE4MDQ1YjYyMTNiOjFkYTU5ZGU=' ZmFrZS9iNjQ2NzRlYTM4NjdhMDlkOjE4MDQ1ODlhMDE0OjE4MDQ1YmFmNzcxOjFjYWIwMzQw 'ZmFrZS85NTNiNzAzMzJmMGUxNTc6MTgwNDU0Y2VjY2Y6MTgwNDViYjJiNzg6NGM3OTg0OTk='

piostelmach commented 2 years ago

And another question: how can I copy folders with index files from the local filesystem? In Azure Blob, the index is in another format, .gz :(

micolun commented 1 year ago

Do you know why, on the local filesystem, some of the chunks have a single quotation mark ' in the name?

Loki chunk file names are Base64 encoded, so some of them will have one or two "=" padding characters in their name. The ls command single-quotes file names containing certain special characters (including the "=" sign) so that they can be safely copied or piped to other commands.
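
As a quick illustration, decoding one of the names above shows the structure (a small sketch; reading the decoded key as <tenant>/<fingerprint>:<from>:<through>:<checksum> is my interpretation):

import base64

name = "ZmFrZS9jODU1NTc2YzUxMGM1NDM6MTgwNDViMDJiNTQ6MTgwNDViMmU1N2U6YTZkMTAwOTg="
print(base64.b64decode(name).decode("utf-8"))
# fake/c855576c510c543:18045b02b54:18045b2e57e:a6d10098
# "fake" is the tenant ID Loki uses when auth_enabled is false; the rest reads
# as <fingerprint>:<from>:<through>:<checksum>.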

DavidConnack commented 8 months ago

I wanted to add my experience in case it is useful to others. I had 3 production sites that were initially using the filesystem and I needed to move them to AWS S3. The folder that needs to be dealt with is the chunks folder. As discussed above, the filesystem approach stores chunks under Base64-encoded names. Using this Python script we can decode the file names back to UTF-8:

#!/usr/bin/env python3
import os
import base64

def decode_and_rename(base_dir):
    """
    Recursively decodes Base64-encoded file names and renames the files.

    Args:
        base_dir (str): The starting directory.
    """

    for item in os.listdir(base_dir):
        full_path = os.path.join(base_dir, item)

        # If it's a directory, recurse into it
        if os.path.isdir(full_path):
            decode_and_rename(full_path)

        # If it's a file, check if the name is Base64-encoded
        elif os.path.isfile(full_path):
            try:
                # Attempt to decode the filename; validate=True rejects names
                # that are not actually Base64
                decoded_name = base64.b64decode(item, validate=True).decode('utf-8')

                # Decoded names contain a tenant prefix (e.g. "fake/..."), so
                # make sure the target subdirectory exists, then rename
                new_path = os.path.join(base_dir, decoded_name)
                os.makedirs(os.path.dirname(new_path), exist_ok=True)
                os.rename(full_path, new_path)

                print(f"Renamed '{full_path}' to '{new_path}'")

            except (base64.binascii.Error, UnicodeDecodeError):
                # Not a valid Base64 name, skip it
                print(f"Skipping '{full_path}': name is not Base64-encoded")

base_directory = "dir"  # Replace with your actual directory
decode_and_rename(base_directory)

In order to prevent loss of production logs, I set the PV to retain and first switched Loki to use S3, so that new logs would be sent to S3. I then attached the EBS volume to a temporary EC2 instance and ran the above script. It was then as simple as doing an S3 sync on the renamed folder to the S3 bucket, and all the logs were available.
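
For reference, the sync step could look roughly like the boto3 sketch below; the bucket name and local path are placeholders, and aws s3 sync does the same job while parallelising the uploads for you:

#!/usr/bin/env python3
# Rough equivalent of "aws s3 sync <renamed-dir> s3://<bucket>" for the renamed
# chunks. Bucket name and local path are placeholders; AWS credentials and
# region come from the usual environment/config.
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-loki-chunks-bucket"   # placeholder
local_root = "dir"                 # the renamed chunks directory

for root, _, files in os.walk(local_root):
    for name in files:
        local_path = os.path.join(root, name)
        # The object key mirrors the on-disk layout, e.g. "fake/<chunk key>"
        key = os.path.relpath(local_path, local_root)
        s3.upload_file(local_path, bucket, key)
        print(f"Uploaded {local_path} -> s3://{bucket}/{key}")

With tens of thousands of tiny chunk files, uploading in parallel (a thread pool, or just letting aws s3 sync handle it) makes a big difference.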

asikOnCB commented 5 months ago

@DavidConnack I'm facing the same scenario here. When I try to copy the chunks (lots of tiny files) to S3, it takes around 2 hours to complete 20% of the files (e.g. 10460/56069), which is not efficient. I used boto3 s3.copy_object in a for loop to do it.

How long did it take when you used s3 sync? I have around four environments to do this migration (all four have lots of tiny files).

asikOnCB commented 5 months ago

@slim-bean @DavidConnack (or anyone) Can I use both S3 and the filesystem at the same time? That is, store future logs in S3, keep retrieving the old logs that are currently on the filesystem, and eventually get rid of the old logs using a retention config on the filesystem alone. That way I wouldn't have to migrate from EBS to S3 at all. I am trying to achieve that; it didn't stop Loki from running, but I am seeing files (chunks, index) both in S3 and in /data/loki/chunks:

arasool@arasool-MacBook-Pro cbc-tenant % aws --profile sso-profile s3 ls s3://asik-loki-test  --recursive | wc -l   
     528
/data/loki/chunks $ ls -ltra | wc -l
10112

My current loki.yaml looks like this:

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  compaction_interval: 10m
  retention_delete_delay: 5m
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: s3
  working_directory: /data/retention
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
  - 'loki-memberlist'
schema_config:
  configs:
  - from: "2020-05-15"
    index:
      period: 24h
      prefix: index_
    object_store: aws
    schema: v11
    store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
storage_config:
  aws:
    bucketnames: asik-loki-test
    s3: s3://us-east-1
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false

Is what I am trying even technically possible, or is Loki not designed to work that way? If it is possible, please help me by changing my config to achieve it. The above configuration puts chunks in both storages; is it replicating the same files to both?

I am using Loki 2.9. Thanks and regards, Asik

micolun commented 5 months ago

I found that it's not practical to migrate Loki data to an S3 bucket. It takes a lot of time, plus you have to make sure the storages are synced (which can take a long time because of the huge number of files). Inevitably, there is going to be a period when no logs are captured: on the migration day you have to stop the old Loki instance, sync the storages, then start the new one pointing to object storage. On top of that, we didn't have multi-tenancy enabled, and by default the tenant_id is set to fake, which is not very helpful. I couldn't find a way to modify an existing tenant_id; I imagine this would also require an entire index rebuild.

We decided to go with another migration strategy and deploy two Loki instances:

  1. Keep the legacy Loki, on the original storage, for reference to previous logs.
  2. Deploy a new Loki instance configured with object storage (S3 in our case) and apply multi-tenancy with a meaningful tenant_id value.

On the migration day, just switch the Loki endpoint on Promtail and the Grafana data sources. Once the logs on the legacy Loki expire (after 1 year in our case), it will be decommissioned.

We created a second Grafana data source and a separate set of Loki dashboards pointing to legacy Loki.

DavidConnack commented 5 months ago

I didn't find that to be true.

Before I migrated the existing logs, I started sending the new logs to S3. Once I had that done, I used the Python script to move the old logs to S3. The only caveat (really minor, imho) is that while you are doing the migration you will not be able to query the old logs.

DavidConnack commented 5 months ago

How long did it take when you used s3 sync? I have around four environments to do this migration (all four have lots of tiny files).

It took a while for me too. I just left it to run, and when it was done, it was done. Our log volume wasn't massive, as we only have a retention of 30 days.

rootxrishabh commented 4 months ago

Hey folks, is it possible to migrate index and chunk files from schema v9 (boltdb with filesystem) to v13 (TSDB with S3)?

mehransaeed7810 commented 2 months ago

I wanted to add my experience in case it is useful to others. I had 3 production sites that were initially using the filesystem and I needed to move them to AWS S3. [...] It was then as simple as doing an S3 sync on the renamed folder to the S3 bucket, and all the logs were available.

Hi @DavidConnack

Thanks for posting the Python script. I am in a similar situation where I need to migrate Loki from filesystem storage to Loki on S3 (MinIO), deployed with Helm charts. I just ran the script to rename the files and it looks like it's done. First I copied the data onto the server where the new Loki instance is running, then renamed.

Just double-checking: the next step is to copy the renamed files from /chunks/fake to the S3 environment. The folder structure in S3 looks a bit different, as in /chunks/default, where default is the tenant-id. Also, I am using MinIO as storage, which uses two PVCs mounted in a volume where our Kubernetes cluster is running.

My question is: do we need to just copy the data from /chunks/fake to /chunks/default, along with the index and tsdb-shipper-active directories?