zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

zodbconvert with relstorage 3.0.1 #392

Closed agitator closed 4 years ago

agitator commented 4 years ago

I updated to RelStorage 3.0.1 by accident (I forgot to pin the version), but so far I have seen no issues.

When I tried to convert from relstorage to pull a copy of the data for local development, I ran into this error about the blob layout:

plone@5f7ee459e047:/plone/buildout$ ./bin/zodbconvert var/instance/import/backup/backup-phonogen.cfg 
Traceback (most recent call last):
  File "./bin/zodbconvert", line 315, in <module>
    sys.exit(relstorage.zodbconvert.main())
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/zodbconvert.py", line 112, in main
    source, destination = open_storages(options)
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/zodbconvert.py", line 77, in open_storages
    source = config.source.open()
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/config.py", line 43, in open
    return RelStorage(adapter, name=config.name, options=options)
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/storage/__init__.py", line 219, in __init__
    self.blobhelper = BlobHelper(options=options, adapter=adapter)
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/blobhelper/__init__.py", line 97, in BlobHelper
    return CacheBlobHelper(options, adapter)
  File "/home/plone/.buildout/shared-eggs/RelStorage-3.0.1-py2.7-linux-x86_64.egg/relstorage/blobhelper/cached.py", line 291, in __init__
    fshelper.create()
  File "/home/plone/.buildout/shared-eggs/ZODB-5.5.1-py2.7.egg/ZODB/blob.py", line 399, in create
    (self.layout_name, self.base_dir, layout))
ValueError: Directory layout `zeocache` selected for blob directory /blobstorage/, but marker found for layout `bushy`

Do I have to set the layout in the target directory manually? What is recommended to work around this problem?

jamadden commented 4 years ago

First…

Do I have to set the layout in the target directory manually?

No, never do that, you may corrupt or lose data.

What is recommended to work around this problem?

I don't understand what the problem is. From what are you trying to convert to what? Can you share your configuration?

Just as a wild guess, whatever storage you're trying to open here with RelStorage appears to be pointed to a pre-existing, pre-populated blob directory. The default for RelStorage is to store blobs in the server, and only cache blobs on disk on each client ("zeocache"). But the pre-existing blob directory configured here is not a cache ("bushy"). If it's a pre-existing blob dir for the other storage, and you're trying to share the on-disk blob storage, you need to explicitly configure RelStorage not to treat its blob directory as a cache; that should let it use the existing on-disk layout. (Note that this is not generally recommended for production as it disables parallel commit.)

agitator commented 4 years ago

This is my config

<relstorage source>
        # ZODB Cache Dir
        blob-dir /blobstorage
        # db connect
        <postgresql>
        dsn dbname='***' user='***' host='***' password='***'
        </postgresql>
</relstorage>

<filestorage destination>
        #zeo data
        path /transfer/backup/Data.fs
        #blobs
        blob-dir /transfer/backup/blobstorage
</filestorage>

The target directory /transfer/backup/ is empty, besides the backup.cfg

If I understand it correctly, the shared-blob-dir option is not recommended/working anymore with relstorage 3.x and blobs are stored in the db (postgres) itself?

<zodb_db main>
    # Main database
    cache-size 30000
    %import relstorage
    <relstorage>
        commit-lock-timeout 600
        keep-history true
        blob-dir /blobstorage
        cache-servers memcached:11211
        blob-cache-size 512mb
        shared-blob-dir true
        cache-local-mb 1
        poll-interval 10
        cache-prefix plone
        <postgresql>
            dsn dbname='***' host='***' user='***' password='***'
        </postgresql>
    </relstorage>
    mount-point /
</zodb_db>

So "shared-blob-dir true" should be removed. What about the blob-dir itself? Is there some kind of migration to move existing blobs into the db instead of the filesystem?

jamadden commented 4 years ago

So your source is configured to use a blob cache (the default). But the /blobstorage directory on disk already exists and is not a blob cache. You need to explicitly set shared-blob-dir to true for your source to be able to read that directory.
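As a minimal sketch, the zodbconvert configuration posted above with that one option added to the source could look like this. Everything else is unchanged, and the masked dsn values are still placeholders:

```
<relstorage source>
        # Existing on-disk blob storage (bushy layout), not a cache
        blob-dir /blobstorage
        # Tell RelStorage the blobs live in this directory, not in the database
        shared-blob-dir true
        <postgresql>
        dsn dbname='***' user='***' host='***' password='***'
        </postgresql>
</relstorage>

<filestorage destination>
        path /transfer/backup/Data.fs
        blob-dir /transfer/backup/blobstorage
</filestorage>
```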

If I understand it correctly, the shared-blob-dir option is not recommended/working anymore with relstorage 3.x and blobs are stored in the db (postgres) itself?

It works just as well as it always has, which may or may not be any good: it is highly workload-dependent and can't work with parallel commit (plus it's easy to lose data). Only the default value for shared-blob-dir has changed.

So "shared-blob-dir true" should be removed.

Not if you want to keep using that blob-dir. If you remove it and do nothing else, you'll get this exact same error. Keep using shared-blob-dir true if it works well for your use case and you don't care about parallel commits. For example, this might be the case if you have extremely fast shared storage (or only one RelStorage node, so the "shared" storage is in fact local), your workload is mostly read-only (so parallel commit isn't important), and you have good backup policies guaranteed to keep your RDBMS data and on-disk data in sync.

What about the blob-dir itself? Is there some kind of migration to move existing blobs into the db instead of the filesystem?

Yes, zodbconvert. Use a source with your existing true value for shared-blob-dir and a destination with it false. Again, this isn't necessary if you're totally satisfied with the current status quo.
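A rough sketch of such a conversion config, under a few assumptions: the destination is a separate, empty database (its dsn is a placeholder), and /blobcache is a hypothetical new, empty directory used only as the destination's blob cache. Don't point the destination at the existing /blobstorage, or the cache layout will clash with the bushy marker again.

```
<relstorage source>
        # Existing blobs on disk, read as shared blob storage
        blob-dir /blobstorage
        shared-blob-dir true
        <postgresql>
        dsn dbname='***' user='***' host='***' password='***'
        </postgresql>
</relstorage>

<relstorage destination>
        # Hypothetical empty directory; with shared-blob-dir false it is only a cache
        blob-dir /blobcache
        # Blobs are stored in the database itself
        shared-blob-dir false
        <postgresql>
        # Placeholder: assumed to be a different, empty database
        dsn dbname='***' user='***' host='***' password='***'
        </postgresql>
</relstorage>
```

After the conversion, the application config would need to point at the destination database with the same shared-blob-dir false setting.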

Other (unsolicited 😄) configuration comments:

    commit-lock-timeout 600

That seems…very high. Counterintuitively, high commit lock timeouts can actually make things worse if you start waiting on them. Unless your workload consists of writing to the same objects and performing expensive conflict resolution, I suggest using a much lower value, especially under RelStorage 3. Maybe go back to the default.

        cache-servers memcached:11211

Memcache support is deprecated and highly likely to be removed in the future. It used to be suggested if the database server had ridiculously high latency, but I haven't seen a scenario where it has any real benefit anymore, given a properly sized RelStorage cache and a RelStorage persistent cache. (If you can provide measurements and details of such a scenario I'd be interested in hearing them.)

        cache-local-mb 1

That seems quite small. If your workload is effectively write-only, you can set it to 0 to disable the cache and remove some overhead (but if that's the case, you can set the main cache-size to 0 as well for the same benefit). If you only ever use one thread/ZODB Connection per process, or the ZODB Connections you use work with completely non-overlapping sets of objects, then a size of 0 also makes sense, together with a larger value for cache-size. But for most applications that use some shared, read/write objects like catalogs from multiple threads/Connections, a larger RelStorage cache can be quite beneficial.

        poll-interval 10

poll-interval was deprecated and made to do nothing in RelStorage 2; I think it was removed entirely in RelStorage 3.
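Putting those comments together, one possible cleaned-up version of the main config is sketched below. It assumes you keep the shared blob directory, and it simply drops the four options discussed (commit-lock-timeout, cache-servers, cache-local-mb, poll-interval) instead of picking new values, so the remaining tunables use their defaults. Raising cache-local-mb from there is something to decide from your own measurements.

```
<zodb_db main>
    # Main database
    cache-size 30000
    %import relstorage
    <relstorage>
        keep-history true
        blob-dir /blobstorage
        blob-cache-size 512mb
        # Still required while /blobstorage is the real blob storage
        shared-blob-dir true
        cache-prefix plone
        <postgresql>
            dsn dbname='***' host='***' user='***' password='***'
        </postgresql>
    </relstorage>
    mount-point /
</zodb_db>
```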

agitator commented 4 years ago

Got a better picture now :-) Thanx for all your clarifications and recommendations, highly appreciated! Merry Xmas!