Only a single XTDB directory per platform?

dwolfson commented 1 year ago

As I was investigating some locking errors that I received on startup of the lab jupyter notebooks, I noticed that there seems to be only one XTDB directory per platform. In the lab environment there are multiple metadata servers per platform. I can see in the console log that the first server comes up ok - but that the subsequent servers on the platform get a locking error from XTDB...I suspect that we need change this so that there is a separate XTDB directory per server per platform?

dwolfson commented 1 year ago

I guess what it really means is that we have to create a separate xtdb config per server rather than try to share an xtdb config per platform - so the labs need to change.

mandy-chessell commented 1 year ago

The labs have multiple metadata servers starting up on the same platform.

I had a quick look at the code and for an EDN config, the startup creates a temporary file called ./xtdb. It then stores the EDN config into that file and the file (actually a directory) is passed to XTDB.

This is the source file ... https://github.com/odpi/egeria-connector-xtdb/blob/main/connector/src/main/java/org/odpi/egeria/connectors/juxt/xtdb/repositoryconnector/XtdbOMRSRepositoryConnector.java

This is the code that builds the config file...

if (configProperties.containsKey(XtdbOMRSRepositoryConnectorProvider.XTDB_CONFIG_EDN)) {
                // EDN-style configuration
                try {
                    configFile = File.createTempFile("xtdb", ".edn", new File("./"));
                    String xtdbCfg = (String) configProperties.get(XtdbOMRSRepositoryConnectorProvider.XTDB_CONFIG_EDN);
                    luceneConfigured = xtdbCfg.contains(Constants.XTDB_LUCENE);
                    BufferedWriter writer = new BufferedWriter(new FileWriter(configFile));
                    writer.write(xtdbCfg);
                    writer.close();
                } catch (IOException e) {
                    auditLog.logException(methodName, XtdbOMRSAuditCode.CANNOT_READ_CONFIGURATION.getMessageDefinition(e.getClass().getName()), e);
                    throw new ConnectorCheckedException(XtdbOMRSErrorCode.CANNOT_READ_CONFIGURATION.getMessageDefinition(repositoryName),
                            this.getClass().getName(), methodName, e);
                }
            }

I wonder if we change this line from:

configFile = File.createTempFile("xtdb", ".edn", new File("./"));

to

configFile = File.createTempFile("xtdb", ".edn", new File("./" + serverName + "/"));

then the config files for each server will be segregated into their own directory.

cmgrote commented 1 year ago

the startup creates a temporary file called ./xtdb. It then stores the EDN config into that file and the file (actually a directory) is passed to XTDB.

File.createTempFile() creates a random file using the input parameters as a prefix and suffix — it's not a directory. According to the docs, it is also guaranteed that "Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the virtual machine." (See: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/File.html#createTempFile(java.lang.String,java.lang.String,java.io.File))

I'm interpreting that to mean that the only way to get a conflict would be to have multiple JVMs that happen to create the same random temporary filename (though my guess would be that even there the likelihood of this is probably relatively small?)

Nonetheless, perhaps it makes sense to use the server name as the prefix, rather than the literal xtdb.

mandy-chessell commented 1 year ago

My bad - "random" should be "temporary"

The problem is that we have two servers, each requiring their own repository, running in the same JVM. They try to use the same config and they intefere with one another.

dwolfson commented 1 year ago

Would it also then be true that each metadata server needs its own Postgres database?

refset commented 1 year ago

@dwolfson in case it helps: the XTDB JDBC module implementation currently assumes a single hard-coded table name called tx_events, so for separate XTDB instances (i.e. different tx-log & doc-store) you would at least have to use separate schemas (e.g. as per this thread), if not separate logical databases or physical servers.

dwolfson commented 1 year ago

@refset thanks for the suggestion. Need to think about what might make the most sense once we have a firm notion of how many metadata servers there would be. It probably does make sense to use different Postgres schemas for different metadata servers on the same Egeria platform..but need to think about it.

planetf1 commented 1 year ago

It appears to me that each egeria repository server should have some level of isolation in terms of it's repo - ie it should have it's 'own' xtdb repository (mostly think tx log & doc store). That could be unique postgres deployments, or schemas.

So this mostly seems to relate to configuration?

Where we have multiple replicas of the same server (ie 3 replicas of cocoMDS2 for load sharing/redundancy) then they should of course SHARE the repositories

On the temp file - it seems as if the code should be fine, though I'd be inclined to still put it in a server specific directory.. but actually there's more..... in a k8s environment we are trying to minimize persistence storage. So I do also assume that this file is only a temporary file - doesn't matter if it goes away when server is restarted

dwolfson commented 1 year ago

@planetf1 with regards to an HA configuration, if you have multiple replicas of the same server for HA reasons then you most likely want to attach each server to a different database to enable database redundancy as well?

planetf1 commented 1 year ago

Each egeria server replica (ie instances of cocoMDS2 etc) has a common configuration, and will use the same back end persistent stores, indeed this is required as all should behave the same

One way of achieving this is simply to have a common 'service' for the postgres backend - and let it's implementation worry about failover/scaling -- for example with CrunchData's postgres operator it will automatically use the current master.

cmgrote commented 1 year ago

@dwolfson can you please re-test with the latest snapshot, once #474 is merged? This changes the temporary EDN-based config file to be server-specific.

I primarily want to be certain it doesn't break anything (I don't have a local EDN-based config and broader setup to quickly test against); but if you can also confirm whether it fixes the multi-server config conflict even better 😁

dwolfson commented 1 year ago

Thanks @cmgrote - I'll try to test tomorrow..

dwolfson commented 1 year ago

@cmgrote - I configured the labs with the in-memory xtdb and there were no errors (tested against 4 notebooks). I looked in the pods to see where the XTDB files are - they don't seem to be in /deployments/data/servers/.. anymore? Where did they go?

I tried using Postgres for the xtdb library - but it gives a Lucene error - I think the same one that @planetf1 has been chasing..

planetf1 commented 1 year ago

@dwolfson You might want to retry the postgres/xtdb today. CTS test was successful last night.

odpi / egeria-connector-xtdb

Only a single XTDB directory per platform? #471