medic / cht-sync

Data synchronization between CouchDB and PostgreSQL for the purpose of analytics.
GNU General Public License v3.0

feat(#112): drop support for multiple copies of every document #115

Open dianabarsan opened 1 week ago

dianabarsan commented 1 week ago

Description

Edits to docs produce updates in dbt models. Adds basic dbt models locally and uses these for testing. Adds two options for deletes - TBD.

medic/cht-sync#112

Code review checklist

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

dianabarsan commented 1 week ago

@witash I've incorporated some of our previously discussed changes:

  1. I'm no longer prefixing the doc ids when inserting into databases, I'm instead spreading the docs between databases.
  2. I've added some local dbt models and made a slight update to the dbt reader to load them (I'm using tarball over http, seemed easiest).
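A minimal local dbt model in the spirit of item 2 might look roughly like this. This is a sketch only: the model name, the source table layout (an `_id` column plus a `doc` jsonb column), and the `source()` reference are assumptions, not the PR's actual files.

```sql
-- Hypothetical minimal local dbt model (e.g. models/contacts.sql).
-- Assumes the raw table has _id and a doc jsonb column; names illustrative.
{{ config(materialized='incremental', unique_key='_id') }}

SELECT
  _id,
  doc ->> 'name' AS name,
  doc ->> 'type' AS contact_type
FROM {{ source('couchdb', 'medic') }}
WHERE doc ->> 'type' = 'contact'
```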

I'm still not doing deletes, but thought it'd be good to get your thoughts on the above before I proceed.

Thanks!

lorerod commented 1 week ago

Also, I tried to run this branch locally against the brac-clone-for-phil.dev.medicmobile.org CouchDB instance. I changed the ./tests/.e2e-env to be:

POSTGRES_USER="postgres"
POSTGRES_PASSWORD="postgres"
POSTGRES_DB="data"
POSTGRES_TABLE="medic"
POSTGRES_SCHEMA="v1"
DBT_POSTGRES_USER="postgres"
DBT_POSTGRES_PASSWORD="postgres"
DBT_POSTGRES_SCHEMA="dbt"
DBT_POSTGRES_HOST="postgres"
DBT_PACKAGE_TARBALL_URL="http://dbt-package/dbt/package.tar.gz"
COUCHDB_USER="****"
COUCHDB_PASSWORD="****"
COUCHDB_DBS="medic"
COUCHDB_HOST="brac-clone-for-phil.dev.medicmobile.org"
COUCHDB_PORT=443
COUCHDB_SECURE=true
POSTGRES_HOST=postgres
COMPOSE_FILE=docker-compose.yml:docker-compose.postgres.yml:docker-compose.couchdb.yml:./tests/dbt/docker-compose.yml

With credentials from 1Password (couchdb: clone-brac-ug.dev), and bringing everything up with `docker compose --env-file ./tests/.e2e-env -f docker-compose.yml -f docker-compose.postgres.yml -f tests/dbt/docker-compose.yml up -d --build`

The couch2pg container is restarting with:

2024-06-20 11:48:24 Downloading CouchDB changes feed from 0
2024-06-20 11:49:28 Downloading CouchDB changes feed from 0
2024-06-20 11:50:31 Downloading CouchDB changes feed from 0
2024-06-20 11:51:34 Downloading CouchDB changes feed from 0
2024-06-20 11:51:37 
2024-06-20 11:51:37 node:internal/process/promises:289
2024-06-20 11:51:37             triggerUncaughtException(err, true /* fromPromise */);
2024-06-20 11:51:37             ^
2024-06-20 11:51:37 {
2024-06-20 11:51:37   code: 401,
2024-06-20 11:51:37   error: 'unauthorized',
2024-06-20 11:51:37   status: 401,
2024-06-20 11:51:37   name: 'unauthorized',
2024-06-20 11:51:37   message: undefined,
2024-06-20 11:51:37   stack: 'Error\n' +
2024-06-20 11:51:37     '    at Object.generateErrorFromResponse (/node_modules/pouchdb-errors/lib/index.js:100:18)\n' +
2024-06-20 11:51:37     '    at fetchJSON (/node_modules/pouchdb-adapter-http/lib/index.js:251:31)\n' +
2024-06-20 11:51:37     '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
2024-06-20 11:51:37     '    at async fetchData (/node_modules/pouchdb-adapter-http/lib/index.js:1029:24)'
2024-06-20 11:51:37 }
2024-06-20 11:51:37 
2024-06-20 11:51:37 Node.js v20.14.0

dianabarsan commented 1 week ago

It looks like the username/password is wrong. That's what the 401 status code means.

dianabarsan commented 1 week ago

I've added two options for deletes, one that actually deletes data through a post hook and one that just adds the _deleted flag. I think the latter is easier to cascade and doesn't require the extra delete operation, but it does mean that every dashboard will need to filter out _deleted rows. I'm not entirely sure how to cascade the full delete, it would require a union with the main table and that seems pretty unclean. I'm going to add a "cascade" example for both.
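For illustration, the two options could look roughly like this in a dbt model. This is a sketch under assumptions: the column names (`_id`, `_deleted`, `doc`), the model name `contacts`, and the source reference are hypothetical, not the PR's actual code.

```sql
-- Option 1 (sketch): hard-delete through a post hook, removing deleted rows
-- right after each incremental run.
{{ config(
    materialized='incremental',
    unique_key='_id',
    post_hook="DELETE FROM {{ this }} WHERE _deleted = true"
) }}
SELECT _id, doc, _deleted FROM {{ source('couchdb', 'medic') }}

-- Option 2 (sketch): keep the _deleted flag and have every downstream
-- query or dashboard filter it out, e.g.:
--   SELECT * FROM {{ ref('contacts') }}
--   WHERE COALESCE(_deleted, false) = false
```

Option 1 keeps the tables clean at the cost of an extra DELETE per run; Option 2 avoids that but pushes the filtering burden onto every consumer.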

lorerod commented 1 week ago

> It looks like the username/password is wrong. That's what the 401 status code means.

I tried with the same credentials: `curl --location 'https://****:****@brac-clone-for-phil.dev.medicmobile.org:443/medic'` and it responded 200 OK. I'm doubting myself now.

dianabarsan commented 1 week ago

Ok, I'll give this a try.

witash commented 1 week ago

> It looks like the username/password is wrong. That's what the 401 status code means.
>
> I tried with the same credentials: `curl --location 'https://****:****@brac-clone-for-phil.dev.medicmobile.org:443/medic'` and it responded 200 OK. I'm doubting myself now.

I had the same issue but forgot to mention it. It's something to do with session authentication; commenting out this line to force it to fall back to HTTP basic authentication works (and I guess that's also why it works with curl).

dianabarsan commented 1 week ago

Wow, thanks for flagging this. This is because the session plugin doesn't work with CouchDB 2. I've created an issue to fix it, but in the meantime, would you mind testing this with a CouchDB 3 instance?

dianabarsan commented 1 week ago

I've bumped the version of the adapter. @lorerod could you please try again?

dianabarsan commented 1 week ago

Hi @njogz! I've got this PR in the works that does a bunch of things. The part that is still unresolved is the handling of deletes - my latest attempt to cascade them to downstream models failed. I would appreciate it if you could check it out and add your opinions - and offer solutions if you have a good one for the deletes.

Thanks!!

andrablaj commented 1 week ago

@njuguna-n tagging you as Diana referenced your older GitHub account in the comment above.

njuguna-n commented 1 week ago

Thanks @dianabarsan, I will look into the deletes tomorrow.

njuguna-n commented 1 week ago

A combination of the post hook to delete rows in some base models, plus a foreign key constraint with a DELETE CASCADE expression defined on models that reference them, seemed like the best bet since we can keep the DB clean. I have not managed to get this to work yet, but I'm still debugging. I will explore additional "cascade" options tomorrow if this does not work.
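A rough sketch of the combination described above. The table, column, and constraint names here are hypothetical (loosely based on the `dbt.contacts`/`dbt.persons` models seen in the test logs), not the actual implementation:

```sql
-- Hypothetical DDL: a downstream table references the base model's key, so
-- a DELETE on the base table cascades to referencing rows automatically.
ALTER TABLE dbt.persons
  ADD CONSTRAINT persons_contacts_fk
  FOREIGN KEY (_id) REFERENCES dbt.contacts (_id)
  ON DELETE CASCADE;

-- The post hook on the base model then needs only a single DELETE;
-- rows in dbt.persons referencing the deleted contacts go with them.
DELETE FROM dbt.contacts WHERE _deleted = true;
```

One caveat with this approach is that dbt-managed tables can be rebuilt on full refresh, which would drop such constraints, so they would need to be re-added in a hook.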

lorerod commented 5 days ago

> I've bumped the version of the adapter. @lorerod could you please try again?

This is working now using the brac-clone-for-phil.dev.medicmobile.org CouchDB instance and the .env mentioned here.

In the couch2pg container log, I can see that the majority of the transactions are successful:

2024-06-27 14:42:45 Node.js v20.15.0
2024-06-27 14:34:50 https://****:****@brac-clone-for-phil.dev.medicmobile.org:443/medic
2024-06-27 14:34:50 Downloading CouchDB changes feed from 0
2024-06-27 14:35:29 There are 1000 changes to process
2024-06-27 14:35:29 There are 808 deletions and 192 new / changed documents
2024-06-27 14:35:30 Pulled 192 results from couchdb

In between, I saw it restarting a couple of times, and at that moment, the logs showed:

> 2024-06-27 14:43:43 https://****:****@brac-clone-for-phil.dev.medicmobile.org:443/medic
> 2024-06-27 14:43:43 Downloading CouchDB changes feed from 11219-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____szKYkxgYOO1ygWLsZsmmaZaGSdj04DEpSQFIJtnDDWM_AjbM1DjNMinJglTDHECGxcMNY3sANizJLDXR0jKNVMMSQIbVww1jtQYbZpxkaGxhRKo381iAJEMDkAKaNx9iIMsesIEGhmYp5mnGZBm4AGLgfoiBzHcgEWFgaJpKpgsPQAy8DzWwEuLCFMtEA_Nksgx8ADEQGoZMSyGxa2RgbGZkik1rFgCJdqOF
> 2024-06-27 14:43:17   code: 'ENOTFOUND'
> 2024-06-27 14:43:17 }
> 2024-06-27 14:43:17 
> 2024-06-27 14:43:17 Node.js v20.15.0
> 2024-06-27 14:43:43 
> 2024-06-27 14:43:43 /node_modules/node-fetch/lib/index.js:1491
> 2024-06-27 14:43:43                     reject(new FetchError(`request to ${request.url} failed, reason: ${err.message}`, 'system', err));
> 2024-06-27 14:43:43                            ^
> 2024-06-27 14:43:43 FetchError: request to https://brac-clone-for-phil.dev.medicmobile.org/_session failed, reason: getaddrinfo ENOTFOUND brac-clone-for-phil.dev.medicmobile.org
> 2024-06-27 14:43:43     at ClientRequest.<anonymous> (/node_modules/node-fetch/lib/index.js:1491:11)
> 2024-06-27 14:43:43     at ClientRequest.emit (node:events:519:28)
> 2024-06-27 14:43:43     at TLSSocket.socketErrorListener (node:_http_client:500:9)
> 2024-06-27 14:43:43     at TLSSocket.emit (node:events:519:28)
> 2024-06-27 14:43:43     at emitErrorNT (node:internal/streams/destroy:169:8)
> 2024-06-27 14:43:43     at emitErrorCloseNT (node:internal/streams/destroy:128:3)
> 2024-06-27 14:43:43     at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
> 2024-06-27 14:43:43   type: 'system',
> 2024-06-27 14:43:43   errno: 'ENOTFOUND',
> 2024-06-27 14:43:43   code: 'ENOTFOUND'
> 2024-06-27 14:43:43 }

But it restarted successfully, and I didn't see the error anymore. I left it running for 4 hours, and at that point, the v1.medic table had 311000 rows.

dbt logs:

2024-06-27 18:21:10 22:21:10  Running with dbt=1.8.3
2024-06-27 18:21:11 22:21:11  Registered adapter: postgres=1.8.2
2024-06-27 18:21:12 22:21:12  Found 3 models, 3 data tests, 529 macros
2024-06-27 18:21:12 22:21:12  The selection criterion 'config.materialized:view' does not match any enabled nodes
2024-06-27 18:21:12 22:21:12  
2024-06-27 18:21:12 22:21:12  Concurrency: 1 threads (target='default')
2024-06-27 18:21:12 22:21:12  
2024-06-27 18:21:12 22:21:12  1 of 3 START sql incremental model dbt.contacts ................................ [RUN]
2024-06-27 18:21:12 22:21:12  1 of 3 OK created sql incremental model dbt.contacts ........................... [INSERT 0 341 in 0.50s]
2024-06-27 18:21:12 22:21:12  2 of 3 START sql incremental model dbt.reports ................................. [RUN]
2024-06-27 18:21:13 22:21:13  2 of 3 OK created sql incremental model dbt.reports ............................ [INSERT 0 306 in 0.26s]
2024-06-27 18:21:13 22:21:13  3 of 3 START sql incremental model dbt.persons ................................. [RUN]
2024-06-27 18:21:13 22:21:13  3 of 3 OK created sql incremental model dbt.persons ............................ [INSERT 0 289 in 0.31s]
2024-06-27 18:21:13 22:21:13  
2024-06-27 18:21:13 22:21:13  Finished running 3 incremental models in 0 hours 0 minutes and 1.35 seconds (1.35s).
2024-06-27 18:21:13 22:21:13  
2024-06-27 18:21:13 22:21:13  Completed successfully
2024-06-27 18:21:13 22:21:13  
2024-06-27 18:21:13 22:21:13  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

lorerod commented 4 days ago

@njuguna-n I tried this branch with a local CouchDB version 3 instance, and I'm getting these errors in the couch2pg logs:

2024-06-28 13:57:00 Node.js v20.15.0
2024-06-28 13:56:57 http://medic:password@localhost:5984/medic
2024-06-28 13:56:57 Downloading CouchDB changes feed from 0
2024-06-28 13:56:58 http://medic:password@localhost:5984/medic
2024-06-28 13:56:58 Downloading CouchDB changes feed from 0
2024-06-28 13:57:00 http://medic:password@localhost:5984/medic
2024-06-28 13:57:00 Downloading CouchDB changes feed from 0
2024-06-28 13:57:04 http://medic:password@localhost:5984/medic
2024-06-28 13:57:04 Downloading CouchDB changes feed from 0
2024-06-28 13:57:04 
2024-06-28 13:57:04 /node_modules/node-fetch/lib/index.js:1491
2024-06-28 13:57:04                     reject(new FetchError(`request to ${request.url} failed, reason: ${err.message}`, 'system', err));
2024-06-28 13:57:04                            ^
2024-06-28 13:57:04 FetchError: request to http://localhost:5984/_session failed, reason: 
2024-06-28 13:57:04     at ClientRequest.<anonymous> (/node_modules/node-fetch/lib/index.js:1491:11)
2024-06-28 13:57:04     at ClientRequest.emit (node:events:519:28)
2024-06-28 13:57:04     at Socket.socketErrorListener (node:_http_client:500:9)
2024-06-28 13:57:04     at Socket.emit (node:events:519:28)
2024-06-28 13:57:04     at emitErrorNT (node:internal/streams/destroy:169:8)
2024-06-28 13:57:04     at emitErrorCloseNT (node:internal/streams/destroy:128:3)
2024-06-28 13:57:04     at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
2024-06-28 13:57:04   type: 'system',
2024-06-28 13:57:04   errno: 'ECONNREFUSED',
2024-06-28 13:57:04   code: 'ECONNREFUSED'
2024-06-28 13:57:04 }
2024-06-28 13:57:04 
2024-06-28 13:57:04 Node.js v20.15.0

To test this, I used the CHT Docker Helper for 4.x to install version 4.4.0, which uses `COUCHDB_VERSION=3.3.2`, and changed the yml file to expose port 5984. Maybe I'm getting something wrong with my local network. If there is an easier way to try this with CouchDB version 3, please let me know.

andrablaj commented 2 days ago

@lorerod have you tried running CHT > 4.3 in development mode? The dev env setup worked successfully for me.

njuguna-n commented 2 days ago

@lorerod I suspect the issue might be that you have CouchDB running outside the Docker network that Couch2PG is in. Please try using host.docker.internal as the host instead of localhost (docs).

lorerod commented 1 day ago

> @lorerod I suspect the issue might be that you have CouchDB running outside the Docker network that Couch2PG is in. Please try using host.docker.internal as the host instead of localhost (docs).

Thank you, @njuguna-n. Now it is working! With this, I can confirm it works with CouchDB version 3.3.2.