ohdsi-studies / PhenoPhebStemiCharacterization

This comprehensive study aims to understand the characteristics of STEMI patients and identify incidence rates across multiple real-world datasets.

Problems with running the study #1

Closed filipmaljkovic closed 4 months ago

filipmaljkovic commented 4 months ago

After some fiddling about, I got the Strategus code running. (For other people reading this, it should be noted that there's a hard lower limit on the version of R, since Strategus requires R >= 4.2.0, and also on the GCC version when talking about Linux, since the "igraph" package would not install for me without GCC 10+.) So, after creating the initial tables, one of the first queries starts executing and never seems to end.

Here's my R console log:

> Strategus::execute(analysisSpecifications = analysisSpecifications,
+                    executionSettings = executionSettings,
+                    executionScriptFolder = file.path(outputFolder, "script_folder"))
Verifying module: CohortGeneratorModule, (0.3.0) at ~/StrategusModulesHome/CohortGeneratorModule_0.3.0...MODULE READY!
Verifying module: CohortDiagnosticsModule, (0.2.0) at ~/StrategusModulesHome/CohortDiagnosticsModule_0.2.0...MODULE READY!
Connecting using PostgreSQL driver
▶ dispatched target analysisSpecifications
● completed target analysisSpecifications [0.029 seconds]
▶ dispatched target keyringSettings
● completed target keyringSettings [0.001 seconds]
▶ dispatched target executionSettings
● completed target executionSettings [0 seconds]
▶ dispatched target CohortGeneratorModule_1
MODULE READY!ule: CohortGeneratorModule, (0.3.0) at ~/StrategusModulesHome/CohortGeneratorModule_0.3.0...
Loading required package: DatabaseConnector
Loading required package: R6
Validating inputs
Executing
Connecting using PostgreSQL driver
Table "STEMI_v2" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2_inclusion" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2_inclusion_result" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2_inclusion_stats" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2_summary_stats" already exists and in incremental mode, so not recreating it.
Table "STEMI_v2_censor_stats" already exists and in incremental mode, so not recreating it.
1/1- Generating cohort: [OHDSI PhenotypePhebruary] STEMI - def.4 diagSTEMI (with inclusion and exclusion criteria) v1
/ |===                                                                   |   5%

I managed to catch the query (and re-format it), which is as follows:

SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id
FROM (
    select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal, cast(pe.visit_occurrence_id as bigint) as visit_occurrence_id
    FROM (
        select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
        FROM (
            select E.person_id, E.start_date, E.end_date, row_number() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC, E.event_id) ordinal, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
            FROM (
                SELECT C.person_id, C.condition_occurrence_id as event_id, C.start_date, C.end_date, C.visit_occurrence_id, C.start_date as sort_date
                FROM  (
                    SELECT co.person_id,co.condition_occurrence_id,co.condition_concept_id,co.visit_occurrence_id,co.condition_start_date as start_date, COALESCE(co.condition_end_date, (co.condition_start_date + 1*INTERVAL'1 day')) as end_date, row_number() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
                    FROM cdm.CONDITION_OCCURRENCE co
                    JOIN Codesets cs on (co.condition_concept_id = cs.concept_id and cs.codeset_id = 49)
                ) C
                WHERE C.start_date >= TO_DATE(TO_CHAR(2016,'0000')||'-'||TO_CHAR(1,'00')||'-'||TO_CHAR(1,'00'), 'YYYY-MM-DD')
                AND C.ordinal = 1
            ) E
            JOIN cdm.observation_period OP on E.person_id = OP.person_id and E.start_date >=  OP.observation_period_start_date and E.start_date <= op.observation_period_end_date
            WHERE (OP.OBSERVATION_PERIOD_START_DATE + 365*INTERVAL'1 day') <= E.START_DATE
            AND (E.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE
        ) P
        WHERE P.ordinal = 1
    ) pe
    JOIN (
        select 0 as index_id, person_id, event_id
        FROM (
            select E.person_id, E.event_id 
            FROM (
                select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
                FROM
                (
                    select E.person_id, E.start_date, E.end_date, row_number() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC, E.event_id) ordinal, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
                    FROM (
                        SELECT C.person_id, C.condition_occurrence_id as event_id, C.start_date, C.end_date, C.visit_occurrence_id, C.start_date as sort_date
                        FROM (
                            SELECT co.person_id,co.condition_occurrence_id,co.condition_concept_id,co.visit_occurrence_id,co.condition_start_date as start_date, COALESCE(co.condition_end_date, (co.condition_start_date + 1*INTERVAL'1 day')) as end_date, row_number() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
                            FROM cdm.CONDITION_OCCURRENCE co
                            JOIN Codesets cs on (co.condition_concept_id = cs.concept_id and cs.codeset_id = 49)
                        ) C
                        WHERE C.start_date >= TO_DATE(TO_CHAR(2016,'0000')||'-'||TO_CHAR(1,'00')||'-'||TO_CHAR(1,'00'), 'YYYY-MM-DD')
                        AND C.ordinal = 1
                    ) E
                    JOIN cdm.observation_period OP on E.person_id = OP.person_id and E.start_date >=  OP.observation_period_start_date and E.start_date <= op.observation_period_end_date
                    WHERE (OP.OBSERVATION_PERIOD_START_DATE + 365*INTERVAL'1 day') <= E.START_DATE
                    AND (E.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE
                ) P
                WHERE P.ordinal = 1
            ) E
            INNER JOIN (
                select 0 as index_id, p.person_id, p.event_id
                from (
                    select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
                    FROM (
                        select E.person_id, E.start_date, E.end_date, row_number() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC, E.event_id) ordinal, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
                        FROM (
                            SELECT C.person_id, C.condition_occurrence_id as event_id, C.start_date, C.end_date, C.visit_occurrence_id, C.start_date as sort_date
                            FROM (
                                SELECT co.person_id,co.condition_occurrence_id,co.condition_concept_id,co.visit_occurrence_id,co.condition_start_date as start_date, COALESCE(co.condition_end_date, (co.condition_start_date + 1*INTERVAL'1 day')) as end_date, row_number() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
                                FROM cdm.CONDITION_OCCURRENCE co
                                JOIN Codesets cs on (co.condition_concept_id = cs.concept_id and cs.codeset_id = 49)
                            ) C
                            WHERE C.start_date >= TO_DATE(TO_CHAR(2016,'0000')||'-'||TO_CHAR(1,'00')||'-'||TO_CHAR(1,'00'), 'YYYY-MM-DD')
                            AND C.ordinal = 1
                        ) E
                        JOIN cdm.observation_period OP on E.person_id = OP.person_id and E.start_date >=  OP.observation_period_start_date and E.start_date <= op.observation_period_end_date
                        WHERE (OP.OBSERVATION_PERIOD_START_DATE + 365*INTERVAL'1 day') <= E.START_DATE
                        AND (E.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE
                    ) P
                    WHERE P.ordinal = 1
                ) p
                LEFT JOIN (
                    SELECT p.person_id, p.event_id 
                    FROM (
                        select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
                        FROM (
                            select E.person_id, E.start_date, E.end_date, row_number() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC, E.event_id) ordinal, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
                            FROM (
                                SELECT C.person_id, C.condition_occurrence_id as event_id, C.start_date, C.end_date, C.visit_occurrence_id, C.start_date as sort_date
                                FROM (
                                    SELECT co.person_id,co.condition_occurrence_id,co.condition_concept_id,co.visit_occurrence_id,co.condition_start_date as start_date, COALESCE(co.condition_end_date, (co.condition_start_date + 1*INTERVAL'1 day')) as end_date, row_number() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
                                    FROM cdm.CONDITION_OCCURRENCE co
                                    JOIN Codesets cs on (co.condition_concept_id = cs.concept_id and cs.codeset_id = 49)
                                ) C
                                WHERE C.start_date >= TO_DATE(TO_CHAR(2016,'0000')||'-'||TO_CHAR(1,'00')||'-'||TO_CHAR(1,'00'), 'YYYY-MM-DD')
                                AND C.ordinal = 1
                            ) E
                            JOIN cdm.observation_period OP on E.person_id = OP.person_id and E.start_date >=  OP.observation_period_start_date and E.start_date <= op.observation_period_end_date
                            WHERE (OP.OBSERVATION_PERIOD_START_DATE + 365*INTERVAL'1 day') <= E.START_DATE
                            AND (E.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE
                        ) P
                        WHERE P.ordinal = 1
                    ) P
                    JOIN (
                        SELECT C.person_id, C.condition_occurrence_id as event_id, C.start_date, C.end_date, C.visit_occurrence_id, C.start_date as sort_date
                        FROM (
                            SELECT co.person_id,co.condition_occurrence_id,co.condition_concept_id,co.visit_occurrence_id,co.condition_start_date as start_date, COALESCE(co.condition_end_date, (co.condition_start_date + 1*INTERVAL'1 day')) as end_date 
                            FROM cdm.CONDITION_OCCURRENCE co
                            JOIN Codesets cs on (co.condition_concept_id = cs.concept_id and cs.codeset_id = 50)
                        ) C
                    ) A on A.person_id = P.person_id  AND A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= (P.START_DATE + 0*INTERVAL'1 day') AND A.START_DATE <= (P.START_DATE + 0*INTERVAL'1 day')
                ) cc on p.person_id = cc.person_id and p.event_id = cc.event_id
                GROUP BY p.person_id, p.event_id
                HAVING COUNT(cc.event_id) = 0
            ) CQ on E.person_id = CQ.person_id and E.event_id = CQ.event_id
            GROUP BY E.person_id, E.event_id
            HAVING COUNT(index_id) = 1
        ) G
    ) AC on AC.person_id = pe.person_id and AC.event_id = pe.event_id
) QE
WHERE QE.ordinal = 1;

However, since Codesets seems to be a temp table, I can't really run this query or analyze it properly.
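For anyone wanting to analyze the captured SQL outside of Strategus, a minimal stand-in for that temp table can be created in a writeable scratch schema. This is only a sketch under assumptions: the codeset_id values 49 and 50 come from the captured query, the schema name "scratch" is arbitrary, and the concept_id values are placeholders that would need to be replaced with the resolved concept sets from the cohort definition.

-- Sketch: materialize a stand-in Codesets table in a scratch schema.
CREATE SCHEMA IF NOT EXISTS scratch;

CREATE TABLE scratch.codesets (
    codeset_id INT NOT NULL,
    concept_id BIGINT NOT NULL
);

-- Placeholder rows only: replace with the resolved concept sets
-- (codeset 49 appears to be the STEMI set, codeset 50 the excluded set).
INSERT INTO scratch.codesets (codeset_id, concept_id) VALUES
    (49, 1234567),
    (50, 7654321);

-- Then either rewrite "Codesets" as "scratch.codesets" in the captured
-- query, or put the scratch schema on the search path:
SET search_path TO scratch, cdm, public;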

After running for 17 hours, I killed the query and added some indexes (indexes on the start and end dates for observation_period and condition_occurrence weren't there, so I added them, since those columns are used heavily in the query), then reran the query; it has now been running for more than 4 hours.

Is there anything (else) that can be done to make this query finish in finite time?

Note that the server I'm running this on is pretty idle and has 18 CPU cores, and only one is being used to execute the query (I hear Postgres supports multiple cores / parallel execution from version 9.6 onward, and I'm using Postgres 10).
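On the parallel-execution point, Postgres 10 will only parallelize a query if the relevant settings allow it. A quick sanity check could look something like this (a sketch only; the value shown is illustrative, not a recommendation):

-- Check whether parallel plans are allowed at all (0 disables them):
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;

-- Allow a few workers for the current session, then prepend
-- EXPLAIN (ANALYZE, BUFFERS) to the captured query to see whether the
-- planner actually chooses a parallel plan:
SET max_parallel_workers_per_gather = 4;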

szhorvat commented 4 months ago

since "igraph" package cannot be installed without GCC 10+

This does not sound right. The igraph C library should work with GCC 4.8. The igraph R package might have accidentally acquired a requirement for a somewhat more recent GCC, but GCC 10 does not sound plausible. If you can't compile igraph with an old GCC that is supported by your R version, I suggest you open an issue.

filipmaljkovic commented 4 months ago

Yeah, I agree igraph shouldn't require a higher GCC version, but with version 4.8 I got the following error:

vendor/cigraph/src/community/infomap/infomap.cc:47:12: error: ‘isnan’ is already declared in this scope

All the searches on the internet pointed me toward upgrading GCC to 10+, which actually did the trick.

So, whereas igraph doesn't officially require GCC 10+, for me it effectively did require it and the upgrade worked.

But I solved that on my own. The query thing I cannot solve, since it's a bit lower down the stack. However, @milou-brand said on the forum that the cohorts and code are being updated, so I shouldn't run the current version.

milou-brand commented 4 months ago

Hi @filipmaljkovic

I had a look into the CohortGeneratorModule and it looks like there have been some updates to the package, one of which might be related to your problem; see here: https://github.com/OHDSI/CohortGeneratorModule/blob/main/NEWS.md#cohortgeneratormodule-041

I will make sure to integrate these into the design when I push it later today; let me know if it solves your problem.

filipmaljkovic commented 4 months ago

Hi. I downloaded the new code, and indeed, a new folder gets created (CohortGeneratorModule_0.4.2). However, the same query gets stuck at the same place: 5%, and it won't budge. I killed it after 5.5 hours of execution, but I've now rerun it, just to be sure.

milou-brand commented 4 months ago

@filipmaljkovic thanks for sharing.

Do you have a writeable schema that you can use to store the temporary tables that are created when running the SQL? Have you tried extracting the SQL from the inst folder and seeing if you can run that in your local PostgreSQL environment? Or extracting the JSON file from the inst folder and running it on your local ATLAS instance, if you have one?

I'm trying to figure out where it's hanging, as it doesn't look like the update did the trick.

filipmaljkovic commented 4 months ago

Yes, I created temp.Codesets and started to analyze the query parts. I've attached my analysis so far. Most of the simpler subqueries are fast, but the slowness starts to show once we do GROUP BY ... HAVING. The last subquery returned results in 55 s (and that's just the inner-joined query).

PhenoPhebStemiCharacterization-sql_analysis.txt

Unfortunately, I don't see any ways to make this faster. Any suggestions?

milou-brand commented 4 months ago

@filipmaljkovic I have received the same feedback from another data partner. It seems to be a PostgreSQL issue; I will check with the Strategus development team and see if I can find out more.

anthonysena commented 4 months ago

Hi @milou-brand and team - I was looking into this study design a bit and saw that the cohort definition uses an initial restriction that may be causing some of the performance issues with this cohort. For reference:

[screenshot: cohort definition "restrict initial events to" criteria]

The "restrict initial events to" requires exactly 0 occurrences of a non-stemi condition which is causing the LEFT JOIN criteria in the cohort definition query. I suspect this is where you are facing the performance issues. Instead, you might consider revising this cohort definition a bit. You might consider moving the exactly 0 occurrences of a non-stemi condition criterion to the inclusion rules section instead. This change will then perform a LEFT JOIN on the qualified events (STEMI condition, 1st time in history on or after 1 Jan 2016) which will be significantly smaller than the entire condition_occurrence table that is part of the nested subqueries in the original definition.

Tagging @chrisknoll to double check my claims.

milou-brand commented 4 months ago

@filipmaljkovic I have pushed the update as suggested by @anthonysena above to Git. Let me know if this makes the query run more smoothly or not.

chrisknoll commented 4 months ago

From an index perspective, the index that includes dates is helpful (in that it can range-search on dates), but the primary column in the index should be person_id, since we'll be joining across tables on person_id. Additionally, we will search by concept_id, so a secondary index on {domain}_concept_id also makes sense. In that index I don't think including dates matters, because when we're searching by concept_id it usually doesn't matter when; it's just 'who has condition/drug/procedure X/Y/Z'. Once we know who they are, we will look up additional information about the person via a join on person_id.

I'm not an expert on Postgres topology and indexes, but being able to index on people (to find the people in question) and then supporting a range scan on the rows via date are the two things that should be leveraged to make these queries faster.
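As a rough illustration of that layout (a sketch only; the index names are made up here, and the schema/column names follow the OMOP CDM tables used in the query above):

-- person_id first, with dates included to support range scans:
CREATE INDEX idx_condition_occurrence_person_dates
    ON cdm.condition_occurrence (person_id, condition_start_date, condition_end_date);
CREATE INDEX idx_observation_period_person_dates
    ON cdm.observation_period (person_id, observation_period_start_date, observation_period_end_date);

-- secondary index to answer "who has condition X" by concept:
CREATE INDEX idx_condition_occurrence_concept
    ON cdm.condition_occurrence (condition_concept_id);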

Secondly, the default options Postgres usually ships with are aimed at smaller-scale queries and fast lookups: it allocates less RAM per connection so it can support more connections to the data (like a transactional system). But when dealing with patient records, a query may need to seek through a much larger set of data and therefore may need more memory to perform joins in memory rather than swapping to disk. I don't have a specific list of parameters for you to look into, but you should do some research on how your environment is configured to ensure it is set up for the correct workload.
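For reference, these are the kinds of memory-related settings typically reviewed on Postgres for analytic workloads (a sketch with illustrative values only; they must be tuned to the machine, and shared_buffers only takes effect after a server restart):

ALTER SYSTEM SET shared_buffers = '8GB';          -- default is often 128MB
ALTER SYSTEM SET work_mem = '256MB';              -- per sort/hash operation, per connection
ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint, not an allocation
SELECT pg_reload_conf();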

filipmaljkovic commented 4 months ago

Hi everyone, thanks for trying to find ways to help out.

This update was actually fruitful: the first query finished almost immediately and the progress bar moved from 5% to 13% instantaneously, settled there for a minute, and then moved to 16%.

I'm letting it do its work and will update if there are any more problems, or, hopefully, if there's success. :)

filipmaljkovic commented 4 months ago

We have success:

● completed target CohortDiagnosticsModule_2 [13.88 minutes]
▶ ended pipeline [18.464 minutes]

filipmaljkovic commented 4 months ago

OK, now I have a different problem, unrelated to DB optimizations, but I thought I might report it here rather than open a new issue.

Namely, in the Results_compile.R, I get an error on line 31:

> RSQLite::dbConnect(RSQLite::SQLite(), file.path(outputLocation, "app" ,paste0(connectionDetailsReference, ".sqlite")))
Error: Could not connect to database:
unable to open database file

The path where it's trying to find the sqlite db file is .../Strategus/outputCHCZ/app/CHCZ.sqlite, but unsurprisingly, it doesn't exist. This is the outputCHCZ folder after running Strategus' execute:

[screenshot: contents of the outputCHCZ folder after Strategus::execute]

So, how do I get the app folder?

milou-brand commented 4 months ago

You can create the app folder yourself, or you can run this script to create the folder automatically.

subDir <- "app"
if (file.exists(paste0(outputFolder,subDir))){
    print("no action taken")
} else {
  print("created app folder in outputFolder")
  dir.create(file.path(outputFolder, subDir))
}
chrisknoll commented 4 months ago

I'd like to know anything you did to improve the DB performance. Performance is something that is under-documented and any information that improves quality-of-life when running this would be greatly appreciated.

filipmaljkovic commented 4 months ago

@milou-brand thanks! I thought that the sqlite database was supposed to be created during the "CodeToRun.R" execution -- if I had known I only had to create an empty folder, I would simply have done it :)

That said, it would be nice for that piece of code to be in "Results_compile.R", since it makes sense to execute it automatically.

However, execution of Results_compile.R works only up to a point. I get the following error:

Uploading file: cd_concept_sets.csv to table: cd_concept_sets
- Preparing to upload rows 1 through 5
An error report has been created at  .../Strategus/outputCHCZ/upload-errorReport.R
An error report has been created at  .../Strategus/outputCHCZ/upload-errorReport.R
Error in formatChunk(pkValuesInDb = env$primaryKeyValuesInDb, chunk = chunk) :
  concept_set_id is of type numeric which cannot be converted between data frames pkValuesInDb and chunk

Here's the upload-errorReport.R

@chrisknoll I didn't do any further optimization beyond the ones mentioned in the OP (well before this change in the code): indexes on the start and end dates for observation_period and condition_occurrence (I believe they aren't there by default when you run the code to initialize the empty CDM DB).

milou-brand commented 4 months ago

@filipmaljkovic It looks like there is a discrepancy between how your tables have been set up and what CohortDiagnostics is expecting as input.

If I dive into the CohortDiagnostics package installed locally for me, and then into sql > sql_server > createResultsDataModel.sql, it expects concept_set_id to be an INT rather than numeric in several places, including in the block of code where it creates cd_concept_sets, here:

CREATE TABLE @results_schema.@concept_sets (
            cohort_id BIGINT NOT NULL,
            concept_set_id INT NOT NULL,        -- <-- expected as INT here
            concept_set_sql VARCHAR NOT NULL,
            concept_set_name VARCHAR(255) NOT NULL,
            concept_set_expression VARCHAR NOT NULL,
            PRIMARY KEY(cohort_id, concept_set_id)
);

Perhaps you could try to change concept_set_id to numeric throughout the sql, so it doesn't error?

chrisknoll commented 4 months ago

Error in formatChunk(pkValuesInDb = env$primaryKeyValuesInDb, chunk = chunk) : concept_set_id is of type numeric which cannot be converted between data frames pkValuesInDb and chunk

Actually, I don't think this is at the DB level, it's an R issue. I believe this was a bugfix in ResultModelManager: https://github.com/OHDSI/ResultModelManager/blob/main/NEWS.md

ResultModelManager 0.5.7
Bug fixes:

Added type conversion checking on upload of data where columns are characters but interpreted as numeric from reading inserted data 

Check your version of ResultModelManager that is installed, and try to update to at least 0.5.7.

filipmaljkovic commented 4 months ago

@milou-brand

I might be daft, but all the tables with the stemi_v2 prefix are the following:

cdm=# \dt results.*
...
 results | stemi_v2                                                 | table | mega
 results | stemi_v2_censor_stats                                    | table | mega
 results | stemi_v2_inclusion                                       | table | mega
 results | stemi_v2_inclusion_result                                | table | mega
 results | stemi_v2_inclusion_stats                                 | table | mega
 results | stemi_v2_summary_stats                                   | table | mega
...

I thought that the concept sets table was temporary. I don't have any concept set table in the results schema.

@chrisknoll

> packageVersion("ResultModelManager")
[1] ‘0.5.7’

Seems I have the proper version (which is no surprise, since I installed everything for this study package quite recently, and 0.5.7 was released in early May).

milou-brand commented 4 months ago

@filipmaljkovic just checking: results_compile will compile the results from the results folder in the outputCHCZ folder that you shared in the screenshot above; there should be three subfolders (DatabaseMetaData, CohortGeneratorModule_1, CohortDiagnosticsModule_2). Can you confirm there are csv files in these folders? This is essentially all the data output; the results_compile just wraps everything into one sqlite file.

filipmaljkovic commented 4 months ago

@milou-brand Yes:

Everything in the Strategus execution seems to have completed successfully. app/CHCZ.sqlite is even 464 KB in size.

milou-brand commented 4 months ago

That does sound weird, especially since you already have the newest ResultModelManager installed. Essentially, those csv files are all the output I need for the study. I can try and see if I can run results_compile from my side? Otherwise happy to jump on a quick call if that's easier. My email is milou.brand@iqvia.com

filipmaljkovic commented 4 months ago

@milou-brand and I had a call and we tried many things (downgrading / upgrading ResultModelManager, changing cd_concept_sets.csv in various ways) and nothing worked, until we changed the CohortDiagnostics library code. Namely, we changed "CohortDiagnostics/sql/sql_server/CreateResultsDataModel.sql" and replaced INT with VARCHAR for the offending columns.
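Based on the DDL quoted earlier in the thread, the change would look roughly like this (a sketch; only concept_set_id is shown, as the full list of changed columns isn't reproduced here):

CREATE TABLE @results_schema.@concept_sets (
            cohort_id BIGINT NOT NULL,
            concept_set_id VARCHAR NOT NULL,   -- was INT
            concept_set_sql VARCHAR NOT NULL,
            concept_set_name VARCHAR(255) NOT NULL,
            concept_set_expression VARCHAR NOT NULL,
            PRIMARY KEY(cohort_id, concept_set_id)
);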

Only then was I able to successfully run the Results_compile.R.

filipmaljkovic commented 4 months ago

I managed to run the app.R (although the current version of the code accesses the sqlite db file outside the app folder, so the file shouldn't actually be moved into the app folder; or, if you do move the file to the app folder, then line 18 should read sqliteDbPath <- file.path(paste0(connectionDetailsReference, ".sqlite"))).

The app opens, but I get the error below when trying to open cohort diagnostics. The CLI log shows only: Failed to load module CohortDiagnostics

[screenshot: error shown in the CohortDiagnostics tab of the Shiny app]

milou-brand commented 4 months ago

Seems like it's now struggling with our VARCHAR input, as it appears to fail on timeId, which, if my memory is correct, is one of the columns we potentially changed.

I did some more searching into the sqlite sql. Could you try changing the concept_sets sql to cast _cohortid (i.e. 36/37) as int? I'm not sure if that will work, but it's worth a try.

filipmaljkovic commented 4 months ago

Yes, you're right. After changing the VARCHAR columns to INTEGER (using an SQLite IDE), CohortDiagnostics works.

Since all the problems have been solved, I'm closing the issue.