sama9767 / TrialFociMapper

Retrieves and assigns therapeutic focus to clinical trial
GNU General Public License v3.0

Calibrate amatch using clinical trials #2

Open bgcarlisle opened 1 year ago

bgcarlisle commented 1 year ago
Martin-R-H commented 1 year ago

Dear all,

I have made the aforementioned tests on two different samples:

Sample 1 (31 trials from Samruddhi)
Sample 2 (1000 interventional trials started between 21-03-01 and 21-03-31)

Check the files out yourself, if you like; I think they explain themselves! The last nine columns show the matching based on different strdist settings (1 to 9). The bottom line is: Matching based on a strdist of 7 is perfectly fine, but 6 would work equally well!
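For readers without access to the files, the calibration can be sketched roughly as follows. This is a minimal Python stand-in, not the package's R code: `closest_match` only imitates the idea of an approximate match with a maximum distance, plain Levenshtein distance is an assumption (the thread does not say which distance method is used), and the terms are hypothetical toy data rather than the two samples.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_match(term, table, max_dist):
    """Index of the closest entry in `table` within `max_dist`, else None (the NA case)."""
    best_index, best_dist = None, max_dist + 1
    for index, candidate in enumerate(table):
        dist = levenshtein(term, candidate)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def unmatched_per_threshold(terms, table, thresholds):
    """Count how many terms stay unmatched at each maximum distance."""
    return {
        t: sum(closest_match(term, table, t) is None for term in terms)
        for t in thresholds
    }


# Hypothetical toy data, not the real samples.
tree_terms = ["neoplasms", "heart diseases", "diabetes mellitus"]
trial_terms = ["neoplasm", "heart disease", "diabetis melitus"]

counts = unmatched_per_threshold(trial_terms, tree_terms, range(1, 10))
for threshold, n_unmatched in counts.items():
    print(threshold, n_unmatched)
```

Reading the counts from low to high thresholds shows where the number of unmatched terms stops changing, which is the kind of plateau the last nine columns of the files are meant to reveal.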

Best, Martin

sama9767 commented 1 year ago

Hi Martin, That's wonderful. I see that at strdist 6 we get a single NA, and that this stays consistent at the next strdist. I am fine with either of these.

On another note, the trial NCT05080829 (https://clinicaltrials.gov/study/NCT05080829?term=NCT05080829&rank=1) has no high-level MeSH terms assigned at any strdist, even though it fetches lowercase MeSH terms from ClinicalTrials.gov (leukemia, myelogenous, chronic, bcr-abl positive). According to the MeSH tree (https://www.ncbi.nlm.nih.gov/mesh/?term=leukemia%2C+myelogenous%2C), it should be assigned under "neoplasm" (via leukemia). Higher strdist values also failed to fetch foci for this trial.

Overall, a strdist of 6 or higher accommodates the majority of trials, but we might still miss some.

Martin-R-H commented 1 year ago

Hey @sama9767 and @bgcarlisle, I looked into this issue. The problem is that the matching algorithm uses lowercase MeSH terms from the AACT database, but uppercase MeSH terms from the MeSH trees.

Since the term 'Leukemia, Myelogenous, Chronic, BCR-ABL Positive' has many uppercase letters, it only matches at a strdist of 10.
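The figure of 10 can be checked by hand: the term contains ten uppercase letters, and each one needs its own substitution. A small Python sketch of that check follows; it assumes plain Levenshtein distance as a stand-in for whatever method the package's matching call uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


tree_term = "Leukemia, Myelogenous, Chronic, BCR-ABL Positive"  # as in the MeSH tree
aact_term = "leukemia, myelogenous, chronic, bcr-abl positive"  # lowercase, as fetched

uppercase_letters = sum(ch.isupper() for ch in tree_term)
raw_distance = levenshtein(aact_term, tree_term)
folded_distance = levenshtein(aact_term.lower(), tree_term.lower())

print(uppercase_letters, raw_distance, folded_distance)  # prints "10 10 0"
```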

I guess the easiest solution would be to fetch the uppercase MeSH terms from the AACT database and use those. In that case, I believe a strdist of 1 would work perfectly, because the names should be the same.

Shall I implement this and run another test?

bgcarlisle commented 1 year ago

That seems like a good idea

You might pass both strings to be compared through stringr::str_to_upper() just to be sure
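A rough illustration of normalising both sides before comparing, written in Python rather than R purely as a sketch. `normalise`, `build_lookup` and `match_term` are hypothetical names, and the exact-match lookup is a simplification of the package's approximate matching.

```python
def normalise(term: str) -> str:
    """Bring a term into one canonical case; upper or lower works, as long as both sides agree."""
    return term.strip().upper()


def build_lookup(tree_terms):
    """Map each normalised tree term back to its original spelling."""
    return {normalise(term): term for term in tree_terms}


def match_term(trial_term, lookup):
    """Return the original tree term whose normalised form equals the trial term's, else None."""
    return lookup.get(normalise(trial_term))


# Hypothetical miniature tree; the real one comes from the MeSH trees.
lookup = build_lookup([
    "Leukemia, Myelogenous, Chronic, BCR-ABL Positive",
    "Neoplasms",
])

print(match_term("leukemia, myelogenous, chronic, bcr-abl positive", lookup))
print(match_term("not a mesh term", lookup))
```

Once both sides share a case, identical names compare equal, so any remaining distance reflects real spelling differences rather than capitalisation.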

sama9767 commented 1 year ago

Hi @Martin-R-H, @bgcarlisle, you got to the bottom of it, nice! Can we lowercase the MeSH tree and match it against the lowercase MeSH terms from the AACT database? The non-lowercased MeSH terms in the AACT database are cleaned by AACT personnel, but they mention there may be room for improvement there, as data submitters have submitted the MeSH terms in ctgov inconsistently.

Martin-R-H commented 1 year ago

I reran the tests, with lowercase MeSH terms. Now, all MeSH terms match, even at a strdist of 1.

See the file.

sama9767 commented 1 year ago

Hi Martin, Hi Murph,

I am sorry, I am only seeing these emails now. Wow, that worked. I am unable to open the file on my end. Could you make a pull request, so I can get all the updated code and rerun everything?

Martin-R-H commented 1 year ago

Dear @sama9767,

Sure, I just opened a pull request! Of course, you can also switch branches in Git and run the tests from my development branch... :)

Sorry you cannot open the file! You can find it on Teams, under R TrialFociMapper / data / matching_algorithm_test_revised.csv