Open bgcarlisle opened 1 year ago
Dear all,
I have made the aforementioned tests on two different samples:
Sample 1 (31 trials from Samruddhi)
Sample 2 (1000 interventional trials started between 21-03-01 and 21-03-31)
Check the files out yourself, if you like; I think they explain themselves! The last nine columns show the matching based on different strdist settings (1 to 9). The bottom line is: Matching based on a strdist of 7 is perfectly fine, but 6 would work equally well!
Best, Martin
Hi Martin, that's wonderful. I see that at a strdist of 6 we get a single NA, and that is consistent with the next strdist setting. I am fine with either of these.
On another note, the trial NCT05080829 https://clinicaltrials.gov/study/NCT05080829?term=NCT05080829&rank=1 is not assigned any high-level MeSH terms at any strdist setting, even though it fetches lowercase MeSH terms from ClinicalTrials.gov (leukemia, myelogenous, chronic, bcr-abl positive). According to the MeSH tree https://www.ncbi.nlm.nih.gov/mesh/?term=leukemia%2C+myelogenous%2C, it should be assigned to "neoplasm", under leukemia. Higher strdist settings also failed to fetch foci for this trial.
Overall, a strdist of 6 or higher accommodates the majority of trials, but we might still miss some.
On Fri, Oct 27, 2023 at 11:08 PM Martin Holst @.***> wrote:
Dear all,
I have made the tests on two different samples:
Sample 1 https://charitede.sharepoint.com/:x:/r/sites/ClinicalResearchAGStrech-RTrialFociMapper/Shared%20Documents/R%20TrialFociMapper/data/matching_algorithm_test_a.csv?d=wf0e1ecc03c8b4151a85073a674c6d5f9&csf=1&web=1&e=fMI28V (from Samruddhi)
Sample 2 https://charitede.sharepoint.com/:x:/r/sites/ClinicalResearchAGStrech-RTrialFociMapper/Shared%20Documents/R%20TrialFociMapper/data/matching_algorithm_test_b.csv?d=w447272bff5864ef39d4c1b5cf7666067&csf=1&web=1&e=mqwqmg (1000 interventional trials started between 21-03-01 and 21-03-31):
Check the files out yourself, if you like! Bottom line: Matching based on a strdist of 7 is perfectly fine, but 6 would work equally well!
Best, Martin
Hey @sama9767 and @bgcarlisle, I looked into this issue. The problem is that the matching algorithm compares lowercase MeSH terms from the AACT database against MeSH terms from the MeSH trees that still contain uppercase letters.
Since the term 'Leukemia, Myelogenous, Chronic, BCR-ABL Positive' has many uppercase letters, it only matches at a strdist of 10.
I guess the easiest solution would be to fetch the uppercase MeSH terms from the AACT database and use those. In that case, I believe a strdist of 1 would work perfectly, because the names should be the same.
Shall I implement this and run another test?
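For anyone following along, here is a minimal sketch of why this particular term needs a strdist of 10. It assumes that strdist is a Levenshtein-style edit distance and that two terms match when their distance is at most the strdist setting. It uses base R's utils::adist() as a stand-in, not the package's own matching code, and the variable names are made up for illustration.

```r
# The same MeSH term as it appears in AACT (lowercase) and in the MeSH tree
# (mixed case); both strings are copied from this thread.
aact_term <- "leukemia, myelogenous, chronic, bcr-abl positive"
tree_term <- "Leukemia, Myelogenous, Chronic, BCR-ABL Positive"

# utils::adist() returns a matrix of Levenshtein distances; take the one cell.
case_distance <- as.integer(utils::adist(aact_term, tree_term)[1, 1])

# Assumption: a pair counts as a match when distance <= strdist setting.
strdist_settings <- 1:10
matches <- case_distance <= strdist_settings
names(matches) <- strdist_settings

print(case_distance)  # 10: one substitution per uppercase letter
print(matches)        # FALSE for settings 1 to 9, TRUE only at 10
```

The two strings differ only in their ten uppercase letters, so under these assumptions every setting from 1 to 9 misses the term, which fits what the test files showed.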
That seems like a good idea. You might pass both strings to be compared through stringr::str_to_upper(), just to be sure.
Hi @Martin-R-H, @bgcarlisle, you got to the bottom of it, nice! Can we lowercase the MeSH tree terms and match them against the lowercase MeSH terms from the AACT database? The non-lowercased MeSH terms in the AACT database are cleaned by AACT personnel, but they mention that there might be some room for improvement, as data submitters have submitted MeSH terms to ClinicalTrials.gov inconsistently.
I reran the tests, with lowercase MeSH terms. Now, all MeSH terms match, even at a strdist of 1.
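A rough sketch of the idea, for reference: lowercase both sides before computing the distance, so that case differences no longer count as edits. This is an illustration under assumptions, not the code in the pull request. match_mesh_term() is a hypothetical helper, the three tree terms are a hand-picked mini list, and utils::adist() stands in for whatever distance function the package uses.

```r
# Hypothetical helper: return the tree terms that lie within max_dist edits
# of the trial's term once both sides have been lowercased.
match_mesh_term <- function(trial_term, tree_terms, max_dist = 1) {
  # adist() gives a 1 x length(tree_terms) matrix; keep its only row.
  distances <- utils::adist(tolower(trial_term), tolower(tree_terms))[1, ]
  tree_terms[distances <= max_dist]
}

# Illustrative mini list of tree terms, not the real MeSH tree.
tree_terms <- c(
  "Leukemia, Myelogenous, Chronic, BCR-ABL Positive",
  "Leukemia",
  "Neoplasms"
)

matched <- match_mesh_term(
  "leukemia, myelogenous, chronic, bcr-abl positive",
  tree_terms,
  max_dist = 1
)
print(matched)
```

Because the helper returns the tree term in its original spelling, the properly capitalized name stays available for the later lookup in the tree. Uppercasing both sides instead, as suggested earlier in the thread, would work just as well.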
Hi Martin, Hi Murph,
I am sorry, I am only seeing these emails now. Wow, that worked. I am unable to see the file on my end, though. Could you make a pull request, so I can get all the updated code and rerun everything?
Dear @sama9767,
sure, I just opened a pull request! Of course, you can also switch branches in Git and run the tests from my development branch... :)
Sorry you cannot open the file! You can find it on Teams, under R TrialFociMapper / data / matching_algorithm_test_revised.csv