Group items into new harmonised variables

Description

Can we add groups of similar items to the export? E.g. everything to do with height across 5 studies. Perhaps this could also be another view in the tool's visualisation.

See mockup:

https://github.com/harmonydata/hackathon/blob/main/find_variable.png

Rationale

Users have requested this feature, because otherwise there is a manual step going from the similarity matrix (which is currently in the export) to harmonised variables.
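A rough Python sketch of what a grouped export could contain, assuming the groups have already been computed as lists of item indices. The function name `export_groups_csv` and the `group`, `study` and `item` columns are hypothetical and are not the current Harmony export format.

```python
import csv
import io
from typing import List, Tuple


def export_groups_csv(items: List[Tuple[str, str]], groups: List[List[int]]) -> str:
    """Write one CSV row per (group, item) membership; groups are numbered from 1."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["group", "study", "item"])
    for group_number, members in enumerate(groups, start=1):
        for index in members:
            study, text = items[index]
            writer.writerow([group_number, study, text])
    return buffer.getvalue()


# Hypothetical items as (study, question text) pairs, and groups as lists of item indices.
items = [("Study A", "Height in cm"), ("Study B", "How tall are you?"), ("Study C", "Weight in kg")]
print(export_groups_csv(items, [[0, 1], [2]]), end="")
```

Writing one row per membership means an item that lands in two groups simply appears twice, so the export does not need to decide which group "wins".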

Sure. Is that grouping in the API data somewhere? It would be great to add it in and offer filtering by it. Or is the topics_auto / topics_strengths field meant to be leveraged here?

J

John Rogers Delosis Ltd

Thomas Wood (issue author), 31 May 2024:

I was thinking maybe the front end (FE) could apply some simple deterministic logic to make the groupings, using the similarity matrix as input. We cannot use clustering algorithms because they are slow and also not reproducible.

E.g. we set a threshold for what level of similarity constitutes a group, maybe 60%. Then a group (which would become a single variable such as "height" or "anxiety" in the researcher's meta-analysis) could be either (a) a set of items from the original questionnaires where every item has similarity above 60% to all other members of the set, or (b) a set of items where each one is connected to at least one other member of the set by similarity above 60%.
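A minimal Python sketch of the two definitions, assuming the similarity matrix is a symmetric list of lists with values between 0 and 1, and that "above 60%" means strictly greater than 0.6. If you draw an edge between every pair of items above the threshold, option (a) gives the maximal cliques of that graph, and option (b), read as "linked by a chain of above-threshold pairs", gives its connected components. The function names are hypothetical. Both are deterministic: the same matrix and threshold always give the same groups.

```python
from typing import List, Set

Matrix = List[List[float]]


def threshold_graph(sim: Matrix, threshold: float) -> List[Set[int]]:
    """Adjacency sets: items i and j are linked when their similarity exceeds the threshold."""
    n = len(sim)
    return [{j for j in range(n) if j != i and sim[i][j] > threshold} for i in range(n)]


def groups_all_pairs(sim: Matrix, threshold: float) -> List[List[int]]:
    """Option (a): maximal sets where every pair is above the threshold (maximal cliques).

    Basic Bron-Kerbosch without pivoting. It is exponential in the worst case,
    so it only suits modest numbers of items. Visiting items in index order
    keeps the result deterministic.
    """
    adj = threshold_graph(sim, threshold)
    cliques: List[List[int]] = []

    def expand(r: List[int], p: List[int], x: List[int]) -> None:
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r + [v], [u for u in p if u in adj[v]], [u for u in x if u in adj[v]])
            p.remove(v)
            x.append(v)

    expand([], list(range(len(sim))), [])
    return sorted(cliques)


def groups_connected(sim: Matrix, threshold: float) -> List[List[int]]:
    """Option (b): items joined by a chain of above-threshold pairs (connected components)."""
    adj = threshold_graph(sim, threshold)
    seen: Set[int] = set()
    groups: List[List[int]] = []
    for start in range(len(sim)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups


# Hypothetical 3-item example: 0 and 1 are similar, 1 and 2 are similar, 0 and 2 are not.
sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]
print(groups_all_pairs(sim, 0.6))  # [[0, 1], [1, 2]]
print(groups_connected(sim, 0.6))  # [[0, 1, 2]]
```

In this example option (a) puts item 1 in two groups, while option (b) merges all three items into one. Option (b) never produces overlapping groups and stays cheap on large matrices, whereas (a) is stricter but can overlap and can get slow.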

Or is that logic better placed in the API? If it is simple enough, though, we can do it in the FE, which might allow faster iteration on how it's done.

To be clearer: whatever the threshold is, it would ideally be a slider. That would be a reason to do the grouping in the FE.
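Because the grouping is deterministic, a slider could simply recompute it on every move. A sketch of that for option (b), under the assumption that the item pairs are sorted by similarity once up front and each slider position then runs a union-find over the pairs above the threshold. `sorted_pairs` and `groups_at` are hypothetical names; a front end could do the same thing in TypeScript.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Pair = Tuple[float, int, int]


def sorted_pairs(sim: Matrix) -> List[Pair]:
    """All item pairs, most similar first. Computed once, reused for every slider position."""
    pairs = [(sim[i][j], i, j) for i, j in combinations(range(len(sim)), 2)]
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))


def groups_at(n_items: int, pairs: List[Pair], threshold: float) -> List[List[int]]:
    """Connected components of the pairs whose similarity exceeds the threshold (union-find)."""
    parent = list(range(n_items))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for score, i, j in pairs:
        if score <= threshold:
            break  # pairs are sorted, so no later pair can pass the threshold
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Attach to the smaller index so the result does not depend on pair order.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: Dict[int, List[int]] = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarity matrix for three items.
sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.7],
    [0.3, 0.7, 1.0],
]
pairs = sorted_pairs(sim)
for slider in (0.9, 0.75, 0.6):
    print(slider, groups_at(len(sim), pairs, slider))
# 0.9 [[0], [1], [2]]
# 0.75 [[0, 1], [2]]
# 0.6 [[0, 1, 2]]
```

Under option (b), lowering the slider can only merge groups, never split them, which should make the slider behave predictably for users.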

Well, we could certainly split up the groups of variables and ask the user to add a label if one is not obvious from the items (we could perhaps look for common words or common related topics). What do we do with items that fit into multiple groups? Is a group defined only when all items meet the threshold with all other items?
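A sketch of the "look for common words" idea for suggesting a group label, falling back to asking the user when no word is shared by more than half of the group's items. The stop-word list, the more-than-half rule and the `suggest_label` name are assumptions for illustration only.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "do", "for", "how", "in", "is",
              "of", "on", "or", "the", "to", "you", "your"}


def suggest_label(item_texts: List[str], min_share: float = 0.5) -> Optional[str]:
    """Return the word found in the most items, or None if it is in too few of them.

    A word counts once per item. It must appear in strictly more than
    min_share of the items; None means "ask the user for a label".
    Ties are broken alphabetically so the suggestion is deterministic.
    """
    doc_freq = Counter()
    for text in item_texts:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        doc_freq.update(words)
    if not doc_freq:
        return None
    word, count = min(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return word if count > min_share * len(item_texts) else None


print(suggest_label(["Child height in cm", "Height of participant (cm)", "Measured height at age 5"]))  # height
print(suggest_label(["I feel nervous", "Trouble sleeping"]))  # None
```

Simple word overlap will miss synonyms such as "tall" and "height", so a None result is expected fairly often; that is where the user-supplied label comes in.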

Fixed the delay problem, and that's all live on the main site now. J

John Rogers Delosis Ltd

harmonydata / app