j-hagedorn / trilogy

Reference datasets for folktale motifs, tale types, and annotated texts

Add biclustering example use case #17

Closed: j-hagedorn closed this issue 8 months ago

j-hagedorn commented 3 years ago

Per request from @sdaranyi: "2-way clustering (biclustering, block clustering, co-clustering) on the AFT for first results. I know it exists in R because I used it a few years ago, but it would take ages to dig up the relevant know-how in my files right now. Its output is a heatmap, or it should be combined with one, the point being that we could immediately bring in the ATU without much ado. It would be nice to create a visual that reproduces the ATU contents, separating animal tales, magical tales, etc. into separate boxes, so that, e.g., these categories label the rows and the respective ATU numbers flagging texts label the columns (or vice versa, with more texts than categories). I'm not sure the figure will mean much to us, but it will to others who are not familiar with this technique. Plus, from here we could move on toward text analysis versus motif-string-based block clustering in pursuit of motifs, etc. So to speak, a demonstration of the possibilities for future users who are willing but have no idea what to do next."
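For readers unfamiliar with the technique, here is a minimal sketch of one common form of biclustering, Dhillon-style spectral co-clustering, written in Python with NumPy. It is not necessarily the method used in R mentioned above, and the small texts × motifs matrix is made-up toy data, not the AFT. Rows and columns are embedded in one space and clustered together; sorting both by cluster label then gathers each bicluster into a visible block, which is what the heatmap would show. The helper names `kmeans` and `spectral_cocluster` are illustrative.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centre chosen so far.
    centres = [points[0]]
    for _ in range(1, k):
        dists = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dists)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        d = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def spectral_cocluster(A, k):
    """Spectral co-clustering of a non-negative rows x columns matrix, k >= 2.

    Returns one cluster label per row and one per column; a row cluster and
    a column cluster sharing a label form one bicluster (block).
    """
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    d1 = 1.0 / np.sqrt(A.sum(axis=1))  # diagonal of D1^(-1/2); assumes no all-zero rows
    d2 = 1.0 / np.sqrt(A.sum(axis=0))  # diagonal of D2^(-1/2); assumes no all-zero columns
    An = d1[:, None] * A * d2[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    l = int(np.ceil(np.log2(k)))
    # Drop the first (trivial) singular pair and keep the next l of them.
    Z = np.vstack([d1[:, None] * U[:, 1:l + 1],
                   d2[:, None] * Vt[1:l + 1].T])
    # Rows and columns live in the same embedding and are clustered jointly.
    labels = kmeans(Z, k)
    return labels[:m], labels[m:]


# Toy texts x motifs incidence matrix (1 = the text contains the motif).
# Rows and columns are deliberately interleaved so the blocks are hidden.
texts_by_motifs = np.array([
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
])

row_labels, col_labels = spectral_cocluster(texts_by_motifs, k=2)

# Sorting rows and columns by label gathers each bicluster into a block;
# this reordered matrix is what a heatmap would display.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
reordered = texts_by_motifs[np.ix_(row_order, col_order)]
print(row_labels, col_labels)
print(reordered)
```

To draw the heatmap itself, the reordered matrix could be passed to a plotting call such as matplotlib's `imshow`. For real use, scikit-learn's `sklearn.cluster.SpectralCoclustering` provides a maintained implementation of the same algorithm.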

j-hagedorn commented 3 years ago

R package reference:

sdaranyi commented 2 years ago

Hi Josh,

Parallel to your request to Stratos to delimit the scope of the article he wants to see, could you please run the above on the AFT for me? The Fabula deadline is in a month, and the period ahead is becoming rather strenuous for me. I hope I'm not asking too much of your time.

The preliminary purpose would be subcorpus identification and visualisation. The 182 belong to the 8 major classes to different extents, something you already started to hint at with an early treemap. However, neither this distribution in its final form nor its finer microstructure is known yet, although both could give a great overview of what to expect, where to dig, and so on. Plus, I would have references for this, but I will need time to elaborate on them, hence the timing of my request.
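Building the text × class count matrix behind such a treemap (and behind a biclustering heatmap) is mostly bookkeeping. The sketch below assumes a hypothetical input of (text_id, ATU type) pairs, not the actual schema of this repository's datasets, and bins each type into the seven top-level ATU number ranges. Since the comment above speaks of 8 major classes, `BOUNDARIES` would need adjusting to whatever scheme is actually used.

```python
import bisect
import re
from collections import Counter, defaultdict

# Lower bound of each top-level ATU range with its label. These are the seven
# ATU ranges; adjust the list if your class scheme differs (e.g. 8 classes).
BOUNDARIES = [
    (1, "Animal Tales"),
    (300, "Tales of Magic"),
    (750, "Religious Tales"),
    (850, "Realistic Tales"),
    (1000, "Tales of the Stupid Ogre"),
    (1200, "Anecdotes and Jokes"),
    (2000, "Formula Tales"),
]
UPPER = 2399  # last number of the Formula Tales range
_STARTS = [lo for lo, _ in BOUNDARIES]


def atu_class(tale_type):
    """Map an ATU type string such as '510A' or '1060*' to its top-level class."""
    match = re.match(r"\s*(\d+)", tale_type)
    if match is None:
        return None  # no leading number to classify
    number = int(match.group(1))
    if number < _STARTS[0] or number > UPPER:
        return None  # outside the ranges listed above
    return BOUNDARIES[bisect.bisect_right(_STARTS, number) - 1][1]


def class_matrix(pairs):
    """Count, per text, how many of its ATU types fall into each class.

    `pairs` is an iterable of (text_id, atu_type) tuples (hypothetical layout).
    Returns (text_ids, class_labels, rows), where rows[i][j] is a count.
    """
    counts = defaultdict(Counter)
    for text_id, tale_type in pairs:
        label = atu_class(tale_type)
        if label is not None:
            counts[text_id][label] += 1
    text_ids = sorted(counts)
    class_labels = [name for _, name in BOUNDARIES]
    rows = [[counts[t][name] for name in class_labels] for t in text_ids]
    return text_ids, class_labels, rows


# Made-up example pairs, for illustration only.
pairs = [
    ("text_a", "327A"),
    ("text_a", "1"),
    ("text_b", "1060*"),
    ("text_b", "1250"),
    ("text_c", "2030"),
]
text_ids, class_labels, rows = class_matrix(pairs)
for text_id, row in zip(text_ids, rows):
    print(text_id, row)
```

The resulting rows can be fed to a treemap (summed per class) or used directly as the input matrix for co-clustering.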

Many thanks in advance!

Sándor

(Reply sent by email to the original request above, which was posted on Sun, 21 Feb 2021 at 18:04.)

j-hagedorn commented 8 months ago

@sdaranyi and @salmonix, since we have agreed that this repository will be dedicated to cleaning and publishing canonical datasets, not to analyzing them, I'm closing this issue here. I can create a new repo to support our analysis of the data, and we can include biclustering there as needed.

sdaranyi commented 8 months ago

Agreed.

(Reply sent by email to the comment above, which was posted on Mon, 8 Jan 2024 at 08:28.)