massivetexts / compare-tools


Self-similarity matrix for translations #9

Open organisciak opened 4 years ago

organisciak commented 4 years ago

Idea from @bmschmidt to explore: can the self-similarity matrix for page-to-page or chunk-to-chunk comparisons serve as a comparable artifact for translations of books?
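For anyone picking this up, here is a minimal sketch of what a chunk-to-chunk self-similarity matrix could look like. It assumes each chunk is already a numeric feature vector and uses cosine similarity. The function name `self_similarity` and the toy data are made up for illustration and are not part of compare-tools.

```python
import numpy as np

def self_similarity(chunks):
    """Cosine self-similarity of an (n_chunks, n_features) array (hypothetical helper)."""
    X = np.asarray(chunks, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero chunks as zero vectors instead of dividing by 0
    U = X / norms            # unit-length rows
    return U @ U.T           # entry (i, j) is the cosine similarity of chunks i and j

# Toy "book" with three chunks and two features (made-up values)
book = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
ssm = self_similarity(book)
```

The result is square and symmetric, with one row and column per chunk, so a book and its translation would generally produce matrices of different sizes unless they are chunked to the same count.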

bmschmidt commented 4 years ago

I have no answer. But I keep mulling this over, because it's actually related to a deeper problem it would be nice to solve. Which is:

Each comparison matrix (chunk-to-chunk, page-to-page) is m-by-n, and that shape varies enormously from one pair of books to the next. Is there some useful way to embed those many different types of relationships into a fixed-length vector space, so that we can say "These two books have the same relation"? We're currently doing a whole bunch of selective feature picking, but it might be possible to use a representation learning approach instead.
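To make the shape problem concrete, here is a rough baseline sketch: average-pool an m-by-n similarity matrix onto a fixed k-by-k grid and flatten it. This is a hand-built reduction, not the representation learning approach floated above, and it throws away fine detail. The name `pool_to_fixed` and the default `k=4` are assumptions for illustration only.

```python
import numpy as np

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def pool_to_fixed(sim, k=4):
    """Average-pool an (m, n) matrix onto a k-by-k grid; return a length k*k vector."""
    sim = np.asarray(sim, dtype=float)
    m, n = sim.shape
    out = np.empty((k, k))
    for i in range(k):
        # Row window for output row i; never empty, even when m < k
        r0, r1 = (i * m) // k, ceil_div((i + 1) * m, k)
        for j in range(k):
            c0, c1 = (j * n) // k, ceil_div((j + 1) * n, k)
            out[i, j] = sim[r0:r1, c0:c1].mean()
    return out.ravel()

# Two comparisons with very different shapes end up as same-length vectors
vec_a = pool_to_fixed(np.ones((7, 13)))
vec_b = pool_to_fixed(np.ones((40, 5)))
vec_eye = pool_to_fixed(np.eye(4), k=2)
```

Once every comparison is a vector of the same length, two book pairs can be compared directly (for example by Euclidean or cosine distance), which is the property a learned embedding would also need to provide.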