lingdb / Sound-Comparisons

Exploring phonetic diversity across language families —
http://www.soundcomparisons.com

Showing (and colouring by) Cognacy #467

Open PaulHeggarty opened 6 years ago

PaulHeggarty commented 6 years ago

I'm opening this issue to get us started, with the aim of having the colouring working by 23rd January at the latest, in time for the filming.

The main obstacle to overcome is that Sound Comparisons was originally devised to use only known cognates, so any missing cognates or alternative lexemes were seen more as unfortunate gaps in the data, and were not dealt with in much detail.

For new studies like Brazil, or Indo-European in general, and for integrating with CoBL, Edictor and other CLLD apps that deal with cognacy, we should move to a system as straightforwardly linked with those as possible. This may mean correspondence tables, rather than a simple 'cognacy number' field for each lexeme record.
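To make the difference concrete, here is a minimal sketch of a correspondence table in Python, as opposed to a single 'cognacy number' stored on each lexeme record. The class name `CognacyTable`, its methods and the integer IDs are hypothetical and purely illustrative; they are not taken from the Sound Comparisons schema or from CoBL, Edictor or CLLD.

```python
from dataclasses import dataclass, field


@dataclass
class CognacyTable:
    """Hypothetical link table: one row per (lexeme, cognate set) pair."""

    rows: set = field(default_factory=set)

    def link(self, lexeme_id, cognate_set_id):
        # A lexeme may be linked to several cognate sets, which a single
        # 'cognacy number' field on the lexeme record cannot express.
        self.rows.add((lexeme_id, cognate_set_id))

    def sets_for(self, lexeme_id):
        # All cognate sets that a given lexeme belongs to.
        return sorted(cs for lid, cs in self.rows if lid == lexeme_id)

    def members_of(self, cognate_set_id):
        # All lexemes that belong to a given cognate set.
        return sorted(lid for lid, cs in self.rows if cs == cognate_set_id)


table = CognacyTable()
table.link(1, 10)
table.link(2, 10)
table.link(2, 11)  # lexeme 2 is judged to belong to two sets
```

A lexeme with no rows in the table simply has no cognacy judgement yet, which keeps "unknown" distinct from any particular set number.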

The current values we have that are related to cognacy status are:

Neither of these is a true cognate set index, so we should presumably create a new field for that, and use the existing fields (rarely filled with data anyway) to work automatically on all cases that are already in the database.

One thing to maybe get started on is a toggle switch to colour the speech bubbles either by language group (as now) or by cognacy (as desired in some cases).
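As a rough sketch of what such a toggle could compute, assuming each bubble has a language group and an optional cognate set index (the palettes, the field names `group` and `cognate_set`, and the function `bubble_colour` are all made up for illustration and do not reflect the existing codebase):

```python
# Hypothetical palettes; the real application has its own group colours.
GROUP_COLOURS = {"GroupA": "#1f77b4", "GroupB": "#ff7f0e"}
COGNATE_COLOURS = ["#2ca02c", "#d62728", "#9467bd"]
NEUTRAL = "#cccccc"


def bubble_colour(entry, mode="group"):
    """Return the colour for one speech bubble under the chosen mode."""
    if mode == "group":
        return GROUP_COLOURS.get(entry["group"], NEUTRAL)
    if mode == "cognacy":
        cognate_set = entry.get("cognate_set")
        if cognate_set is None:
            # No cognacy judgement recorded: fall back to a neutral colour.
            return NEUTRAL
        # Cycle through a small palette; distinct sets can collide
        # once there are more sets than colours.
        return COGNATE_COLOURS[cognate_set % len(COGNATE_COLOURS)]
    raise ValueError(f"unknown colouring mode: {mode}")
```

The toggle switch would then only need to flip `mode` and redraw the bubbles.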

LinguList commented 6 years ago

Coloring is NOT trivial when many languages are involved. On a geographic map, you would need to apply a graph coloring algorithm or a similar approach (as you don't want to bombard people with 20 or more colors), and those problems are known to be "hard".
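For illustration only, here is a small greedy heuristic of the kind one might try, assuming we already have an adjacency mapping that says which items (for example cognate sets whose bubbles sit next to each other on the map) must not share a color. The function name and the input format are assumptions. Greedy coloring does not guarantee the minimum number of colors; finding that minimum is the part that is hard in general.

```python
def greedy_colouring(adjacency):
    """Assign each node the smallest colour index unused by its neighbours.

    `adjacency` maps each node to the set of nodes it must differ from.
    This is a heuristic: it gives a valid colouring, not a minimal one.
    """
    colours = {}
    # Visit nodes with the most neighbours first (ties broken by name),
    # which tends to keep the number of colours low.
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    for node in order:
        used = {colours[nb] for nb in adjacency[node] if nb in colours}
        colour = 0
        while colour in used:
            colour += 1
        colours[node] = colour
    return colours


# Toy example: a, b, c are mutually adjacent; d only touches c.
neighbours = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
result = greedy_colouring(neighbours)
```

In the toy example, four items end up sharing three colors, because `d` can reuse a color already given to a non-neighbour.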

What I usually do is: alternate coloring in tables (if the cognate ID changes), and on a map, you could think about something like this:

What you see there is a simple mechanism by which clicking on a node displays all its related nodes. This is doable in plain JavaScript (as is the whole application you see there). So consider giving up the idea of coloring on a map, and instead think of simpler interactive ways to do it.
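The logic behind both of these simpler alternatives is small. The sketch below is in Python rather than JavaScript and only shows the computation, not the display; the function names and the `{language: cognate_id}` input format are assumptions for illustration.

```python
def alternate_bands(cognate_ids):
    """Return a 0/1 band per table row, flipping when the cognate ID changes."""
    bands = []
    band = 0
    previous = None
    for index, cognate_id in enumerate(cognate_ids):
        if index > 0 and cognate_id != previous:
            band = 1 - band
        bands.append(band)
        previous = cognate_id
    return bands


def related_nodes(entries, clicked):
    """Return the other nodes that share the clicked node's cognate ID.

    `entries` maps a node name (e.g. a language) to its cognate ID.
    """
    target = entries[clicked]
    return sorted(
        name for name, cognate_id in entries.items()
        if cognate_id == target and name != clicked
    )
```

With rows sorted by cognate ID, two alternating shades are enough for a table, and on a map the click handler only has to highlight whatever `related_nodes` returns.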

xrotwang commented 6 years ago

I guess I'm a nuisance, but anyway, I'll repeat myself: Rather than spending effort on the current PHP codebase, I'd like us to work on implementing soundcomparisons-like functionality as a clld plugin, preferably working on top of many CLDF datasets. Adding soundcomparisons functionality (or data) to django-cobl and the other way round seems wasteful to me.

xrotwang commented 6 years ago

@Bibiko what do you think about soundcomparisons on top of clld? Interested in working on it?

LinguList commented 6 years ago

Having recently finally managed to work a bit with CLLD on my own, I can only support this. In fact, if you have things annotated for cognacy, for example by using EDICTOR, cases like the map example I showed are ultimately trivial to implement in CLLD, and you could actually have the alignments on top (which would be beautiful, I guess).

PaulHeggarty commented 6 years ago

This is an immediate priority for filming a clip for a television documentary on Arte, who will be at the institute to film on Tuesday 23rd January. Yes, @xrotwang, you have mentioned this before, but:
(1) What you mention is a long-term aspiration shared by everyone, not just you, so we do know this and don’t need to be reminded. And there are different opinions between us on how best to get there.
(2) It has to be balanced against immediate, important goals (specifically requested by Russell in this case), which can be achieved with relatively little investment of programming time, and which will in any case be valuable in speeding up (1).

Within the work pipeline and goals timetable for the whole Sound Comparisons project (and the project manager is in the best position to judge that), there is a continuing, very strong case for keeping on investing programming time in certain targeted ways -- albeit of course always with a view to progressively moving towards the long-term CLLD target.