lingdb / Sound-Comparisons

Exploring phonetic diversity across language families —
http://www.soundcomparisons.com

Delete Legacy Duplicate Transcription Records #309

Open PaulHeggarty opened 8 years ago

PaulHeggarty commented 8 years ago

Many of these may be fixed by now, but we really have to finish this job and close the issue by completing the deletion or merger of records whose alternative phonetic transcription index is set to 0 or 1. Another example has now arisen; it is the first one below:

This is one for @Bibiko and @PaulHeggarty to fix together. The old records are generally shown in grey, but they still appear for some languages, particularly in the Andes and for other languages where I made early transcriptions (e.g. Penza Russian?).

Mapudungun-SQL.zip

It may simply be a question of deleting the records with an appropriate SQL query.

PaulHeggarty commented 6 years ago

Having dug into this just now, I find it is a problem of duplication between the Mapudungun and Andean studies: the latter still contained old transcriptions that had not been deleted. We will delete them now, and that should fix the problem.

PaulHeggarty commented 6 years ago

This problem arose a month or so ago, when @Bibiko implemented the new, better system under which overlapping studies and sub-studies no longer need separate transcription records. That change was correct, a real improvement, and it worked. However, the old overlapping records (e.g. in Mapudungun and Andean; in Englishes and Germanic) were not deleted as they should have been. We have now deleted them, so this legacy problem is essentially solved. A few leftover legacy records may remain, as the unfinished check boxes above indicate, and we will continue to investigate them.
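A minimal sketch of that clean-up, assuming (hypothetically) that each study's transcriptions live in a table named Transcriptions_<Study>, as with Transcriptions_Germanic, and that a record is identified by LanguageIx, WordIx and AlternativePhoneticRealisationIx. It counts, then deletes, the rows in the parent-study table that also exist in the sub-study table. An in-memory SQLite database stands in for the real one; the table names, the key columns other than AlternativePhoneticRealisationIx, the Phonetic column and the toy rows are all illustrative assumptions.

```python
import sqlite3

# Hypothetical key columns for one transcription record; the real
# Sound-Comparisons schema may use different or additional columns.
KEY_COLUMNS = ("LanguageIx", "WordIx", "AlternativePhoneticRealisationIx")


def delete_parent_duplicates(conn, parent_table, sub_table):
    """Delete rows of parent_table that match a row of sub_table on every
    key column, i.e. legacy copies of records now held by the sub-study.

    Returns (rows matched before deleting, rows actually deleted).
    Table names are interpolated into the SQL, so pass trusted names only.
    """
    match = " AND ".join(f"s.{c} = {parent_table}.{c}" for c in KEY_COLUMNS)
    where = f"EXISTS (SELECT 1 FROM {sub_table} AS s WHERE {match})"
    # Preview with exactly the same condition before deleting anything.
    matched = conn.execute(
        f"SELECT COUNT(*) FROM {parent_table} WHERE {where}"
    ).fetchone()[0]
    with conn:  # commits the DELETE on success, rolls back on an exception
        deleted = conn.execute(f"DELETE FROM {parent_table} WHERE {where}").rowcount
    return matched, deleted


def make_demo_db():
    """Build an in-memory database with toy rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    for table in ("Transcriptions_Andean", "Transcriptions_Mapudungun"):
        conn.execute(
            f"CREATE TABLE {table} (LanguageIx INTEGER, WordIx INTEGER, "
            "AlternativePhoneticRealisationIx INTEGER, Phonetic TEXT)"
        )
    # Language 1 appears in both studies (legacy duplicate); language 2
    # appears only in the parent study and must survive the clean-up.
    conn.executemany(
        "INSERT INTO Transcriptions_Andean VALUES (?, ?, ?, ?)",
        [(1, 10, 0, "old"), (1, 11, 0, "old"), (2, 10, 0, "keep")],
    )
    conn.executemany(
        "INSERT INTO Transcriptions_Mapudungun VALUES (?, ?, ?, ?)",
        [(1, 10, 0, "new"), (1, 11, 0, "new")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(delete_parent_duplicates(db, "Transcriptions_Andean", "Transcriptions_Mapudungun"))
    # (2, 2)
```

Comparing the preview count with the number of deleted rows is a cheap sanity check; on the real database it would be prudent to run the equivalent SELECT first, and the DELETE only against a backed-up copy.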

PaulHeggarty commented 6 years ago

Deleting the old duplicated legacy records has also, in effect, solved #469; see that issue for the explanation.

PaulHeggarty commented 6 years ago

The main problem was duplicate records in the Transcription tables for a study and one of its sub-studies. These have mostly been fixed.

There are some outstanding cases, however, that probably have a different cause: old transcription records with 1 rather than 0 in the "AlternativePhoneticRealisationIx" field. This field should always be 0 (or 2 or 3), never 1. So first we need a count, for example in the Andean Transcriptions table, of the records where AlternativePhoneticRealisationIx = 1, and of how many of those sit alongside another record that is identical except that AlternativePhoneticRealisationIx = 0 (and perhaps has a different transcription). Then I will advise on what to do with them and which ones to delete.
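The two counts requested above could be gathered roughly as follows. This is a sketch, not the project's actual query: it assumes (hypothetically) that, apart from AlternativePhoneticRealisationIx and the transcription text, a record is identified by LanguageIx and WordIx, and that the transcription is stored in a Phonetic column. A second function lists each Ix = 1 record beside its Ix = 0 twin so the pairs can be reviewed by hand before anything is deleted. SQLite and the toy rows are stand-ins for the real Andean table.

```python
import sqlite3

IX = "AlternativePhoneticRealisationIx"
# Hypothetical columns that identify a record apart from IX and the
# transcription text; adjust to the real Transcriptions_* schema.
MATCH_COLUMNS = ("LanguageIx", "WordIx")


def count_ix1(conn, table):
    """Return (records with IX = 1, how many of them have an IX = 0 twin)."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {IX} = 1").fetchone()[0]
    match = " AND ".join(f"z.{c} = o.{c}" for c in MATCH_COLUMNS)
    with_twin = conn.execute(
        f"SELECT COUNT(*) FROM {table} AS o WHERE o.{IX} = 1 AND EXISTS "
        f"(SELECT 1 FROM {table} AS z WHERE z.{IX} = 0 AND {match})"
    ).fetchone()[0]
    return total, with_twin


def list_ix1_pairs(conn, table):
    """List each IX = 1 record next to its IX = 0 twin, for manual review.
    Columns: match columns, then IX = 0 transcription, then IX = 1 one."""
    match = " AND ".join(f"z.{c} = o.{c}" for c in MATCH_COLUMNS)
    cols = ", ".join(f"o.{c}" for c in MATCH_COLUMNS)
    return conn.execute(
        f"SELECT {cols}, z.Phonetic, o.Phonetic FROM {table} AS o "
        f"JOIN {table} AS z ON z.{IX} = 0 AND {match} "
        f"WHERE o.{IX} = 1 ORDER BY {cols}"
    ).fetchall()


def make_demo_db():
    """Build an in-memory table with toy rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transcriptions_Andean (LanguageIx INTEGER, "
        f"WordIx INTEGER, {IX} INTEGER, Phonetic TEXT)"
    )
    conn.executemany(
        "INSERT INTO Transcriptions_Andean VALUES (?, ?, ?, ?)",
        [(1, 10, 0, "a"), (1, 10, 1, "b"), (1, 11, 1, "c"), (2, 10, 0, "d")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(count_ix1(db, "Transcriptions_Andean"))       # (2, 1)
    print(list_ix1_pairs(db, "Transcriptions_Andean"))  # [(1, 10, 'a', 'b')]
```

The count uses EXISTS, so a record with several Ix = 0 twins is still counted once; the review listing, by contrast, shows one line per pair.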

Bibiko commented 6 years ago

[1] For Limburgish: no, these double transcriptions are stored in the Transcriptions_Germanic table with AlternativePhoneticRealisationIx = 2, but without an underlying sound file.