Delete Legacy Duplicate Transcription Records

PaulHeggarty commented 8 years ago

Many of these may be fixed by now, but we really have to finish this entirely and close the issue by completing the deletion or merger of duplicate records whose AlternativePhoneticRealisationIx values are set to 0 and 1. Another example has now arisen; it is the first one below:

[ ] http://www.soundcomparisons.com/#/de/Germanic/language/Gmc_W_Eng_Hist_OE_WSx
[ ] http://www.soundcomparisons.com/#/en/Germanic/language/Limburgish - this looks as if it may be because the Englishes Transcriptions table still includes old duplicates. All transcription records in that table whose language FullIx number does not start with 1111 (i.e. varieties not of English but of other Germanic languages) should be deleted from the Englishes Transcriptions table. This is effectively the mirror image of yesterday's deletion of old 1111 records within the Germanic folder.
[ ] http://www.soundcomparisons.com/#/en/Slavic/language/Russian%3A Penza - see for example numbers three and four. This is probably the old-data problem caused by entries with AlternativePhoneticRealisationIx = 1 that are near duplicates of correct records with AlternativePhoneticRealisationIx = 0.
[ ] http://www.soundcomparisons.com/#/en/Andean/language/Centre%3A Chimborazo - see for example number four. Again, this is probably the AlternativePhoneticRealisationIx = 1 legacy problem.
[ ] http://www.soundcomparisons.com/#/en/Romance/language/Gallo%3A Janz%C3%A9 still has some problems too, again probably because of AlternativePhoneticRealisationIx = 1.
[x] LOADS of Mapudungun languages, e.g. http://www.soundcomparisons.com/#/en/Mapudungun/language/Dollinco. These had been duplicated in the Mapudungun and Andean transcription tables, and the records were not removed from Andean when the new 'find in any' system was introduced. The duplicates were deleted from the Andean transcription table on 2018-02-01, so this problem is essentially solved.
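
For the Limburgish item, the clean-up described there (remove every record in the Englishes Transcriptions table whose language FullIx does not start with 1111) could be a single prefix-matched DELETE, ideally previewed with a COUNT first. The sketch below runs that SQL through Python's built-in sqlite3 module on a toy in-memory table. The table name `Transcriptions_Englishes`, the columns `LanguageIx`, `WordIx` and `Phonetic`, and all index values are hypothetical stand-ins, not the project's actual schema; the live database may also need adjusted syntax (e.g. for MySQL).

```python
import sqlite3

# Hypothetical table name: the real Sound Comparisons schema may differ.
TABLE = "Transcriptions_Englishes"
# Rows whose language index does not begin with the English prefix 1111.
WHERE_NOT_ENGLISH = "CAST(LanguageIx AS TEXT) NOT LIKE '1111%'"


def purge_non_english(conn: sqlite3.Connection, dry_run: bool = True) -> int:
    """Count the non-1111 rows; delete them only when dry_run is False."""
    if dry_run:
        # Preview only: how many rows would be deleted.
        return conn.execute(
            f"SELECT COUNT(*) FROM {TABLE} WHERE {WHERE_NOT_ENGLISH}"
        ).fetchone()[0]
    # "with conn" commits on success and rolls back on error.
    with conn:
        cur = conn.execute(f"DELETE FROM {TABLE} WHERE {WHERE_NOT_ENGLISH}")
    return cur.rowcount


def demo_db() -> sqlite3.Connection:
    """Toy table: one English variety and two other Germanic varieties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} (LanguageIx INTEGER, WordIx INTEGER, "
        "AlternativePhoneticRealisationIx INTEGER, Phonetic TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)",
        [
            (11111230101, 1, 0, "t1"),  # starts with 1111: kept
            (11121230101, 1, 0, "t2"),  # other Germanic: deleted
            (11131230101, 1, 0, "t3"),  # other Germanic: deleted
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = demo_db()
    print(purge_non_english(db))                 # preview count
    print(purge_non_english(db, dry_run=False))  # rows actually deleted
```

Running the dry run first gives the count to sanity-check before anything is removed.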

This is one for @Bibiko and @PaulHeggarty to fix together. Old records are generally shown in grey but still appear for some languages, particularly in the Andes and for other languages where I did early transcriptions (e.g. Penza Russian?).

Mapudungun-SQL.zip

It may simply be a matter of deleting records with an appropriate SQL query.
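
As one possible shape for that query, in the Mapudungun/Andean case the stale copies are the parent-study (Andean) records for languages that now live in the sub-study (Mapudungun) table. A minimal sketch, assuming hypothetical tables `Transcriptions_Andean` and `Transcriptions_Mapudungun` that share a `LanguageIx` column, and assuming the sub-study copy is the authoritative one:

```python
import sqlite3

# Hypothetical names for a study and one of its sub-studies.
PARENT = "Transcriptions_Andean"
CHILD = "Transcriptions_Mapudungun"
COLUMNS = "(LanguageIx INTEGER, WordIx INTEGER, Phonetic TEXT)"


def delete_parent_copies(conn: sqlite3.Connection) -> int:
    """Delete PARENT rows for every language that also has rows in CHILD."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            f"DELETE FROM {PARENT} WHERE LanguageIx IN "
            f"(SELECT DISTINCT LanguageIx FROM {CHILD})"
        )
    return cur.rowcount


def demo_db() -> sqlite3.Connection:
    """Language 1 is in both tables; language 2 is only in the parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {PARENT} {COLUMNS}")
    conn.execute(f"CREATE TABLE {CHILD} {COLUMNS}")
    conn.executemany(
        f"INSERT INTO {PARENT} VALUES (?, ?, ?)",
        [(1, 1, "old"), (1, 2, "old"), (2, 1, "keep")],
    )
    conn.executemany(
        f"INSERT INTO {CHILD} VALUES (?, ?, ?)",
        [(1, 1, "new"), (1, 2, "new")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = demo_db()
    print(delete_parent_copies(db))  # number of stale parent rows removed
```

This works at the level of whole languages. A more conservative variant would delete only parent rows that have an exact match (same language, word and alternative indices) in the sub-study table, using `EXISTS` instead of `IN`.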

PaulHeggarty commented 6 years ago

Having dug into it just now, I find it is a problem of duplication between the Mapudungun and Andean studies: the Andean study still has old transcriptions that were never deleted. We will delete them now, and that should fix the problem.

PaulHeggarty commented 6 years ago

This problem arose a month or so ago, when @Bibiko implemented the new and better system under which overlapping studies and sub-studies no longer need separate transcription records. That system was correct, a clear improvement, and worked. However, the old overlapping records (e.g. in Mapudungun and Andean, and in Englishes and Germanic) were not deleted as they should have been. We have now deleted them, so this legacy problem is basically solved. A few leftover legacy records may remain, as the unticked check boxes above indicate, and we will continue to investigate them.

PaulHeggarty commented 6 years ago

Deleting the old legacy duplicate records has, in effect, also solved #469; see that issue for the explanation.

PaulHeggarty commented 6 years ago

The main problem was duplicate records in the Transcription tables for a study and one of its sub-studies. These have mostly been fixed.

There are some outstanding cases, however, for which the cause is probably different: old transcription records with a 1 rather than a 0 in the "AlternativePhoneticRealisationIx" field. This field should always be 0 (or 2 or 3), but never 1. So first we need a count, in the Andean Transcriptions table for example, of records where AlternativePhoneticRealisationIx = 1, and of how many of those stand alongside another record that is identical except that AlternativePhoneticRealisationIx = 0 (and perhaps has a different transcription). I will then advise on what to do with them and which to delete.
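
The counts requested above could be gathered with a self-join via `EXISTS`. The sketch below reports three numbers: all records with AlternativePhoneticRealisationIx = 1, those that have a sibling with the same language and word but AlternativePhoneticRealisationIx = 0, and those whose sibling also has the same transcription. The table name `Transcriptions_Andean` and the columns `LanguageIx`, `WordIx` and `Phonetic` are hypothetical; if "identical" in the real schema covers more columns (e.g. lexeme or morphology indices), they would need adding to the sibling condition.

```python
import sqlite3

TABLE = "Transcriptions_Andean"  # hypothetical table name


def count_alt1(conn: sqlite3.Connection) -> dict:
    """Count AlternativePhoneticRealisationIx = 1 rows and their 0-siblings."""

    def scalar(sql: str) -> int:
        return conn.execute(sql).fetchone()[0]

    base = (
        f"SELECT COUNT(*) FROM {TABLE} AS a "
        "WHERE a.AlternativePhoneticRealisationIx = 1"
    )
    # Open sibling condition: same language and word, index 0.
    sibling = (
        f" AND EXISTS (SELECT 1 FROM {TABLE} AS b "
        "WHERE b.LanguageIx = a.LanguageIx AND b.WordIx = a.WordIx "
        "AND b.AlternativePhoneticRealisationIx = 0"
    )
    return {
        "alt1_total": scalar(base),
        "alt1_with_alt0_sibling": scalar(base + sibling + ")"),
        "alt1_with_identical_sibling": scalar(
            base + sibling + " AND b.Phonetic = a.Phonetic)"
        ),
    }


def demo_db() -> sqlite3.Connection:
    """Toy rows: (LanguageIx, WordIx, AlternativePhoneticRealisationIx, Phonetic)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} (LanguageIx INTEGER, WordIx INTEGER, "
        "AlternativePhoneticRealisationIx INTEGER, Phonetic TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)",
        [
            (1, 1, 0, "a"), (1, 1, 1, "a"),  # 1 with identical 0-sibling
            (1, 2, 0, "b"), (1, 2, 1, "c"),  # 1 with differing 0-sibling
            (1, 3, 1, "d"),                  # 1 with no 0-sibling
            (1, 4, 2, "e"),                  # index 2: not counted
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(count_alt1(demo_db()))
```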

Bibiko commented 6 years ago

[1] for Limburgish: no, these double transcriptions are stored within the Transcriptions_Germanic table with AlternativePhoneticRealisationIx = 2, but without an underlying sound file.

lingdb / Sound-Comparisons #309