nzilbb / labbcat-R

R package for accessing LaBB-CAT functionality
GNU General Public License v3.0

processWithPraat() doesn't return the same as LaBB-CAT GUI #20

Closed (djvill closed this 2 years ago)

djvill commented 2 years ago

I noticed that processWithPraat() returns different values than LaBB-CAT's GUI upload > process with praat page. I've tested it with formants and pitch. The formants output shows that processWithPraat() is picking out a different midpoint than the GUI:

labbcat.url <- "https://labbcat.canterbury.ac.nz/demo/"
results <- getMatches(labbcat.url, list(segment="I"), matches.per.transcript=1)
formants_R <- processWithPraat(labbcat.url, 
                               results$MatchId, results$Target.segment.start, results$Target.segment.end,
                               praatScriptFormants())
##Assume GUI version is downloaded as tmp.csv
formants_GUI <- read.csv("tmp.csv")

identical(nrow(formants_R), nrow(formants_GUI))
# [1] TRUE

head(formants_R[,-4])
#   time_0_5 f1_time_0_5 f2_time_0_5
# 1    3.590         421        1927
# 2    3.145         317        1877
# 3    2.795         485        1199
# 4    2.275         576        1713
# 5    3.978         430        2273
# 6    9.950         514        1566

head(formants_GUI[,c("time_0.5", "F1.time_0.5", "F2.time_0.5")])
#   time_0.5 F1.time_0.5 F2.time_0.5
# 1    3.635         363        1894
# 2    2.920         368        1024
# 3    2.795         458        1171
# 4    2.455         485        1261
# 5    3.793         379        1836
# 6   10.415         655        1909

Unfortunately, the offset isn't consistent:

summary(formants_GUI$time_0.5 - formants_R$time_0_5)
#      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
# -0.225000 -0.085000  0.000000 -0.003393  0.045000  0.465000 

robertfromont commented 2 years ago

Can you include a listing of Target.segment.start and Target.segment.end in both results and formants_GUI?

djvill commented 2 years ago

Yup! They're equal:

all.equal(results$Target.segment.start, formants_GUI$Target.segment.start)
# [1] TRUE

all.equal(results$Target.segment.end, formants_GUI$Target.segment.end)
# [1] TRUE

results$Target.segment.start
#  [1]  3.530  3.070  2.780  2.250  3.963  9.920  3.300
#  [8]  7.750  2.210  2.550  3.860  4.050  2.600 10.130
# [15]  3.110  4.540  1.190  3.300  4.290  5.620  8.620
# [22] 12.020  6.580  1.700 16.450  6.620 16.000  2.970

results$Target.segment.end
#  [1]  3.650  3.220  2.810  2.300  3.993  9.980  3.340
#  [8]  7.780  2.290  2.580  3.890  4.090  2.670 10.190
# [15]  3.140  4.630  1.280  3.340  4.380  5.750  8.670
# [22] 12.060  6.620  1.750 16.480  6.680 16.030  3.000

formants_GUI$Target.segment.start
#  [1]  3.530  3.070  2.780  2.250  3.963  9.920  3.300
#  [8]  7.750  2.210  2.550  3.860  4.050  2.600 10.130
# [15]  3.110  4.540  1.190  3.300  4.290  5.620  8.620
# [22] 12.020  6.580  1.700 16.450  6.620 16.000  2.970

formants_GUI$Target.segment.end
#  [1]  3.650  3.220  2.810  2.300  3.993  9.980  3.340
#  [8]  7.780  2.290  2.580  3.890  4.090  2.670 10.190
# [15]  3.140  4.630  1.280  3.340  4.380  5.750  8.670
# [22] 12.060  6.620  1.750 16.480  6.680 16.030  3.000

robertfromont commented 2 years ago

Thanks. The correct midpoint is the one processWithPraat() returns, e.g. the midpoint between 3.530 and 3.650 is 3.530 + ((3.650 − 3.530) ÷ 2) = 3.59, which is what your formants_R dataframe has.
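
The same arithmetic can be checked for more than one row. This is a minimal sketch, not part of the package: it hard-codes the first six segment start/end values and the first six `time_0_5` values quoted earlier in this thread, and the variable names (`seg_start`, `seg_end`, `r_time`, `seg_mid`) are made up for illustration.

```r
# First six values copied from the listings above in this thread
seg_start <- c(3.530, 3.070, 2.780, 2.250, 3.963, 9.920)  # Target.segment.start
seg_end   <- c(3.650, 3.220, 2.810, 2.300, 3.993, 9.980)  # Target.segment.end
r_time    <- c(3.590, 3.145, 2.795, 2.275, 3.978, 9.950)  # formants_R$time_0_5

# Midpoint = start + half the segment duration
seg_mid <- seg_start + (seg_end - seg_start) / 2

# all.equal() compares with a numeric tolerance, avoiding
# spurious mismatches from floating-point rounding
isTRUE(all.equal(seg_mid, r_time))
# [1] TRUE
```

With the full data frames, the equivalent comparison would presumably use `results$Target.segment.start`, `results$Target.segment.end` and `formants_R$time_0_5` directly.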

So I'll have a look at what's going on in the GUI...

robertfromont commented 2 years ago

Just trying to reproduce this, it looks like formants_GUI$time_0.5 is the midpoint of the word rather than the segment.

You would get this if you selected the word start/end columns in the GUI: [screenshot: Target word start and Target word end selected]

...instead of the segment start/end columns: [screenshot: Target segment start and Target segment end selected]
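
One way to spot this kind of mix-up from the data alone is to test whether each GUI measurement time falls inside its segment at all. The sketch below uses only the first six values quoted earlier in this thread; the variable names are illustrative. The commented-out lines at the end assume that `results` also carries `Target.word.start` and `Target.word.end` columns, which is not shown in this thread.

```r
# First six values copied from the listings above in this thread
seg_start <- c(3.530, 3.070, 2.780, 2.250, 3.963, 9.920)   # Target.segment.start
seg_end   <- c(3.650, 3.220, 2.810, 2.300, 3.993, 9.980)   # Target.segment.end
gui_time  <- c(3.635, 2.920, 2.795, 2.455, 3.793, 10.415)  # formants_GUI$time_0.5

# A true segment midpoint must lie between the segment's start and end
inside_segment <- gui_time >= seg_start & gui_time <= seg_end
inside_segment
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

# Number of GUI times that fall outside their segment
sum(!inside_segment)
# [1] 4

# Assumption: results has Target.word.start / Target.word.end columns.
# If so, the word-midpoint hypothesis could be tested directly:
# word_mid <- (results$Target.word.start + results$Target.word.end) / 2
# all.equal(word_mid, formants_GUI$time_0.5)
```

Four of the six GUI times lie outside the segment boundaries, so they cannot be segment midpoints, which is consistent with a different (longer) interval having been selected.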

djvill commented 2 years ago

🤦‍♂️