ropensci / googleLanguageR

R client for the Google Translation API, Google Cloud Natural Language API and Google Cloud Speech API
https://code.markedmondson.me/googleLanguageR/

gl_speech output is incomplete #37

Closed kgarnick closed 6 years ago

kgarnick commented 6 years ago

Hi Mark,

Having a bit of a strange issue -- I can get gl_speech to run with the code below, but it seems to cut the transcript short:

library(googleLanguageR)
library(tuneR)

a <- readWave("OSR_us_000_0018_8k.wav", from = 0, to = 1, units = "minutes")
b <- mono(a)
writeWave(b, "OSR_us_000_0018_8k_new.wav", extensible = FALSE)
text <- gl_speech("OSR_us_000_0018_8k_new.wav", sampleRateHertz = b@samp.rate)

The resulting transcript is correct, but only represents the first ~6 seconds of the new wav file. I've listened to the new file, and it contains speech for at least 30 seconds. I can replicate this issue with a different wav file. Any insight? Thanks!
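
One way to quantify "cut short" is to compare the end time of the last transcribed word with the length of the audio. The sketch below assumes the result carries a per-word data frame with an `endTime` column of strings such as "6.400s" (as in the `$timings` and `$words` output shown later in this thread). The helper names `parse_seconds` and `transcript_coverage` are hypothetical, and the data is illustrative only.

```r
# Sketch: how much of the audio do the word timings cover?
# `timings` is assumed to be a data frame with an `endTime` column
# holding strings such as "6.400s" or "2s".

# Convert "6.400s" style strings to numeric seconds.
parse_seconds <- function(x) {
  as.numeric(sub("s$", "", x))
}

# Fraction of the audio reached by the last transcribed word.
transcript_coverage <- function(timings, audio_seconds) {
  last_word_end <- max(parse_seconds(timings$endTime))
  last_word_end / audio_seconds
}

# Illustrative data only: a transcript that stops at 6.4 s of a 60 s file.
timings <- data.frame(
  startTime = c("0.200s", "5.800s"),
  endTime   = c("0.700s", "6.400s"),
  word      = c("the", "background."),
  stringsAsFactors = FALSE
)
coverage <- transcript_coverage(timings, audio_seconds = 60)
print(round(coverage, 3))
```

For a tuneR Wave object such as `b` above, the audio length in seconds should be `length(b@left) / b@samp.rate`. A coverage far below 1 suggests the transcript stopped early.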

MarkEdmondson1234 commented 6 years ago

Could you try with the GitHub version if you haven't already? There was this earlier issue, though it may not be related: https://github.com/ropensci/googleLanguageR/issues/23

kgarnick commented 6 years ago

Yes, I'm running the version installed via devtools::install_github("MarkEdmondson1234/googleLanguageR")

MarkEdmondson1234 commented 6 years ago

OK, my second thought is that the audio preprocessing may be doing something unexpected, so you can play the file but that's not what the API is seeing. Would you have a copy of the file I could use to debug?

I think in v0.2.0 I should include some audio preprocessing functions to help with this, as it seems like a tricky thing.
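
A quick sanity check on the preprocessed file is to look at its header before sending it. The sketch below assumes a list shaped like the one tuneR::readWave(file, header = TRUE) returns, with `sample.rate`, `channels`, `bits` and `samples` elements, and assumes `samples` counts samples per channel. `check_wav_header` is a hypothetical helper, not part of either package, and the header values are made up.

```r
# Sketch: flag header properties that may not match what the API is told.
check_wav_header <- function(header, expected_rate) {
  problems <- character(0)
  if (header$channels != 1) {
    problems <- c(problems, "audio is not mono")
  }
  if (header$sample.rate != expected_rate) {
    problems <- c(problems, "sample rate differs from sampleRateHertz")
  }
  # Duration in seconds, assuming `samples` is per channel.
  duration <- header$samples / header$sample.rate
  list(problems = problems, duration_seconds = duration)
}

# Illustrative header only: a stereo, 8 kHz, 60 second file.
header <- list(sample.rate = 8000, channels = 2, bits = 16, samples = 480000)
result <- check_wav_header(header, expected_rate = 8000)
print(result)
```

An empty `problems` vector does not prove the file is fine; it only rules out the two mismatches checked here.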

kgarnick commented 6 years ago

Yes, it seems likely the preprocessing is the issue. I downloaded the wav from here. Thanks for the quick replies!

kgarnick commented 6 years ago

Hey Mark,

I tested this with a longer wav. I broke it into 30 second chunks and combined the transcript output -- it cuts every 30 second chunk short. I'll do as much as I can to help solve this, but any help is much appreciated!
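
For reference, the chunking described above could be sketched roughly as follows. `chunk_bounds` is a hypothetical helper and the chunk texts are placeholders; the actual reading and transcription calls are left out so the example runs without audio files or API access.

```r
# Sketch: compute from/to boundaries in seconds for fixed-length chunks.
chunk_bounds <- function(total_seconds, chunk_seconds = 30) {
  from <- seq(0, total_seconds, by = chunk_seconds)
  from <- from[from < total_seconds]  # drop a start that lands on the end
  to   <- pmin(from + chunk_seconds, total_seconds)
  data.frame(from = from, to = to)
}

# A 65 second file split into 30 second chunks; the last chunk is shorter.
bounds <- chunk_bounds(65, 30)
print(bounds)

# Combine per-chunk transcripts (placeholder strings) into one text.
chunk_text <- c("first chunk text ", " second chunk text")
full_text  <- paste(trimws(chunk_text), collapse = " ")
print(full_text)
```

Each row of `bounds` could then be passed to tuneR::readWave(file, from = ..., to = ..., units = "seconds"), written out with writeWave, and sent to gl_speech as in the original code.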

MarkEdmondson1234 commented 6 years ago

Hmm, can I see some code and the exact audio file you are using? This file (first on the list) transcribes OK for me.

My code:

gl_speech("OSR_us_000_0010_8k.wav", sampleRateHertz = 8000L)

Which produces:

$transcript
# A tibble: 10 x 2
   transcript                                    confidence
   <chr>                                         <chr>     
 1 the Birch canoes lid on the smooth planks     0.6725168 
 2 " glue the seat to the dark blue background." 0.70406663
 3 " It is easy to tell the death of a well."    0.74633145
 4 " These days of chicken leg as a word dish."  0.6776977 
 5 " Rice is often served in round bowls."       0.6781965 
 6 " Did use of lemon snakes find punch?"        0.7085607 
 7 " The box was down beside the park truck."    0.68749714
 8 " the Hogs of the popcorn and garbage"        0.6751104 
 9 " 4 hours of study work face to us"           0.75836164
10 " a large size in stockings is hard to sell." 0.8373701 

$timings
   startTime endTime        word
1     0.200s  0.700s         the
2     0.700s  0.900s       Birch
3     0.900s  1.500s      canoes
4     1.500s  1.900s         lid
5     1.900s      2s          on
6         2s  2.200s         the
7     2.200s  2.500s      smooth
8     2.500s      3s      planks
9     3.900s  4.500s        glue
10    4.500s  4.800s         the
11    4.800s      5s        seat
12        5s  5.200s          to
13    5.200s  5.200s         the
14    5.200s  5.600s        dark
15    5.600s  5.800s        blue
16    5.800s  6.400s background.
17    7.500s      8s          It
18        8s  8.100s          is
19    8.100s  8.400s        easy
20    8.400s  8.400s          to
21    8.400s  8.700s        tell
22    8.700s  8.800s         the
23    8.800s      9s       death
24        9s  9.100s          of
25    9.100s  9.400s           a
26    9.400s  9.600s       well.
27   10.600s 11.200s       These
28   11.200s 11.200s        days
29   11.200s 11.500s          of
30   11.500s 11.800s     chicken
31   11.800s     12s         leg
32       12s 12.100s          as
33   12.100s 12.200s           a
34   12.200s 12.400s        word
35   12.400s 12.700s       dish.
36       14s 14.400s        Rice
37   14.400s 14.700s          is
38   14.700s     15s       often
39       15s 15.100s      served
40   15.100s 15.500s          in
41   15.500s 15.800s       round
42   15.800s 16.200s      bowls.
43   17.100s 17.500s         Did
44   17.500s 17.800s         use
45   17.800s 17.900s          of
46   17.900s 18.300s       lemon
47   18.300s 18.400s      snakes
48   18.400s     19s        find
49       19s 19.300s      punch?
50   20.200s 20.600s         The
51   20.600s 20.900s         box
52   20.900s     21s         was
53       21s 21.300s        down
54   21.300s 21.700s      beside
55   21.700s 21.800s         the
56   21.800s 22.100s        park
57   22.100s 22.300s      truck.
58   23.500s 23.900s         the
59   23.900s 24.100s        Hogs
60   24.100s 24.300s          of
61   24.300s 24.500s         the
62   24.500s     25s     popcorn
63       25s 25.300s         and
64   25.300s 25.700s     garbage
65   26.900s 27.300s           4
66   27.300s 27.700s       hours
67   27.700s 27.800s          of
68   27.800s     28s       study
69       28s 28.200s        work
70   28.200s 28.500s        face
71   28.500s 28.700s          to
72   28.700s 28.800s          us
73   29.800s 30.100s           a
74   30.100s 30.400s       large
75   30.400s 30.500s        size
76   30.500s 30.900s          in
77   30.900s     31s   stockings
78       31s 31.400s          is
79   31.400s 31.600s        hard
80   31.600s 31.900s          to
81   31.900s 32.100s       sell.

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] googleLanguageR_0.1.1.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16           rstudioapi_0.7         xml2_1.2.0            
 [4] magrittr_1.5           hms_0.4.2              progress_1.1.2.9002   
 [7] R6_2.2.2               rlang_0.2.0            stringr_1.3.0         
[10] httr_1.3.1             tools_3.4.3            utf8_1.1.3            
[13] cli_1.0.0.9002         withr_2.1.1            selectr_0.3-2         
[16] googleAuthR_0.6.2      openssl_0.9.9          yaml_2.1.18           
[19] assertthat_0.2.0       digest_0.6.15          tibble_1.4.2          
[22] crayon_1.3.4           zip_1.0.0              purrr_0.2.4           
[25] base64enc_0.1-3        curl_3.1               memoise_1.1.0         
[28] glue_1.2.0             stringi_1.1.7          compiler_3.4.3        
[31] pillar_1.2.1           prettyunits_1.0.2      ansistrings_1.0.0.9000
[34] jsonlite_1.5           pkgconfig_2.0.1     

kgarnick commented 6 years ago

Sure. Seems to stop after the first line:

library(googleLanguageR)
library(tuneR)
gl_auth("speech2text-3f89d34ff4a9.json")
wav_header <- readWave("OSR_us_000_0010_8k.wav", header = TRUE)
transcript <- gl_speech("OSR_us_000_0010_8k.wav", sampleRateHertz = wav_header$sample.rate)
transcript$transcript

[1] "the Birch canoes lid on the smooth planks"

transcript$words

[[1]]
  startTime endTime   word
1    0.200s  0.700s    the
2    0.700s  0.900s  Birch
3    0.900s  1.500s canoes
4    1.500s  1.900s    lid
5    1.900s      2s     on
6        2s  2.200s    the
7    2.200s  2.500s smooth
8    2.500s      3s planks

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tuneR_1.3.2 googleLanguageR_0.0.0.9000

loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 lubridate_1.7.2 lattice_0.20-35 tidyr_0.8.0 class_7.3-14 assertthat_0.2.0
[7] digest_0.6.15 ipred_0.9-6 psych_1.7.8 foreach_1.4.4 R6_2.2.2 plyr_1.8.4
[13] signal_0.7-6 stats4_3.4.3 httr_1.3.1 ggplot2_2.2.1 pillar_1.2.0 rlang_0.2.0
[19] curl_3.1 lazyeval_0.2.1 caret_6.0-78 data.table_1.10.4-3 googleAuthR_0.6.2.9000 kernlab_0.9-25
[25] rpart_4.1-11 Matrix_1.2-12 splines_3.4.3 CVST_0.2-1 ddalpha_1.3.1.1 gower_0.1.2
[31] stringr_1.3.0 foreign_0.8-69 munsell_0.4.3 broom_0.4.3 compiler_3.4.3 pkgconfig_2.0.1
[37] base64enc_0.1-3 mnormt_1.5-5 dimRed_0.1.0 openssl_1.0 nnet_7.3-12 tidyselect_0.2.3
[43] tibble_1.4.2 prodlim_1.6.1 DRR_0.0.3 codetools_0.2-15 RcppRoll_0.2.2 dplyr_0.7.4
[49] withr_2.1.1 MASS_7.3-47 recipes_0.1.2 ModelMetrics_1.1.0 grid_3.4.3 nlme_3.1-131
[55] jsonlite_1.5 gtable_0.2.0 magrittr_1.5 scales_0.5.0 stringi_1.1.6 reshape2_1.4.3
[61] bindrcpp_0.2 timeDate_3043.102 robustbase_0.92-8 xgboost_0.6.4.1 lava_1.6 iterators_1.0.9
[67] tools_3.4.3 glue_1.2.0 DEoptimR_1.0-8 purrr_0.2.4 sfsmisc_1.1-1 parallel_3.4.3
[73] survival_2.41-3 yaml_2.1.16 colorspace_1.3-2 memoise_1.1.0 bindr_0.1

MarkEdmondson1234 commented 6 years ago

I think you are installing from the wrong GitHub repo :)

Your sessionInfo() shows:

googleLanguageR_0.0.0.9000

The current version is:

googleLanguageR_0.1.1.9000

Run remotes::install_github("ropensci/googleLanguageR"), not remotes::install_github("MarkEdmondson1234/googleLanguageR"). I'll remove that one ASAP.
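
A quick way to spot this kind of mismatch before debugging anything else is to compare version strings. This is a minimal sketch: `is_outdated` is a hypothetical helper, and the two version strings are the ones quoted in this thread.

```r
# Sketch: TRUE when `installed` is older than `minimum`.
is_outdated <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# The versions from the two sessionInfo() outputs in this thread.
outdated <- is_outdated("0.0.0.9000", "0.1.1.9000")
print(outdated)

# Show the version actually installed, if the package is present.
if (requireNamespace("googleLanguageR", quietly = TRUE)) {
  print(utils::packageVersion("googleLanguageR"))
}
```

If the check reports an outdated copy, reinstalling from the ropensci repository as described above should bring it up to date.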

kgarnick commented 6 years ago

Perfect! I can't thank you enough for your help and your work on this awesome package.