kgarnick closed this issue 6 years ago.
Could you try the GitHub version if you haven't already? There was this issue, though perhaps it's unrelated: https://github.com/ropensci/googleLanguageR/issues/23
Yes, I'm running the version installed via devtools::install_github("MarkEdmondson1234/googleLanguageR")
OK, my second thought is that the audio preprocessing may be doing something unexpected, so the file plays fine for you but that's not what the API is seeing. Could you send me a copy of the file so I can debug?
I think in v0.2.0 I should include some audio preprocessing functions to help with this, as it seems to be a tricky area.
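A minimal sketch of the kind of preprocessing check that could catch this, assuming the API is sent LINEAR16 audio (16-bit samples) in mono. The helper `check_wav_header()` is hypothetical, not part of googleLanguageR; it only inspects a header list shaped like the one `tuneR::readWave(..., header = TRUE)` returns.

```r
# Hypothetical helper: flag WAV properties that may not match what the
# Speech API expects. Assumes LINEAR16 audio, i.e. 16-bit samples, mono.
check_wav_header <- function(h) {
  problems <- character(0)
  if (h$channels != 1) problems <- c(problems, "not mono")
  if (h$bits != 16) problems <- c(problems, "not 16-bit")
  problems
}

# Illustrative header list with the same fields as tuneR's header output;
# the values are placeholders, not taken from the file in this thread.
hdr <- list(sample.rate = 8000, channels = 2, bits = 16, samples = 8000 * 30)
check_wav_header(hdr)

# With a real file:
# check_wav_header(tuneR::readWave("file.wav", header = TRUE))
```

An empty result means neither check fired; it does not prove the API will transcribe the whole file.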
Yes, it seems likely the preprocessing is the issue. I downloaded the wav from here. Thanks for the quick replies!
Hey Mark,
I tested this with a longer wav. I broke it into 30-second chunks and combined the transcript output: every 30-second chunk gets cut short. I'll do as much as I can to help solve this, but any help is much appreciated!
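A minimal sketch of splitting a file into 30-second chunks, as described above. `chunk_bounds()` is a hypothetical helper that only computes sample ranges in base R; the commented `tuneR::extractWave()` call assumes that function's `from`, `to`, `xunit` and `interact` arguments.

```r
# Hypothetical helper: start/end sample indices for fixed-length chunks
chunk_bounds <- function(n_samples, sample_rate, chunk_seconds = 30) {
  size <- chunk_seconds * sample_rate
  from <- seq(1, n_samples, by = size)
  to <- pmin(from + size - 1, n_samples)  # last chunk may be shorter
  data.frame(from = from, to = to)
}

# e.g. 65 seconds of 8 kHz audio gives two full chunks and one 5-second chunk
b <- chunk_bounds(8000 * 65, 8000)
b

# Assumed usage with tuneR, one Wave object per chunk:
# w <- tuneR::readWave("file.wav")
# chunks <- lapply(seq_len(nrow(b)), function(i)
#   tuneR::extractWave(w, from = b$from[i], to = b$to[i],
#                      xunit = "samples", interact = FALSE))
```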
Hmm, can I see some code and the exact audio file you are using? This file (first on the list) transcribes OK.
My code:
gl_speech("OSR_us_000_0010_8k.wav", sampleRateHertz = 8000L)
Which produces:
$transcript
# A tibble: 10 x 2
transcript confidence
<chr> <chr>
1 the Birch canoes lid on the smooth planks 0.6725168
2 " glue the seat to the dark blue background." 0.70406663
3 " It is easy to tell the death of a well." 0.74633145
4 " These days of chicken leg as a word dish." 0.6776977
5 " Rice is often served in round bowls." 0.6781965
6 " Did use of lemon snakes find punch?" 0.7085607
7 " The box was down beside the park truck." 0.68749714
8 " the Hogs of the popcorn and garbage" 0.6751104
9 " 4 hours of study work face to us" 0.75836164
10 " a large size in stockings is hard to sell." 0.8373701
$timings
startTime endTime word
1 0.200s 0.700s the
2 0.700s 0.900s Birch
3 0.900s 1.500s canoes
4 1.500s 1.900s lid
5 1.900s 2s on
6 2s 2.200s the
7 2.200s 2.500s smooth
8 2.500s 3s planks
9 3.900s 4.500s glue
10 4.500s 4.800s the
11 4.800s 5s seat
12 5s 5.200s to
13 5.200s 5.200s the
14 5.200s 5.600s dark
15 5.600s 5.800s blue
16 5.800s 6.400s background.
17 7.500s 8s It
18 8s 8.100s is
19 8.100s 8.400s easy
20 8.400s 8.400s to
21 8.400s 8.700s tell
22 8.700s 8.800s the
23 8.800s 9s death
24 9s 9.100s of
25 9.100s 9.400s a
26 9.400s 9.600s well.
27 10.600s 11.200s These
28 11.200s 11.200s days
29 11.200s 11.500s of
30 11.500s 11.800s chicken
31 11.800s 12s leg
32 12s 12.100s as
33 12.100s 12.200s a
34 12.200s 12.400s word
35 12.400s 12.700s dish.
36 14s 14.400s Rice
37 14.400s 14.700s is
38 14.700s 15s often
39 15s 15.100s served
40 15.100s 15.500s in
41 15.500s 15.800s round
42 15.800s 16.200s bowls.
43 17.100s 17.500s Did
44 17.500s 17.800s use
45 17.800s 17.900s of
46 17.900s 18.300s lemon
47 18.300s 18.400s snakes
48 18.400s 19s find
49 19s 19.300s punch?
50 20.200s 20.600s The
51 20.600s 20.900s box
52 20.900s 21s was
53 21s 21.300s down
54 21.300s 21.700s beside
55 21.700s 21.800s the
56 21.800s 22.100s park
57 22.100s 22.300s truck.
58 23.500s 23.900s the
59 23.900s 24.100s Hogs
60 24.100s 24.300s of
61 24.300s 24.500s the
62 24.500s 25s popcorn
63 25s 25.300s and
64 25.300s 25.700s garbage
65 26.900s 27.300s 4
66 27.300s 27.700s hours
67 27.700s 27.800s of
68 27.800s 28s study
69 28s 28.200s work
70 28.200s 28.500s face
71 28.500s 28.700s to
72 28.700s 28.800s us
73 29.800s 30.100s a
74 30.100s 30.400s large
75 30.400s 30.500s size
76 30.500s 30.900s in
77 30.900s 31s stockings
78 31s 31.400s is
79 31.400s 31.600s hard
80 31.600s 31.900s to
81 31.900s 32.100s sell.
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] googleLanguageR_0.1.1.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 rstudioapi_0.7 xml2_1.2.0
[4] magrittr_1.5 hms_0.4.2 progress_1.1.2.9002
[7] R6_2.2.2 rlang_0.2.0 stringr_1.3.0
[10] httr_1.3.1 tools_3.4.3 utf8_1.1.3
[13] cli_1.0.0.9002 withr_2.1.1 selectr_0.3-2
[16] googleAuthR_0.6.2 openssl_0.9.9 yaml_2.1.18
[19] assertthat_0.2.0 digest_0.6.15 tibble_1.4.2
[22] crayon_1.3.4 zip_1.0.0 purrr_0.2.4
[25] base64enc_0.1-3 curl_3.1 memoise_1.1.0
[28] glue_1.2.0 stringi_1.1.7 compiler_3.4.3
[31] pillar_1.2.1 prettyunits_1.0.2 ansistrings_1.0.0.9000
[34] jsonlite_1.5 pkgconfig_2.0.1
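With this version, `gl_speech()` returns a list whose `$transcript` element has one row per recognised result, as printed above. A minimal sketch of joining those rows into one string; the data frame below copies the first three rows of that output so the example runs without calling the API.

```r
# Rows copied from the $transcript tibble printed above (subset for brevity)
res_transcript <- data.frame(
  transcript = c("the Birch canoes lid on the smooth planks",
                 " glue the seat to the dark blue background.",
                 " It is easy to tell the death of a well."),
  confidence = c("0.6725168", "0.70406663", "0.74633145"),
  stringsAsFactors = FALSE
)

# Each row is one result; trim the leading spaces and join them
full_text <- paste(trimws(res_transcript$transcript), collapse = " ")
full_text

# With a live call this would be:
# res <- gl_speech("OSR_us_000_0010_8k.wav", sampleRateHertz = 8000L)
# paste(trimws(res$transcript$transcript), collapse = " ")
```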
Sure. It seems to stop after the first line:
library(googleLanguageR)
library(tuneR)
gl_auth("speech2text-3f89d34ff4a9.json")
wav_header <- readWave("OSR_us_000_0010_8k.wav", header = TRUE)
transcript <- gl_speech("OSR_us_000_0010_8k.wav", sampleRateHertz = wav_header$sample.rate)
transcript$transcript
[1] "the Birch canoes lid on the smooth planks"
transcript$words
[[1]]
  startTime endTime   word
1    0.200s  0.700s    the
2    0.700s  0.900s  Birch
3    0.900s  1.500s canoes
4    1.500s  1.900s    lid
5    1.900s      2s     on
6        2s  2.200s    the
7    2.200s  2.500s smooth
8    2.500s      3s planks
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tuneR_1.3.2 googleLanguageR_0.0.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 lubridate_1.7.2 lattice_0.20-35 tidyr_0.8.0 class_7.3-14 assertthat_0.2.0
[7] digest_0.6.15 ipred_0.9-6 psych_1.7.8 foreach_1.4.4 R6_2.2.2 plyr_1.8.4
[13] signal_0.7-6 stats4_3.4.3 httr_1.3.1 ggplot2_2.2.1 pillar_1.2.0 rlang_0.2.0
[19] curl_3.1 lazyeval_0.2.1 caret_6.0-78 data.table_1.10.4-3 googleAuthR_0.6.2.9000 kernlab_0.9-25
[25] rpart_4.1-11 Matrix_1.2-12 splines_3.4.3 CVST_0.2-1 ddalpha_1.3.1.1 gower_0.1.2
[31] stringr_1.3.0 foreign_0.8-69 munsell_0.4.3 broom_0.4.3 compiler_3.4.3 pkgconfig_2.0.1
[37] base64enc_0.1-3 mnormt_1.5-5 dimRed_0.1.0 openssl_1.0 nnet_7.3-12 tidyselect_0.2.3
[43] tibble_1.4.2 prodlim_1.6.1 DRR_0.0.3 codetools_0.2-15 RcppRoll_0.2.2 dplyr_0.7.4
[49] withr_2.1.1 MASS_7.3-47 recipes_0.1.2 ModelMetrics_1.1.0 grid_3.4.3 nlme_3.1-131
[55] jsonlite_1.5 gtable_0.2.0 magrittr_1.5 scales_0.5.0 stringi_1.1.6 reshape2_1.4.3
[61] bindrcpp_0.2 timeDate_3043.102 robustbase_0.92-8 xgboost_0.6.4.1 lava_1.6 iterators_1.0.9
[67] tools_3.4.3 glue_1.2.0 DEoptimR_1.0-8 purrr_0.2.4 sfsmisc_1.1-1 parallel_3.4.3
[73] survival_2.41-3 yaml_2.1.16 colorspace_1.3-2 memoise_1.1.0 bindr_0.1
I think you are installing from the wrong GitHub repo :)
You have googleLanguageR_0.0.0.9000 installed, but the current version is googleLanguageR_0.1.1.9000.
Run remotes::install_github("ropensci/googleLanguageR") and not remotes::install_github("MarkEdmondson1234/googleLanguageR"). I'll remove that old repo ASAP.
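A minimal sketch for checking which build is installed, using base R's `package_version()` comparison. `is_current()` is a hypothetical helper; the minimum version is simply the one seen working in this thread (0.1.1.9000), not an official requirement.

```r
# Reinstall from the rOpenSci repository (requires the remotes package):
# remotes::install_github("ropensci/googleLanguageR")

# Hypothetical helper: compare a version string with the build that worked here
is_current <- function(v, minimum = "0.1.1.9000") {
  package_version(v) >= package_version(minimum)
}

is_current("0.0.0.9000")  # the old MarkEdmondson1234 build
is_current("0.1.1.9000")  # the rOpenSci build

# With the package installed:
# is_current(as.character(packageVersion("googleLanguageR")))
```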
Perfect! I can't thank you enough for your help and your work on this awesome package.
Hi Mark,
I'm having a bit of a strange issue: I can get gl_speech to run with the code below, but it seems to cut the transcript short:
library(googleLanguageR)
library(tuneR)
The resulting transcript is correct, but it only covers the first ~6 seconds of the new wav file. I've listened to the new file, and it contains speech for at least 30 seconds. I can reproduce this issue with a different wav file. Any insight? Thanks!
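A minimal sketch for quantifying how much of the audio a transcript covers: compare the last word's `endTime` with the file length (`samples / sample.rate`). `to_seconds()` and `transcript_coverage()` are hypothetical helpers; the header list stands in for `tuneR::readWave(..., header = TRUE)` output and its 30-second length is a placeholder.

```r
# Convert API time strings such as "0.700s" or "2s" to numeric seconds
to_seconds <- function(x) as.numeric(sub("s$", "", x))

# Hypothetical helper: fraction of the audio covered by the last transcribed word
transcript_coverage <- function(end_times, h) {
  audio_seconds <- h$samples / h$sample.rate
  max(to_seconds(end_times)) / audio_seconds
}

# endTime values from the truncated transcript$words output earlier in the thread
end_times <- c("0.700s", "0.900s", "1.500s", "1.900s", "2s", "2.200s", "2.500s", "3s")
# Placeholder header: 8 kHz audio, 30 seconds long
hdr <- list(sample.rate = 8000, samples = 8000 * 30)
transcript_coverage(end_times, hdr)
```

A value far below 1 on a file with continuous speech points to truncation rather than silence at the end.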