ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

issues with version 5.0.0 #291

Closed guohuansu closed 1 week ago

guohuansu commented 1 month ago

Hi, I updated to version 5.0.0 today, and now none of the functions work. When I run a function, I get the error below:

Error in open.connection(con, "rb") :
  cannot open the connection to 'https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb'
In addition: Warning message:
In open.connection(con, "rb") :
  URL 'https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb': Timeout of 60 seconds was reached

I tried to re-install the package several times and switch on and off the VPN, but nothing changed.

Can anyone help me solve this problem? Thanks in advance!

andybeet commented 1 month ago

Can you provide a reproducible example using reprex, please? Can you also provide your session info? It's hard to determine what's going on without sufficient examples. E.g.

packageVersion("rfishbase")
#> [1] '5.0.0'

Created on 2024-10-11 with reprex v2.1.0

For example, for function issue: reprex::reprex(rfishbase::common_names("Gadus morhua"))

rfishbase::common_names("Gadus morhua")
#> Joining with `by = join_by(Subfamily, GenCode, FamCode)`
#> Joining with `by = join_by(FamCode)`
#> Joining with `by = join_by(Order, Ordnum, Class, ClassNum)`
#> Joining with `by = join_by(Class, ClassNum)`
#> # A tibble: 124 × 4
#>    Species      ComName      Language SpecCode
#>    <chr>        <chr>        <chr>       <int>
#>  1 Gadus morhua Atlantic cod English        69
#>  2 Gadus morhua Bacalao      English        69
#>  3 Gadus morhua Bacaleau     English        69
#>  4 Gadus morhua Baccalao     English        69
#>  5 Gadus morhua Baccale      English        69
#>  6 Gadus morhua Baccalo      English        69
#>  7 Gadus morhua Bank cod     English        69
#>  8 Gadus morhua Bank fish    English        69
#>  9 Gadus morhua Bastard      English        69
#> 10 Gadus morhua Berry fish   English        69
#> # ℹ 114 more rows

Created on 2024-10-11 with reprex v2.1.0

for session information: reprex::reprex(sessionInfo())

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29     withr_3.0.0       R.methodsS3_1.8.2 lifecycle_1.0.4  
#>  [5] magrittr_2.0.3    reprex_2.1.0      evaluate_0.23     rlang_1.1.3      
#>  [9] cli_3.6.2         rstudioapi_0.16.0 fs_1.6.3          R.utils_2.12.3   
#> [13] R.oo_1.26.0       vctrs_0.6.5       styler_1.10.3     rmarkdown_2.27   
#> [17] tools_4.2.0       R.cache_0.16.0    glue_1.6.2        purrr_1.0.2      
#> [21] xfun_0.42         yaml_2.3.5        fastmap_1.1.1     compiler_4.2.0   
#> [25] htmltools_0.5.8.1 knitr_1.45

Created on 2024-10-11 with reprex v2.1.0

guohuansu commented 1 month ago

Hi, thank you for your reply, here are the examples: reprex::reprex(packageVersion("rfishbase"))

packageVersion("rfishbase")
#> [1] '5.0.0'

Created on 2024-10-12 with reprex v2.1.0

reprex::reprex(rfishbase::common_names("Gadus morhua"))

rfishbase::common_names("Gadus morhua")
#> Warning in open.connection(con, "rb"): URL
#> 'https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb':
#> Timeout of 60 seconds was reached
#> Error in open.connection(con, "rb"): cannot open the connection to 'https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb'

Created on 2024-10-12 with reprex v2.1.0

reprex::reprex(sessionInfo())

sessionInfo()
#> R version 4.4.0 (2024-04-24 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8 
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8   
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C                               
#> [5] LC_TIME=Chinese (Simplified)_China.utf8    
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.35     fastmap_1.2.0     xfun_0.44         glue_1.8.0       
#>  [5] knitr_1.46        htmltools_0.5.8.1 rmarkdown_2.27    lifecycle_1.0.4  
#>  [9] cli_3.6.3         reprex_2.1.0      withr_3.0.1       compiler_4.4.0   
#> [13] rstudioapi_0.16.0 tools_4.4.0       evaluate_0.23     yaml_2.3.8       
#> [17] rlang_1.1.4       fs_1.6.4

Created on 2024-10-12 with reprex v2.1.0

cboettig commented 1 month ago

@guohuansu thanks for the report. Can you see if you can open that link (https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb) (a) in your browser, and (b), from R, e.g.

httr::GET("https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb")

I'm wondering if we have a firewall issue rather than a package issue.

Also, a minor thing, but when building the reprex it's nice if you use explicit library() calls so that we see rfishbase showing up in the sessionInfo().
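The connectivity check suggested above can also be sketched in base R, without httr. This is only an illustration: `can_reach()` and its `opener` argument are hypothetical names introduced here, not part of rfishbase, and the 10-second timeout is an arbitrary choice.

```r
# Hypothetical helper (not part of rfishbase): returns TRUE if the URL can be
# opened from this R session and FALSE if the attempt fails or times out.
can_reach <- function(u, timeout = 10, opener = NULL) {
  if (is.null(opener)) {
    # Default check: open a binary connection to the URL, then close it.
    opener <- function(u) {
      con <- url(u, open = "rb")
      close(con)
    }
  }
  # Temporarily shorten R's internet timeout (the default is 60 seconds)
  # and restore the previous value when the function exits.
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  tryCatch({
    suppressWarnings(opener(u))
    TRUE
  }, error = function(e) FALSE)
}

# Usage (requires network access, so it is left commented out here):
# can_reach("https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb")
```

If this returns FALSE while the same link opens in the browser, that points to a network or firewall restriction on the R session rather than to the package.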

guohuansu commented 1 month ago

Yes, I can see 32 lines of code after opening the link via the browser. But I can't open it from R using your code, the error is shown below:

httr::GET("https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb")
#> Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [huggingface.co] Failed to connect to huggingface.co port 443 after 10001 ms: Timeout was reached

I used library(rfishbase), and it showed up in sessionInfo() as below:

other attached packages:
[1] rfishbase_5.0.0

cboettig commented 1 month ago

@guohuansu what 32 lines do you see in the browser?? You should be seeing only a small JSON blob showing the 5 releases (probably as a single line of code).

[{"type":"directory","oid":"34a1366f434dd9947de4288f208bea23b706db5f","size":0,"path":"data/fb/v19.04"},{"type":"directory","oid":"79beffc6a394f1de30e6e8172f1d7dbcb36d1fd8","size":0,"path":"data/fb/v21.06"},{"type":"directory","oid":"fd595018a45d57981999a6c4b45fdbc388f72b20","size":0,"path":"data/fb/v23.01"},{"type":"directory","oid":"05e98477e2ec0fb8aa6d90846ca9105f28430809","size":0,"path":"data/fb/v23.05"},{"type":"directory","oid":"5933af7d51bb10fed47beb735ff09ee7ba7df0e2","size":0,"path":"data/fb/v24.07"}]

If GET is failing, this is unfortunately not an issue with rfishbase but with your R installation's libcurl bindings.
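To see which libcurl build the R session itself is linked against, a minimal base-R sketch follows. It only reports version information and does not test the connection. Note that `libcurlVersion()` describes the libcurl used by base R, which may differ from the one used by the curl package behind `httr::GET`.

```r
# Report the libcurl version and TLS backend that base R was built with.
v <- libcurlVersion()
cat("libcurl version:", as.character(v), "\n")
cat("SSL/TLS backend:", attr(v, "ssl_version"), "\n")
cat("https supported:", "https" %in% attr(v, "protocols"), "\n")
```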

Let's also test outside of R in the terminal. What do you get with:

curl -L "https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb"

Should be the same json as above. If not, let's try with verbose mode and see if we can debug:

curl -vv -L "https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb"

Laura61616 commented 1 week ago

I have exactly the same issue: I can access "https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb" in my browser, but not in R or the terminal.

So I just rolled back to an older version of 'rfishbase' and that solved it. Here's some info that may help:

packageVersion("rfishbase")
#[1] ‘3.1.9’
rfishbase::common_names("Gadus morhua")
#Importing C:\Users\15850\AppData\Roaming/R/data/R/rfishbase/comnames_fb_2104.tsv.bz2 in 1000000 line chunks:
#Rows: 324211 Columns: 35                                                                                                                        
-- Column specification -------------------------------------------------------------------------------------
Delimiter: "\t"
chr (34): autoctr, ComName, Transliteration, StockCode, SpecCode, C_Code, Language, Script, UnicodeText, ...
dbl  (1): ComNamesRefNo

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
    ...Done! (in 30.69885 secs)
# A tibble: 124 x 4
   SpecCode Species      ComName      Language
   <chr>    <chr>        <chr>        <chr>   
 1 69       Gadus morhua Atlantic cod English 
 2 69       Gadus morhua Bacalao      English 
 3 69       Gadus morhua Bacaleau     English 
 4 69       Gadus morhua Baccalao     English 
 5 69       Gadus morhua Baccale      English 
 6 69       Gadus morhua Baccalo      English 
 7 69       Gadus morhua Bank cod     English 
 8 69       Gadus morhua Bank fish    English 
 9 69       Gadus morhua Bastard      English 
10 69       Gadus morhua Berry fish   English 
# i 114 more rows
# i Use `print(n = ...)` to see more rows

guohuansu commented 1 week ago

@cboettig Hi, I now know why this issue affects me and other users in China, such as @Laura61616 and my colleagues. Because of the firewall, we can't access the huggingface website directly. Although I can use a VPN to reach the website in the browser, the VPN does not apply to the R environment. I tried to set up the VPN for R separately, but failed.

Then I found a huggingface mirror site (https://github.com/padeoe/hf-mirror-site, https://hf-mirror.com/), which may solve this issue for users located in China. I downloaded the functions from the rfishbase package and changed the code hf <- "https://huggingface.co" to hf <- "https://hf-mirror.com", and then everything works well. So I wonder if you could provide an option for users to choose which link to use, or add a condition so that when the first link doesn't work, it falls back to the mirror site. Thank you so much!
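The option-plus-fallback idea requested here could look roughly like the sketch below. It is not rfishbase's actual implementation: the helper names `hf_candidates()` and `pick_hf_base()` and the environment variable `RFISHBASE_HF_BASE` are hypothetical, chosen only for illustration. The two base URLs and the API path are the ones quoted in this thread.

```r
# Hypothetical helpers (not part of rfishbase) sketching a user-selectable
# Hugging Face endpoint with an automatic fallback to the mirror.

# Candidate base URLs, in order of preference. A user-supplied value from the
# (hypothetical) RFISHBASE_HF_BASE environment variable comes first.
hf_candidates <- function() {
  user <- Sys.getenv("RFISHBASE_HF_BASE", unset = "")
  defaults <- c("https://huggingface.co", "https://hf-mirror.com")
  unique(c(if (nzchar(user)) user, defaults))
}

# Return the first base URL whose dataset listing can be opened from R.
pick_hf_base <- function(candidates = hf_candidates(), reachable = NULL) {
  if (is.null(reachable)) {
    # Default check: try to open the listing endpoint under this base URL.
    reachable <- function(base) {
      u <- paste0(base, "/api/datasets/cboettig/fishbase/tree/main/data/fb")
      tryCatch({
        con <- suppressWarnings(url(u, open = "rb"))
        close(con)
        TRUE
      }, error = function(e) FALSE)
    }
  }
  for (base in candidates) {
    if (isTRUE(reachable(base))) return(base)
  }
  stop("None of the Hugging Face endpoints could be reached: ",
       paste(candidates, collapse = ", "))
}

# Usage (requires network access, so it is left commented out here):
# hf <- pick_hf_base()
```

With the default 60-second timeout, probing an unreachable first endpoint delays the fallback noticeably, which is one reason to let users name the mirror explicitly instead of relying only on automatic detection.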

cboettig commented 1 week ago

@guohuansu Thank you very much for tracking this down! Would you be interested in sending a PR to add this option?

guohuansu commented 1 week ago

@cboettig Yes, I've opened a pull request; please check whether it works.