zhanghao-njmu / SCP

An end-to-end Single-Cell Pipeline designed to facilitate comprehensive analysis and exploration of single-cell data.
https://zhanghao-njmu.github.io/SCP/
GNU General Public License v3.0

Timeout was reached when accessing Ensembl archives #169

Open realzwu opened 11 months ago

realzwu commented 11 months ago

Hi Hao,

I am currently having problems running SCP on biotrainee.vip. When I ran FeatureHeatmap on the demo dataset, the following error was reported:

> ht <- FeatureHeatmap(srt = pancreas_sub, group.by = "CellType", features = DEGs$gene, 
+   feature_split = DEGs$group1, species = "Mus_musculus", db = c("GO_BP", "KEGG"), 
+   anno_terms = TRUE, feature_annotation = c("TF", "CSPA"), 
+   feature_annotation_palcolor = list(c("gold", "steelblue"), c("forestgreen")), height = 5, width = 4)

'magick' package is suggested to install to give better rasterization.

Set `ht_opt$message = FALSE` to turn off this message.
[2023-09-27 00:17:07] Start Enrichment
Workers: 8
Species: Mus_musculus
Loading cached db: GO_BP version:3.16.0 nterm:15992 created:2023-09-26 20:37:08
Loading cached db: KEGG version:Release 107.0+/09-21, Sep 23 nterm:351 created:2023-09-26 20:38:08
Convert ID types for the database: GO_BP
Connect to the Ensembl archives...
Using the 103 version of biomart...
Connecting to the biomart...
Searching the dataset mmusculus ...
Connecting to the dataset mmusculus_gene_ensembl ...
Converting the geneIDs...
22401 genes mapped with entrez_id
==============================
22401 genes mapped
6542 genes unmapped
==============================

Convert ID types for the database: KEGG
Connect to the Ensembl archives...
Using the 103 version of biomart...
Connecting to the biomart...
Searching the dataset mmusculus ...
Connecting to the dataset mmusculus_gene_ensembl ...
Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 418796 bytes received
Get errors when connecting with Dataset(mmusculus_gene_ensembl)
Retrying...
Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 274117 bytes received
Get errors when connecting with Dataset(mmusculus_gene_ensembl)
Retrying...
Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 109728 bytes received
Get errors when connecting with Dataset(mmusculus_gene_ensembl)
Retrying...
Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 221797 bytes received
Get errors when connecting with Dataset(mmusculus_gene_ensembl)
Retrying...
Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 179730 bytes received
Get errors when connecting with Dataset(mmusculus_gene_ensembl)
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [feb2021.archive.ensembl.org:443] Operation timed out after 10000 milliseconds with 179730 bytes received

The connection seems quite unstable, and I still have not managed to retrieve the mart after several attempts. Is it possible to switch the mirror (or archive) used to access the data, or is there another way to solve this problem? Thank you!

zhanghao-njmu commented 10 months ago

Certainly, you can.

Here, FeatureHeatmap runs an enrichment analysis. If no local gene annotation database is available, it downloads the database and converts gene IDs online via Ensembl.

I suggest running PrepareDB first so that the database is ready before you call FeatureHeatmap. If you still have trouble connecting to Ensembl, try changing the Ensembl_version parameter (the default is 103, but other versions can be used) and the mirror parameter (options are 'www', 'uswest', 'useast' and 'asia').
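A minimal sketch of that advice in R, assuming SCP's PrepareDB accepts species, db, Ensembl_version and mirror as named arguments (Ensembl_version and mirror are named in the reply above; species and db follow the FeatureHeatmap call in the issue). The helper prepare_db_with_fallback is hypothetical and not part of SCP: it tries each mirror in turn, catches connection errors such as the curl timeout above, and returns the name of the first mirror for which PrepareDB finished without an error. The prepare argument can be swapped out so the retry logic can be checked without a network connection.

```r
# Hypothetical helper (not part of SCP): try each Ensembl mirror in turn and
# return the first one for which PrepareDB completes without an error.
prepare_db_with_fallback <- function(species = "Mus_musculus",
                                     db = c("GO_BP", "KEGG"),
                                     Ensembl_version = 103,
                                     mirrors = c("www", "uswest", "useast", "asia"),
                                     prepare = NULL) {
  if (is.null(prepare)) {
    # Assumption: SCP::PrepareDB takes these named arguments.
    prepare <- function(mirror) {
      SCP::PrepareDB(species = species, db = db,
                     Ensembl_version = Ensembl_version, mirror = mirror)
    }
  }
  for (m in mirrors) {
    ok <- tryCatch({
      prepare(m)
      TRUE
    }, error = function(e) {
      # Report the failure (e.g. a curl timeout) and move on to the next mirror.
      message("Mirror '", m, "' failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) {
      message("Database prepared using mirror '", m, "'.")
      return(m)
    }
  }
  stop("PrepareDB failed on every mirror: ", paste(mirrors, collapse = ", "))
}
```

Usage: `mirror_used <- prepare_db_with_fallback()` and then rerun FeatureHeatmap. If every mirror fails, a different Ensembl_version can be passed in the same call.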