wzthu / enrichTF

Bioconductor package enrichTF: Transcription Factors Enrichment Analysis
https://www.bioconductor.org/packages/release/bioc/html/enrichTF.html

How to give custom set of background region #13

Open Rohit-Satyam opened 3 years ago

Rohit-Satyam commented 3 years ago

Hi Authors!!

I am looking forward to using the enrichTF package for putative TF enrichment analysis on a set of ~1000 enhancer regions.

I have a custom set of background regions (for each enhancer, I have 4 randomly sampled regions with similar GC content and the same length, i.e. 1000 bp). How can I use this custom set as the background? Also, must the number of regions in the positive set equal the number of regions in the negative set?

EDIT 1

Also, the regions in the test dataset (first listing below) have start coordinates 2 bp greater than the corresponding regions in testregion.foreground.bed (second listing below):

  [1]     chr1   1690154-1691153      *
  [2]     chr1   1784734-1785733      *
  [3]     chr1   9797247-9798246      *
  [4]     chr1   9828095-9829094      *
  [5]     chr1 12203230-12204229      *
  [6]     chr1 15063575-15064574      *
  [7]     chr1 15266160-15267159      *
  [8]     chr1 15294621-15295620      *
  [9]     chr1 15406202-15407201      *

chr1    1690152 1691153 1   0   .
chr1    1784732 1785733 2   0   .
chr1    9797245 9798246 3   0   .
chr1    9828093 9829094 4   0   .
chr1    12203228    12204229    5   0   .
chr1    15063573    15064574    6   0   .
chr1    15266158    15267159    7   0   .
chr1    15294619    15295620    8   0   .
chr1    15406200    15407201    9   0   .

What is the reason for this, and how can it be prevented?
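
Part of the offset may come from coordinate conventions: BED files use 0-based, half-open intervals, whereas range objects printed in R (such as GRanges) use 1-based, inclusive coordinates. The sketch below is plain Python and independent of enrichTF; the helper name `bed_to_one_based` is made up for illustration. It converts the first BED row shown above and compares it with the first printed region. Under that assumption the convention accounts for only 1 bp of the shift, so the second base pair would still need explaining.

```python
def bed_to_one_based(chrom, start, end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive coordinates."""
    return chrom, start + 1, end


# First row of testregion.foreground.bed and first printed region from this post.
bed_row = ("chr1", 1690152, 1691153)
printed_region = ("chr1", 1690154, 1691153)

converted = bed_to_one_based(*bed_row)

# Width of a half-open interval is end - start; of an inclusive one, end - start + 1.
bed_width = bed_row[2] - bed_row[1]
printed_width = printed_region[2] - printed_region[1] + 1

# Shift in the start coordinate that the BED convention alone does not explain.
remaining_shift = printed_region[1] - converted[1]

print(converted, bed_width, printed_width, remaining_shift)
```

By this arithmetic the BED rows are 1001 bp wide while the printed regions are 1000 bp wide, which is consistent with the start having been moved rather than the whole region.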

EDIT 2

I also observed that the generated background set usually comes from a single chromosome, which is not what we want for a background. I therefore edited the background.bed and all.bed files by randomly sampling 1000 regions from the 4000-region background set that we produced. However, when I run the enrichFindMotifsInRegions function, I get the following error:

>>>>>>==========================================================
Step Name:pipe_FindMotifsInRegions
All Parameters for This Step:
|Input:
|    inputRegionBed:
|        "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_00_pipe_GenBackground/combined_end_newcoor.txt.allregion.bed"
|Output:
|    outputRegionMotifBed:
|        "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_02_pipe_FindMotifsInRegions/combined_end_newcoor.txt.region.motif.bed"
|    outputMotifBed:
|        "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_02_pipe_FindMotifsInRegions/combined_end_newcoor.txt.motif.bed"
|Other Parameters:
|    motifRc:
|        "integrate"
|    pwmObj:
|        An object of PWMatrixList
|    genome:
|        "hg19"
|    threads:
|        2
__________________________________________
Begin to check if it is finished...
2021-01-05 22:07:26
New step. Start processing data: 
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: trying to load regions beyond the boundaries of non-circular sequence "chr17"
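
The message suggests that at least one region extends past the end of chr17 in the genome being used. A minimal sketch of the check that could be run on the background set before the motif step is given below. It is plain Python and independent of enrichTF; the function name and the example regions are hypothetical, and the chr17 length (81,195,210 bp, which I believe is the full hg19 value) is an assumption that should be replaced with the sequence lengths of whichever genome is actually loaded.

```python
def split_by_bounds(regions, chrom_sizes):
    """Split 1-based, inclusive regions into those inside and outside known chromosome bounds."""
    inside, outside = [], []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)
        # A region is valid only if its chromosome is known and it lies within [1, size].
        ok = size is not None and 1 <= start <= end <= size
        (inside if ok else outside).append((chrom, start, end))
    return inside, outside


# Assumed chromosome length; verify against the genome in use.
chrom_sizes = {"chr17": 81195210}

# Hypothetical regions: one valid, one running past the chromosome end,
# one on a chromosome missing from the size table.
regions = [
    ("chr17", 1000, 1999),
    ("chr17", 81195000, 81195999),
    ("chrX", 1, 1000),
]

inside, outside = split_by_bounds(regions, chrom_sizes)
print("kept:", inside)
print("dropped:", outside)
```

Regions reported as dropped are the ones that would be expected to trigger a "beyond the boundaries" error, for example coordinates from one assembly being looked up in another.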

I realised that the test genome is not a complete genome (and it is hg19, not hg38), so I tried using hg38 instead. However, I then get the following error:

Configure motifpwm ...
trying URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz'
Error in download.file(url = sprintf(urlplaceholder, genome), destfile = paste0(refFilePath,  : 
  cannot open URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz'
In addition: Warning message:
In download.file(url = sprintf(urlplaceholder, genome), destfile = paste0(refFilePath,  :
  cannot open URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz': HTTP status was '404 Not Found'

Kindly help.