I am looking forward to using the enrichTF package for putative TF enrichment analysis on a set of ~1000 enhancer regions.
I have a custom set of background regions (for each enhancer, 4 randomly sampled regions with similar GC content and the same length, i.e. 1000 bp). How can I use this custom set as the background? Also, must the number of regions in the positive set equal the number of regions in the negative set?
EDIT 1
Also, the regions in the test dataset have start coordinates 2 bp greater than the corresponding regions in testregion.foreground.bed.
What is the reason for this, and how can it be prevented?
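To pin down the offset, a comparison along these lines could be used. This is only a rough Python sketch: `read_bed` and `start_offsets` are hypothetical helpers (not part of enrichTF), the coordinates are made-up toy values, and it assumes both files list the same regions in the same order.

```python
from collections import Counter

def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping blank and header lines."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions

def start_offsets(regions_a, regions_b):
    """Count (start in A - start in B) over regions paired by position in the file."""
    return Counter(a[1] - b[1]
                   for a, b in zip(regions_a, regions_b)
                   if a[0] == b[0])  # only compare pairs on the same chromosome

# Toy stand-ins for the two files being compared (values are invented).
set_a = read_bed(["chr1\t1002\t2002", "chr2\t502\t1502"])
set_b = read_bed(["chr1\t1000\t2000", "chr2\t500\t1500"])

offsets = start_offsets(set_a, set_b)
print(offsets)
```

If every pair shows the same difference, the shift is systematic rather than specific to a few regions.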
EDIT 2
I also observed that the generated background set usually comes from a single chromosome, which is not what we want for our background. I therefore edited the background.bed and all.bed files by randomly sampling 1000 regions from the 4000 background regions that we produced. However, when I run the enrichFindMotifsInRegions function, I get the following error:
>>>>>>==========================================================
Step Name:pipe_FindMotifsInRegions
All Parameters for This Step:
|Input:
| inputRegionBed:
| "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_00_pipe_GenBackground/combined_end_newcoor.txt.allregion.bed"
|Output:
| outputRegionMotifBed:
| "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_02_pipe_FindMotifsInRegions/combined_end_newcoor.txt.region.motif.bed"
| outputMotifBed:
| "F:\processed\final_data\ccRes\hsap_hg38_loc_nfr\enrichTF-pipeline/Step_02_pipe_FindMotifsInRegions/combined_end_newcoor.txt.motif.bed"
|Other Parameters:
| motifRc:
| "integrate"
| pwmObj:
| An object of PWMatrixList
| genome:
| "hg19"
| threads:
| 2
__________________________________________
Begin to check if it is finished...
2021-01-05 22:07:26
New step. Start processing data:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: trying to load regions beyond the boundaries of non-circular sequence "chr17"
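As a sanity check for this kind of error, something like the following Python sketch could list regions that fall outside the chromosome lengths of the genome in use. It is only an illustration: `out_of_bounds` is a hypothetical helper (not part of enrichTF), and the chromosome sizes and regions are made-up toy values, not real hg19 or hg38 lengths.

```python
def out_of_bounds(regions, chrom_sizes):
    """Return regions on an unknown chromosome or extending past the chromosome ends."""
    flagged = []
    for chrom, start, end in regions:
        size = chrom_sizes.get(chrom)
        if size is None or start < 0 or end > size:
            flagged.append((chrom, start, end))
    return flagged

# Toy chromosome lengths (assumption: real values would come from the genome actually loaded).
chrom_sizes = {"chr17": 5000}

# Toy regions: one valid, one past the end of chr17, one on a chromosome absent from the genome.
regions = [
    ("chr17", 1000, 2000),
    ("chr17", 4500, 5500),
    ("chr1", 0, 1000),
]

flagged = out_of_bounds(regions, chrom_sizes)
for region in flagged:
    print(region)
```

Any region reported here would need to be dropped or trimmed before sequences are extracted from that genome.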
I realised that the test genome is not a complete genome (and it is hg19, not hg38), so I tried using hg38. However, I then get the following error:
Configure motifpwm ...
trying URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz'
Error in download.file(url = sprintf(urlplaceholder, genome), destfile = paste0(refFilePath, :
cannot open URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz'
In addition: Warning message:
In download.file(url = sprintf(urlplaceholder, genome), destfile = paste0(refFilePath, :
cannot open URL 'https://wzthu.github.io/enrich/refdata/hg38/all_motif_rmdup.gz': HTTP status was '404 Not Found'
Kindly help