Closed semenko closed 8 years ago
This happens because the files I retrieve from GREAT do not provide the correspondence between regions and genes. To trace this information, I internally add a fourth column to the bed data frame, which is a combination of chr and positions:
chr1 100 200
chr1 100 200 chr1:100-200
and then I can retrieve this correspondence back from GREAT.
The problem arises when you provide self-defined background regions: since the background regions do not have the fourth column, GREAT gives you an error.
I will try to contact developers of GREAT to see whether it is possible to get rid of this or if they can directly add position information in the gene-region association file.
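For readers following along, the fourth-column idea can be sketched in base R as below. This is only an illustration of the labelling described above, not the package's internal code: the helper name add_region_id and the toy coordinates are assumptions made for the example.

```r
# Hypothetical helper (not part of rGREAT): append a "chr:start-end" label
# as an extra column to a BED-like data frame with chr, start, end columns.
add_region_id <- function(df) {
  # Coerce to integer so large coordinates are not printed as e.g. 1e+05.
  df$id <- paste0(df[[1]], ":", as.integer(df[[2]]), "-", as.integer(df[[3]]))
  df
}

# Toy foreground and background regions (made-up values).
bed <- data.frame(chr = c("chr1", "chr2"),
                  start = c(100, 500),
                  end = c(200, 900),
                  stringsAsFactors = FALSE)
bg <- data.frame(chr = c("chr1", "chr2", "chr3"),
                 start = c(100, 500, 1000),
                 end = c(200, 900, 2000),
                 stringsAsFactors = FALSE)

# Label both sets the same way so that both have four columns.
bed <- add_region_id(bed)
bg <- add_region_id(bg)
print(bed)
```

Labelling the background with the same helper keeps the two inputs in the same four-column shape, which is the workaround discussed in this thread.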
Ah, cool, thanks! I'll temporarily pass the chr1:N-NN bit in bed_bg as well.
Thanks for the patch! It works!
Hi, guys~ I've come across the same problem again. My background input is just a three-column data frame; let's call it "bg1". When I tried
job <-submitGreatJob(bg1[c(1,3,4,5,6),],bg=bg1,species="hg19",bgChoice="data",rule="basalPlusExt",max_tries=90,version="3.0.0")
the same error was returned. Strangely, however, when I used the bed data frame generated by
bed = circlize::generateRandomBed(nr = 1000, nc = 0)
and then ran
job <- submitGreatJob(bed[c(1,3,4,5,6),],bg=bed,species="hg19",bgChoice="data",rule="basalPlusExt",max_tries=90,version="3.0.0")
the job was submitted successfully.
May I ask whether the background data has any further requirements, beyond looking something like this?
head(bg1)
   chr  start    end
1 chr1 713200 713400
2 chr1 906400 906600
3 chr1 907000 907200
4 chr1 914400 914600
OK, I finally figured out the problem myself. It is the "overlapping regions will be merged" step that made my test file and background file differ. I do not know whether this counts as a bug in the package, but I'm still wondering why this step is required.
The input regions are merged only to reduce redundancy. To my understanding, keeping overlapping regions unmerged (e.g. 1000 copies of the same region) would bias the functional enrichment.
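As a rough illustration of what such a merging step does, here is a minimal base R sketch. It is not the package's implementation: the function name merge_regions is hypothetical, the regions are made up, and it assumes a data frame with chr, start and end columns in which regions whose start is at or before the current end are merged.

```r
# Hypothetical helper: merge overlapping regions per chromosome.
merge_regions <- function(df) {
  if (nrow(df) == 0) return(df)
  df$chr <- as.character(df$chr)
  df <- df[order(df$chr, df$start, df$end), ]
  out <- list()
  for (chrom in unique(df$chr)) {
    sub <- df[df$chr == chrom, ]
    cur_start <- sub$start[1]
    cur_end <- sub$end[1]
    # Walk the sorted regions, extending the current block while they overlap.
    for (i in seq_len(nrow(sub))[-1]) {
      if (sub$start[i] <= cur_end) {
        cur_end <- max(cur_end, sub$end[i])
      } else {
        out[[length(out) + 1]] <- data.frame(chr = chrom, start = cur_start,
                                             end = cur_end,
                                             stringsAsFactors = FALSE)
        cur_start <- sub$start[i]
        cur_end <- sub$end[i]
      }
    }
    out[[length(out) + 1]] <- data.frame(chr = chrom, start = cur_start,
                                         end = cur_end,
                                         stringsAsFactors = FALSE)
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Three copies of the same region plus one overlapping and one separate region.
regions <- data.frame(chr = "chr1",
                      start = c(100, 100, 100, 150, 400),
                      end = c(200, 200, 200, 250, 500),
                      stringsAsFactors = FALSE)
merged <- merge_regions(regions)
print(merged)
```

The five input rows collapse to two, so repeated regions no longer count multiple times. The same collapse is what changes the region coordinates and counts discussed in the rest of this thread.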
Regarding the background setting, I think the requirement is that the input regions should be a subset of the background, which means that for every region in gr there must be a region in the background which completely covers it. I think I can check this in submitGreatJob() before submitting to the GREAT website.
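A check of this kind could look roughly like the base R sketch below. It only illustrates the "completely covers" condition as stated here; the function name is_covered and the example regions are assumptions, and this is not code from submitGreatJob().

```r
# Hypothetical helper: for each foreground region, is there at least one
# background region on the same chromosome that completely covers it?
is_covered <- function(gr, bg) {
  vapply(seq_len(nrow(gr)), function(i) {
    any(bg$chr == gr$chr[i] &
        bg$start <= gr$start[i] &
        bg$end >= gr$end[i])
  }, logical(1))
}

# Made-up example: the first region is covered, the second is not.
gr <- data.frame(chr = c("chr1", "chr1"),
                 start = c(100, 250),
                 end = c(200, 400),
                 stringsAsFactors = FALSE)
bg <- data.frame(chr = c("chr1", "chr1"),
                 start = c(100, 350),
                 end = c(300, 500),
                 stringsAsFactors = FALSE)

covered <- is_covered(gr, bg)
# Report the foreground regions that no background region covers.
print(gr[!covered, ])
```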
Thank you very much for the reply! I now understand your concern about overlapping regions, but I am not sure that "for every region in gr, there must be a region in background which completely covers it" is enough for GREAT, because GREAT requires every region in gr to be found as an exact duplicate in the background. For example, I have chr1 100 200 in gr, but chr1 100 200; chr1 201 300 in the background, which becomes chr1 100 300 after merging. My chr1 100 200 then cannot be found in the background, and the error is returned.
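The example above can be reproduced with a small base R sketch. The exact-match comparison via a "chr:start-end" label is an assumption about how one might test for duplicates locally, and the merged background is written out by hand as described in the comment rather than computed.

```r
# Hypothetical helper: label each region as "chr:start-end".
region_id <- function(df) paste0(df$chr, ":", df$start, "-", df$end)

gr <- data.frame(chr = "chr1", start = 100L, end = 200L,
                 stringsAsFactors = FALSE)

# Background as supplied by the user.
bg_raw <- data.frame(chr = c("chr1", "chr1"),
                     start = c(100L, 201L),
                     end = c(200L, 300L),
                     stringsAsFactors = FALSE)

# Background after the merging step, as described in the comment above.
bg_merged <- data.frame(chr = "chr1", start = 100L, end = 300L,
                        stringsAsFactors = FALSE)

found_before <- region_id(gr) %in% region_id(bg_raw)     # TRUE
found_after  <- region_id(gr) %in% region_id(bg_merged)  # FALSE
print(c(before = found_before, after = found_after))
```

The foreground region is still covered by the merged background, but it is no longer present as an exact duplicate.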
Yes, you are right! I misunderstood this point because I never use a background in my own analyses. I now need to figure out a better way to deal with this kind of scenario. Because users' input can be of any kind (valid or invalid), I want this package to pre-process the input before submitting it to the GREAT website, to make users' lives easier.
I think the worst scenario is, let's say, chr1 200 300; chr1 250 400 as gr and chr1 100 250; chr1 300 500; chr1 400 600 as bg, which means gr is not completely covered by bg and there are overlaps inside both gr and bg. I think maybe we should convert gr and bg to:
for gr:
chr1 200 250
chr1 300 400
for bg:
chr1 100 200
chr1 200 250
chr1 300 400
chr1 400 600
What do you think?
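One way to read the proposed conversion is: merge each set, intersect gr with bg, then split bg at the boundaries of the resulting gr pieces. The base R sketch below follows that reading for a single chromosome. The function names are hypothetical, inputs are assumed non-empty, and touching intervals are treated as mergeable. It is an interpretation of the example, not an agreed design.

```r
# Merge overlapping or touching intervals (data frame with start, end).
merge_iv <- function(iv) {
  iv <- iv[order(iv$start, iv$end), ]
  starts <- numeric(0); ends <- numeric(0)
  cur_start <- iv$start[1]; cur_end <- iv$end[1]
  for (i in seq_len(nrow(iv))[-1]) {
    if (iv$start[i] <= cur_end) {
      cur_end <- max(cur_end, iv$end[i])
    } else {
      starts <- c(starts, cur_start); ends <- c(ends, cur_end)
      cur_start <- iv$start[i]; cur_end <- iv$end[i]
    }
  }
  data.frame(start = c(starts, cur_start), end = c(ends, cur_end))
}

# Keep only the parts of a that are also in b (both already merged).
intersect_iv <- function(a, b) {
  starts <- numeric(0); ends <- numeric(0)
  for (i in seq_len(nrow(a))) for (j in seq_len(nrow(b))) {
    s <- max(a$start[i], b$start[j]); e <- min(a$end[i], b$end[j])
    if (s < e) { starts <- c(starts, s); ends <- c(ends, e) }
  }
  data.frame(start = starts, end = ends)
}

# Split each background interval at the boundaries of the foreground pieces,
# so that every foreground piece appears exactly in the background.
split_iv <- function(bg, fg) {
  starts <- numeric(0); ends <- numeric(0)
  for (j in seq_len(nrow(bg))) {
    cuts <- c(fg$start, fg$end)
    cuts <- cuts[cuts > bg$start[j] & cuts < bg$end[j]]
    pts <- sort(unique(c(bg$start[j], cuts, bg$end[j])))
    starts <- c(starts, head(pts, -1)); ends <- c(ends, tail(pts, -1))
  }
  data.frame(start = starts, end = ends)
}

# The chr1 example from the comment above.
gr <- data.frame(start = c(200, 250), end = c(300, 400))
bg <- data.frame(start = c(100, 300, 400), end = c(250, 500, 600))

new_gr <- intersect_iv(merge_iv(gr), merge_iv(bg))
new_bg <- split_iv(merge_iv(bg), new_gr)
print(new_gr)
print(new_bg)
```

Note that this sketch silently drops any part of gr that lies outside bg, and it changes the number of regions in both sets.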
Great, and sorry for the late reply! The idea is good. However, the region chr1 250 300 in your example exists in the user's gr but not in bg. A warning should therefore be given to remind the user of something like "the foreground set is not a subset of the background set", whereas your conversion silently makes the "wrong" input "correct". Moreover, although I haven't tested it, I think the number of bg regions affects the hypergeometric test used in GREAT. So when you perform the merging step, you are actually reducing the number of regions the user supplied. I haven't thought in detail about how these changes would affect the final results, but in my opinion the potential impact should be made clear to users.
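To make the concern about region counts concrete, here is a small sketch using the standard hypergeometric upper-tail probability from base R (stats::phyper). It assumes the test has the usual form over regions (N background regions, K of them annotated with a term, n foreground regions, k of those annotated). All counts below are invented purely for illustration and do not come from GREAT.

```r
# Upper-tail hypergeometric p-value: P(X >= k) when drawing n regions from
# N background regions, K of which are annotated with the term.
hyper_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical counts before merging the background.
p_unmerged <- hyper_p(k = 20, n = 100, K = 500, N = 10000)

# Hypothetical counts after merging shrinks the background.
p_merged <- hyper_p(k = 20, n = 100, K = 400, N = 6000)

print(c(unmerged = p_unmerged, merged = p_merged))
```

With the foreground counts held fixed, changing only the background counts changes the p-value, which is why any pre-processing that alters the number of regions should be reported to the user.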
Good! Thanks for your comments!
Ran into an interesting issue submitting a .bed with a background region.
However, this is definitely a subset of the input set -- and submitted manually online, GREAT completes successfully.
I can reproduce this with just the test code: