Open seb-mueller opened 6 years ago
Our current favorite segmentation parameter setting run with this script: https://github.com/seb-mueller/chlamy_locus_map/blob/master/Scripts/chlamy_segmentation_gap100.R
was using:
short for multi200_gap100
.
This gives a total of 13739 loci in the 'segD' object, i.e. before filtering:
inputdata <- "segD_chlamy_segmentation_multi200_gap100.RData"
nrow(segD)
[1] 13739
Currently it is set to
FDR = 0.05
perReplicate = FALSE
which gives 8320 loci. Usually perReplicate was TRUE giving 4915 loci but also gave a warning (see code above). Any idea? What would you say looking at the diagnostic plots?
So from the SegmentSeq documentation:
perReplicate
If TRUE, selection of loci is done on a replicate by replicate basis. If FALSE, selection will be done on the likelihood that the locus represents a true locus in at least one replicate group.
I think we should definitely change it to perReplicate=TRUE, not sure why it was FALSE if I'm honest. Arabidopsis script uses TRUE, and look at the diagnostic plots for TRUE vs FALSE.
That was actually me, since the 'TRUE' setting gave warnings I switched to 'FALSE':
> loci <- selectLoci(cD = segD, FDR = 0.05, perReplicate = TRUE)
Warning messages:
1: In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) < :
no non-missing arguments to max; returning -Inf
2: In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) < :
no non-missing arguments to max; returning -Inf
> nrow(loci)
[1] 4915
> loci <- selectLoci(cD = segD, FDR = 0.1, perReplicate = TRUE)
Warning message:
In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) < :
no non-missing arguments to max; returning -Inf
> nrow(loci)
[1] 6439
But apparently it worked anyway, we could go back to TRUE and ignore the warnings..
On FDR = 0.05
we only get 4915 loci, is this about how many you got before? Should be go with this then?
Ahhhh that makes more sense. Not sure I understand what that warning is about without having a close work through the code.
I think before we used FDR 0.1 and some different inputs, so had 20000 loci I believe, but we know that had problems. I'd rather have 5000 good loci than 20000 bad loci.
I'm personally happy with TRUE setting and FDR of 0.05.
I've marked the parameters we used in arabidopsis:
The gap parameter here seems non functional! https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L80
Did we do this? https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L90
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L104
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L109
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L117-L118
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L132-L133
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L143-L144
https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/construct_annotation.R#L25