seb-mueller / chlamy_locus_map

Small RNA Locus Map for Chlamydomonas reinhardtii
GNU General Public License v3.0
1 stars 0 forks source link

parameters #9

Open seb-mueller opened 6 years ago

seb-mueller commented 6 years ago

I've marked the parameters we used in arabidopsis:

The gap parameter here seems non functional! https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L80

Did we do this? https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L90

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L104

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L109

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L117-L118

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L132-L133

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/allInOne.R#L143-L144

https://github.com/seb-mueller/chlamy_locus_map/blob/facef6baf54736c115eaf3a3f869d29f9a0c7eda/Archive/arabidopsis/construct_annotation.R#L25

seb-mueller commented 5 years ago

Our current favorite segmentation parameter setting run with this script: https://github.com/seb-mueller/chlamy_locus_map/blob/master/Scripts/chlamy_segmentation_gap100.R

was using:

short for multi200_gap100. This gives a total of 13739 loci in the 'segD' object, i.e. before filtering:

inputdata <- "segD_chlamy_segmentation_multi200_gap100.RData"
nrow(segD)                                                                                        
[1] 13739

Next question: How do we set the parameter for filtering?:

https://github.com/seb-mueller/chlamy_locus_map/blob/893a68064902ba09850fef0324ee1322127908bb/Scripts/chlamy_annotation_pipeline.R#L53

Currently it is set to

FDR = 0.05
perReplicate = FALSE

which gives 8320 loci. Usually perReplicate was TRUE giving 4915 loci but also gave a warning (see code above). Any idea? What would you say looking at the diagnostic plots?

nmatthews323 commented 5 years ago

So from the SegmentSeq documentation:

perReplicate
If TRUE, selection of loci is done on a replicate by replicate basis. If FALSE, selection will be done on the likelihood that the locus represents a true locus in at least one replicate group.

I think we should definitely change it to perReplicate=TRUE, not sure why it was FALSE if I'm honest. Arabidopsis script uses TRUE, and look at the diagnostic plots for TRUE vs FALSE.

seb-mueller commented 5 years ago

That was actually me, since the 'TRUE' setting gave warnings I switched to 'FALSE':

> loci <- selectLoci(cD = segD, FDR = 0.05, perReplicate = TRUE)
Warning messages:
1: In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) <  :
  no non-missing arguments to max; returning -Inf
2: In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) <  :
  no non-missing arguments to max; returning -Inf
> nrow(loci)
[1] 4915
> loci <- selectLoci(cD = segD, FDR = 0.1, perReplicate = TRUE)
Warning message:
In max(which(cumsum((1 - sort(likes, decreasing = TRUE))/1:length(likes)) <  :
  no non-missing arguments to max; returning -Inf
> nrow(loci)
[1] 6439

But apparently it worked anyway, we could go back to TRUE and ignore the warnings..

On FDR = 0.05 we only get 4915 loci, is this about how many you got before? Should be go with this then?

nmatthews323 commented 5 years ago

Ahhhh that makes more sense. Not sure I understand what that warning is about without having a close work through the code.

I think before we used FDR 0.1 and some different inputs, so had 20000 loci I believe, but we know that had problems. I'd rather have 5000 good loci than 20000 bad loci.

I'm personally happy with TRUE setting and FDR of 0.05.