merenlab / oligotyping

Exploring microbial patterns through subtle nucleotide variation within 16S rRNA gene tag sequences of closely related taxa
GNU General Public License v2.0

Purity score #26

Open jackmen opened 8 years ago

jackmen commented 8 years ago

Hi A. Murat,

I tried to find out how the purity score implemented by Doğancan Özturan is calculated, but so far I have not come across an explanation. Could you please shed some light on it?

Thanks a lot again!

ozturan commented 8 years ago

Hi @jackmen,

For each oligotype, the score looks at the frequencies of its unique sequences, picks the highest one, and measures how dominant it is over the other unique sequences of that oligotype. When one unique sequence is abundant enough to stand out on its own, the separation is complete, and the dominance score is generated against the frequency of the second most abundant unique sequence. This rule is applied to every oligotype. For the Total Purity score, the module takes the last quartile over all unique sequences of the oligotypes, so the score does not skip the bad scores just to produce a nice purity score.
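To make the description above concrete, here is a rough Python sketch. It is not the oligotyping pipeline's code, and several things in it are assumptions: the function names `dominance_score` and `total_purity` are hypothetical, the dominance formula (one minus the ratio of the second most abundant frequency to the most abundant one) is only one plausible reading of "dominancy", and "last quartile" is interpreted here as the lower quartile of the per-oligotype scores, so that poorly separated oligotypes pull the total down.

```python
from statistics import quantiles


def dominance_score(frequencies):
    """Hypothetical per-oligotype score (assumed formula, not the pipeline's).

    Compares the most abundant unique sequence with the runner-up:
    1.0 means the top sequence fully dominates, 0.0 means a tie.
    """
    ranked = sorted(frequencies, reverse=True)
    if not ranked or ranked[0] <= 0:
        raise ValueError("need at least one positive frequency")
    if len(ranked) == 1:
        # Only one unique sequence: nothing competes with it.
        return 1.0
    return 1.0 - ranked[1] / ranked[0]


def total_purity(oligotypes):
    """Hypothetical total score: lower quartile of the per-oligotype scores.

    `oligotypes` maps an oligotype name to the list of frequencies of its
    unique sequences. Reading "last quartile" as the lower quartile is an
    assumption; it keeps bad scores from being averaged away.
    """
    scores = sorted(dominance_score(f) for f in oligotypes.values())
    if not scores:
        raise ValueError("no oligotypes given")
    if len(scores) == 1:
        return scores[0]
    lower_quartile, _, _ = quantiles(scores, n=4, method="inclusive")
    return lower_quartile


# Made-up frequencies, for illustration only.
example = {
    "AA": [80, 20],
    "AC": [50, 50],
    "GT": [100, 10],
    "TT": [40, 10],
    "CC": [10],
}
print(total_purity(example))
```

Taking a low quartile rather than a mean is what keeps a few badly separated oligotypes visible in the total, which matches the remark that the score does not skip the bad scores.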

Take care, Dogancan

jackmen commented 8 years ago

Dear Ozturan,

Thanks a lot for your quick reply. I got it!

All the best, and happy researching!