philliplab opened this issue 10 years ago (status: Open)
Report the frequency as a percentage rounded to 2 decimal places
“unique number” “total sequences” “haplotype percentage”
Change was made. Testing needed.
You can either use the instance hosted at SANBI or update your local version by running install_github('philliplab/ViralHaplotyper') again.
Two comments: The frequencies come in two forms, float and non-float, e.g. ..._0.767 _0.184 _00000. This causes the sorting by frequency to be a bit out of order for some sequences.
The output is a frequency rather than a percentage. What are your thoughts on this? I was thinking of a percentage, to one decimal place (which may also solve the issue above?).
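To illustrate the ordering problem described above, here is a minimal sketch in Python (not the package's R code). It assumes the names are compared as plain strings, which the thread does not confirm; `format_percentage` is a hypothetical helper showing the proposed fixed-width percentage with one decimal place.

```python
# Frequency suffixes as quoted in the comment above (mixed float / non-float).
mixed = ["0.767", "0.184", "00000"]

# Assumption: sorting is done on the text of the name, not on numeric values.
# Under that assumption the zero entry sorts ahead of the real frequencies.
mixed_desc = sorted(mixed, reverse=True)


def format_percentage(frequency: float) -> str:
    """Hypothetical helper: fixed-width percentage, one decimal place."""
    # Width 5 including the decimal point, zero-padded: 0.767 -> "076.7".
    return f"{100.0 * frequency:05.1f}"


fixed = [format_percentage(f) for f in (0.767, 0.184, 0.0)]

# With a single fixed-width format, text order and numeric order agree.
fixed_desc = sorted(fixed, reverse=True)

print(mixed_desc)
print(fixed_desc)
```

Because every value has the same width and the same number of decimals, a plain text sort gives the same order as a numeric sort.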
I switched it to decimals. Please test again.
This seems to be working nicely. There is a small issue with the sorting, though. On the web interface, the sort works perfectly. However, when downloading the sequences after sorting, some sequences are not ranked correctly:
..._001_01089_078.1
..._002_00224_016.1
..._003_00002_000.1
..._004_00001_000.1
..._005_00013_000.9
...
...
..._026_00003_000.2
What is the number 1 that gets assigned to all sequences for? For example: CAP177_2030V1_1_001_01089_078.1
The sorting and the downloading are not linked to each other. It would be a significant amount of work to make them work together.
The number 1 is the haplotype number for the data set. The next number is the number of the unique sequence in the haplotype. So currently the system works by constructing a single 'haplotype' and reporting on all its unique sequences. I should probably allow the haplotype and/or sequence number to be suppressed. Note that the way the term haplotype is used in the code is no longer consistent with its biological meaning. See issue #34
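The naming scheme explained above can be unpacked with a short Python sketch (again, not the package's R code). The haplotype and sequence-number fields follow the explanation in this comment; reading the last two fields as the sequence count and the percentage is an assumption based on the rest of the thread. `parse_label` and its field names are hypothetical.

```python
def parse_label(label: str) -> dict:
    """Hypothetical parser for names like CAP177_2030V1_1_001_01089_078.1."""
    # Split from the right so underscores inside the sample name are kept.
    prefix, haplotype, sequence_number, count, percentage = label.rsplit("_", 4)
    return {
        "prefix": prefix,
        "haplotype": int(haplotype),
        "sequence_number": int(sequence_number),
        "count": int(count),            # assumed meaning of the 4th field
        "percentage": float(percentage),  # assumed meaning of the 5th field
    }


parsed = parse_label("CAP177_2030V1_1_001_01089_078.1")
print(parsed)

# Illustrative names only (not data from the thread): ordering a download
# by the parsed count instead of by the text of the name.
names = ["S_1_003_00002_000.1", "S_1_001_01089_078.1", "S_1_002_00013_000.9"]
by_count = sorted(names, key=lambda n: parse_label(n)["count"], reverse=True)
print(by_count)
```

Sorting on the parsed fields avoids depending on how the name happens to be formatted.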
Or maybe the downloading and sorting can be very easy.
Left-pad the counts so that they all have the same length
Replace the total number of sequences in the haplotype with the frequency of the particular sequence
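The two changes above can be sketched as follows in Python (a rough illustration, not the package's R implementation). The function name `build_labels`, the default widths of 3 and 5 digits, and the sample counts are all assumptions chosen to resemble the names quoted earlier in the thread.

```python
def build_labels(prefix, counts, haplotype=1, rank_width=3, count_width=5):
    """Hypothetical label builder: rank, left-padded count, percentage."""
    total = sum(counts)
    # Most abundant unique sequence gets sequence number 1.
    ranked = sorted(counts, reverse=True)
    labels = []
    for rank, count in enumerate(ranked, start=1):
        percentage = 100.0 * count / total
        labels.append(
            f"{prefix}_{haplotype}"
            f"_{rank:0{rank_width}d}"     # zero-padded sequence number
            f"_{count:0{count_width}d}"   # zero-padded count
            f"_{percentage:05.1f}"        # frequency as a fixed-width percentage
        )
    return labels


# Illustrative counts only (not data from the thread).
labels = build_labels("SAMPLE", [3, 90, 7])
for label in labels:
    print(label)
```

With every numeric field padded to a fixed width, sorting the names as text reproduces the rank order.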