rvalieris / LCS

Question about existing marker table and creating one #4

Closed aharring83 closed 2 years ago

aharring83 commented 2 years ago

Hello, I am confused about creating a new marker table for running your software. I am looking at the provided pango-designation-markers-v1.2.60.tsv file and I am confused about the last 3 columns (adref, adalt, dp). What do these columns mean? Let's say that I have my own variant table and I would like to add it to your existing pango-designation-markers table, would that be possible? Based on your instructions for generating a new table using the pango-designation, would we need to have the fasta files for all isolates listed in the pango-designation lineage.csv file? That is over 50K genomes.

Your response will be appreciated, thanks.

rvalieris commented 2 years ago

hello,

> (adref, adalt, dp). What do these columns mean?

adref: number of genomes with the reference allele
adalt: number of genomes with the mutated allele
dp: total coverage of the position

these are used to calculate the approximate allele frequency of the mutation in a given variant. note that if you add a mutation for one variant, you must add corresponding lines for all other variants as well.
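A minimal sketch of both points, under stated assumptions: the thread only names the adref, adalt and dp columns, so the `variant` and `mutation` column names, the toy counts, and the choice of adalt / dp as the frequency estimate are all hypothetical and may differ from what the software actually does.

```python
import csv
import io

# Toy table. Only adref, adalt and dp are column names taken from the thread;
# "variant", "mutation" and all counts are made up for illustration.
SAMPLE_TSV = (
    "variant\tmutation\tadref\tadalt\tdp\n"
    "A\tC241T\t95\t5\t100\n"
    "B\tC241T\t2\t98\t100\n"
    "A\tA23403G\t10\t90\t100\n"
)


def allele_frequency(adalt, dp):
    """Approximate frequency as mutated-allele count over total coverage.

    Assumption: the real calculation may use a different estimator.
    """
    return adalt / dp if dp > 0 else 0.0


def load_rows(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def missing_pairs(rows):
    """Return (variant, mutation) pairs that have no line in the table."""
    variants = {r["variant"] for r in rows}
    mutations = {r["mutation"] for r in rows}
    present = {(r["variant"], r["mutation"]) for r in rows}
    return sorted(
        (v, m) for v in variants for m in mutations if (v, m) not in present
    )


rows = load_rows(SAMPLE_TSV)
freqs = {
    (r["variant"], r["mutation"]): allele_frequency(int(r["adalt"]), int(r["dp"]))
    for r in rows
}
missing = missing_pairs(rows)
```

In this toy table `missing` reports that variant B has no line for A23403G, which is the kind of gap that has to be filled before the table is complete.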

> Let's say that I have my own variant table and I would like to add to your existing pango-designation-markers table, would that be possible?

if you have the same columns as expected by the software, then yes it is possible.
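One way to check the "same columns" condition before appending your own rows could look like the sketch below. It is only an illustration: the `variant` and `mutation` column names and the sample rows are hypothetical, and `append_rows` is a helper invented here, not part of the software.

```python
import csv
import io


def read_tsv(text):
    """Return (column names, rows) for tab-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    cols = reader.fieldnames
    return cols, list(reader)


def append_rows(existing_text, new_text):
    """Append new rows to an existing table, refusing mismatched columns."""
    cols, rows = read_tsv(existing_text)
    new_cols, new_rows = read_tsv(new_text)
    if new_cols != cols:
        raise ValueError(f"column mismatch: expected {cols}, got {new_cols}")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=cols, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows + new_rows)
    return out.getvalue()


# Hypothetical header: only adref, adalt and dp come from the thread.
HEADER = "variant\tmutation\tadref\tadalt\tdp\n"
existing = HEADER + "A\tC241T\t95\t5\t100\n"
custom = HEADER + "B\tC241T\t2\t98\t100\n"
merged = append_rows(existing, custom)
```

A table whose header differs in names or order raises `ValueError` instead of being silently merged.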

> Based on your instructions, for generating a new table using the pango-designation, we would need to have the fasta files for all isolates listed in the pango-designation lineage.csv file? That is over 50K genomes.

there are 2 ways of generating a new table. with the pango-designation list, yes, you need a fasta file with all genomes. we are looking into ways of making this process easier, but for now you need all genomes.

the other way is to use the UCSC prebuilt trees. this process is much faster and the results are very similar.