Open dugala239 opened 2 months ago
Seconding this request. I plan to use it to run RAxML. I tried the concat_allele.aln.fasta produced by the maast tree command, but RAxML somehow recognizes only half of the SNPs listed in the VCF file. Or is there something that should be tweaked when running RAxML on concat_allele.aln.fasta?
Thank you very much!
Hi, thanks for using Maast! I think concat_allele.aln.fasta should work for RAxML or IQ-TREE with at most minor changes. Could you share your file? I am happy to take a look and see whether there is a simple workaround.
Hi!
The command I used for RAxML is: raxmlHPC-PTHREADS -s ${DIRR}concat_allele.aln.fasta -f a -m GTRGAMMA -p 12345 -x 12345 -N 1000 -n Xoo -T 50
I have attached the logfile (the job is still running as of now): JobName.3693.txt
I have about 51,100 SNPs based on core_snps.vcf, but RAxML only recognizes half of them.
OK, I see. Could you please verify the length of each concatenated allele sequence in concat_allele.aln.fasta? SNPs in core_snps.vcf do not necessarily all end up in the MSA files, due to several factors: whether the site is bi-allelic, whether it is covered by a good k-mer, the prevalence of the site in the population, etc.
Based on seqkit stats, all entries in concat_allele.aln.fasta have 48,920 bases.
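For anyone without seqkit, the same length check can be sketched in a few lines of Python. This is only a minimal sketch and not part of Maast: the function names `read_fasta` and `length_summary` are made up for illustration, and it assumes a plain (uncompressed) FASTA file.

```python
from collections import Counter

def read_fasta(lines):
    """Parse an iterable of FASTA text lines into {sequence_id: sequence}."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}

def length_summary(seqs):
    """Count sequences per length; a valid alignment has exactly one length."""
    return Counter(len(seq) for seq in seqs.values())

# Hypothetical usage (the path is whatever your maast tree run produced):
# with open("concat_allele.aln.fasta") as handle:
#     print(length_summary(read_fasta(handle)))
```

If the summary contains more than one length, the file is not a proper alignment and RAxML would be expected to reject it.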
What about the invariant sites (i.e. sites that have the same allele across all genomes) in the MSA? Could this be due to RAxML automatically removing these sites?
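One way to check this directly is to count the invariant columns and the distinct column patterns in the alignment. Phylogenetics programs such as RAxML typically report the number of distinct alignment patterns rather than the number of columns, so a pattern count well below the column count is not necessarily a sign of lost SNPs. The sketch below is an illustration only: `column_stats` is a hypothetical helper, and treating `-` and `N` as missing data is an assumption, so the numbers may not match RAxML's own count exactly.

```python
def column_stats(rows):
    """Return (total columns, invariant columns, distinct column patterns).

    rows is a list of aligned sequences of equal length.
    Assumption: '-' and 'N' are treated as missing data, so a column is
    called invariant when at most one other character appears in it.
    """
    rows = list(rows)
    if not rows:
        return 0, 0, 0
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("sequences must all have the same length")
    # Transpose the alignment so each string is one column (site).
    columns = ["".join(col).upper() for col in zip(*rows)]
    missing = set("-N")
    invariant = sum(1 for col in columns if len(set(col) - missing) <= 1)
    return len(columns), invariant, len(set(columns))

# Hypothetical usage with sequences loaded from concat_allele.aln.fasta:
# total, invariant, patterns = column_stats(list_of_sequences)
```

Comparing the third number with the pattern count in the RAxML log should show whether the difference comes from identical columns being merged.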
I assumed invariant sites were already removed by the maast tree command? I left all arguments at their defaults. I will also try running the aligned FASTA in IQ-TREE to see whether the number of distinct patterns identified differs.
Yes, you are right. Maast removes sites below the minimum MAF (Minor Allele Frequency) and minimum MAC (Minor Allele Count). Please let me know how it goes with IQ-TREE. In the meantime I will look into it on my end. Thanks.
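To make the MAF/MAC filter concrete, here is a rough sketch of what such a check looks like for a single alignment column. It is not Maast's actual implementation: the function names, the choice to ignore everything except A/C/G/T, and the threshold arguments are all assumptions for illustration.

```python
from collections import Counter

def minor_allele_stats(column):
    """Return (minor allele count, minor allele frequency) for one column.

    Assumption: only A/C/G/T are counted; gaps and N are ignored, and the
    minor allele is the second most common base among the called genomes.
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) < 2:
        return 0, 0.0  # invariant (or empty) column has no minor allele
    called = sum(counts.values())
    mac = counts.most_common(2)[1][1]
    return mac, mac / called

def keep_site(column, min_maf, min_mac):
    """True if the column meets both (hypothetical) thresholds."""
    mac, maf = minor_allele_stats(column)
    return mac >= min_mac and maf >= min_maf
```

Under this kind of filter an invariant column always has a minor allele count of 0, so any positive threshold drops it.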
Hey,
We only get a VCF file for the tag genomes and SNP files for each genome in gt_results. Do you have any suggestions for making a SNP matrix of all genomes? Thanks!