Open asierFernandezP opened 4 days ago
Hi Asier, Thanks for using SynTracker!
I’ll start with the second issue: yes, you are correct. Most viruses are not abundant enough to be compared across samples. These less abundant viruses were probably detected in only one sample (or none).
As for your first issue (not getting any of the “avg_synteny_scores[subsampling_length]_regions.csv” files): the total length covered by even the lowest subsampling value (i.e., number of regions) is still longer than your viral genomes. Therefore, even if a genome is detected and compared, it is excluded from the final output because the number of available regions is too low. In such cases, when analyzing very short genomes, the --avg_all flag should be used. The tool will then generate another type of final per-genome file, based on all available regions (12 in the example you posted). Thanks for pointing this out; we will make this option the default.
If running the tool again will take too much time, please let me know and I'll help with an alternative solution. Cheers, Hagay
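To make the exclusion described above concrete, here is a minimal sketch of the check as explained in this reply: a subsampling length only yields an output file if the genome has at least that many regions available. The function name and all numbers are hypothetical and chosen for illustration; this is not SynTracker's actual code, and the tool's real subsampling values may differ.

```python
def usable_subsampling_lengths(available_regions, subsampling_lengths):
    """Return the subsampling lengths (numbers of regions) that a genome
    with `available_regions` regions can satisfy.

    Illustrative only: assumes a length is usable when it does not exceed
    the number of available regions.
    """
    return [n for n in subsampling_lengths if n <= available_regions]


# Hypothetical values: a short viral genome with 12 available regions,
# checked against a made-up list of subsampling lengths.
hypothetical_lengths = [40, 60, 80, 100, 200]
usable = usable_subsampling_lengths(12, hypothetical_lengths)

if not usable:
    # No subsampling length fits, so no per-length file would be written;
    # this is the situation where averaging over all regions is suggested.
    print("No subsampling length fits; consider averaging over all regions.")
```

Under these assumptions, a genome with only a handful of regions fails every subsampling length, which matches the missing per-length files reported in this issue.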
Thank you Hagay!
I will run it again, adding the --avg_all option. Regarding the time: as I have quite a lot of metagenomic samples and viral genomes, it does indeed take a long time to run. Would it be possible to run it in batches (e.g., instead of comparing 1000 viral genomes against all metagenomes, compare batches of 100 genomes against all metagenomes to speed up the process, and then merge the results)? As far as I understand, no comparisons are made between different viral genomes, so this should not affect the final results. Am I correct?
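The batching idea above could be staged roughly as follows. This is only a sketch under stated assumptions: that each reference genome is a separate FASTA file in one directory, and that the tool can be pointed at a directory of references. The directory layout, the `*.fasta` pattern, and the function names are hypothetical, and whether batched runs give identical results is the open question asked above, not something this code confirms.

```python
from pathlib import Path
import shutil


def split_into_batches(items, batch_size):
    """Split a list into consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def stage_batches(ref_dir, out_dir, batch_size=100, pattern="*.fasta"):
    """Copy reference genome files into numbered batch directories.

    Returns the list of created batch directories, each of which could
    then be used as the reference folder of a separate run.
    """
    files = sorted(Path(ref_dir).glob(pattern))
    batch_dirs = []
    for n, batch in enumerate(split_into_batches(files, batch_size), start=1):
        batch_dir = Path(out_dir) / f"batch_{n:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for f in batch:
            shutil.copy2(f, batch_dir / f.name)
        batch_dirs.append(batch_dir)
    return batch_dirs
```

With 1000 genome files and `batch_size=100`, this would produce ten directories (`batch_001` to `batch_010`). Merging the per-genome outputs of the separate runs would be an additional step that depends on the tool's output layout.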
Dear all,
First of all, thanks for developing this tool. I am currently trying to run it using a set of reference viral genomes (identified from gut metagenomes) and the metagenomic contigs from multiple samples (as the target genomes).
The command that I used is the following:
The example output in the log file for one of the genomes is:
The tool seems to run without any problems but:
There seems to be a problem when computing the average scores (although no error is reported). Would you recommend changing or adding any parameters?
In this case, as a test, I used 100 reference viral genomes (and ~400 assemblies), and only 12 of the genome output folders actually contain computed results. I guess most of the viruses are simply not present in any sample, or are present in only one of them, so no comparison is possible. Is that correct?
Best, Asier