Closed by TJrogers86 1 month ago
Hello, thank you for your suggestions!
I have made splitting the sequences on N repeats optional, and added a flag --remove-n-repeats if anyone wants to remove the N repeats and split the genomes based on them. This is available from v1.3.1.
I am looking into your suggestion and have made a note for a future update, thanks for the feedback!
FYI, we are also releasing v1.4.0, which replaces Ray with a custom-made multiprocessing and distributed-processing library that loads faster and works better with MetaCerberus.
Thank you! -Jose
Thanks!
Hello, thanks for the great program! I only have one minor complaint and one possible suggestion. The minor complaint: I noticed that MetaCerberus looks for N repeats and removes them before it annotates. The issue is that I would like to use the .gff output to make a gene map of the viral contigs that I used as input to MetaCerberus. Because the N repeats are removed, my viral contigs are fragmented into smaller contigs, each given a number at the end of its name. When I use the gff file to make gene maps with the gggenes R package, the fragments are plotted individually. For example, let's say I have a viral contig named vContig_000000000014||full. After the N repeats are removed, I am left with 3 individual contigs of varying lengths: vContig_000000000014||full_1 (4 kb long), vContig_000000000014||full_2 (30+ kb long), and vContig_000000000014||full_3 (12 kb long). When I go to plot these with gggenes, each is plotted on its own (see fig below for example). What I would like to be able to do is have a single gene map of the full contig, so that the original bp start and end points are preserved for all genes. I am not sure whether there is a possible fix for this.
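As a possible workaround for the fragmented coordinates, here is a minimal Python sketch that maps a gene position on a numbered fragment back to the original contig. It is not MetaCerberus code. It assumes that the fragments are exactly the non-N stretches of the original sequence, numbered _1, _2, ... in order, and that any run of at least `min_n_run` N characters causes a split; the function names and the `min_n_run` parameter are hypothetical, and the real splitting rule may differ.

```python
import re


def fragment_offsets(sequence, min_n_run=1):
    """Return the 0-based start offset of each non-N fragment, in order.

    Assumption: the contig is split at every run of at least `min_n_run`
    N characters. This threshold is a guess, not a documented value.
    """
    offsets = []
    pos = 0
    pattern = re.compile(r"[Nn]{%d,}" % min_n_run)
    for match in pattern.finditer(sequence):
        if match.start() > pos:      # skip empty fragments (leading N run)
            offsets.append(pos)
        pos = match.end()
    if pos < len(sequence):          # trailing fragment after the last N run
        offsets.append(pos)
    return offsets


def to_original_coords(fragment_name, start, end, offsets):
    """Map 1-based coordinates on '<contig>_<k>' back to the parent contig."""
    parent, _, index = fragment_name.rpartition("_")
    offset = offsets[int(index) - 1]
    return parent, start + offset, end + offset


# Toy contig: 4 bp, 10 N, 8 bp, 5 N, 3 bp
contig = "ACGT" + "N" * 10 + "ACGTACGT" + "N" * 5 + "ACG"
offsets = fragment_offsets(contig)
mapped = to_original_coords("vContig_000000000014||full_2", 1, 8, offsets)
print(offsets)
print(mapped)
```

With the remapped start and end values written back under the parent contig name, gggenes would see a single contig instead of three fragments.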
As for the suggestion: would it be possible to have MetaCerberus create a data frame output that lists all the genes for each contig, with a column that says whether each gene is a viral auxiliary metabolic gene if the original inputs were viral in origin? Just a thought.
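To make the suggestion concrete, here is a rough Python sketch of the kind of table I have in mind, built from a standard 9-column GFF file. It is only an illustration: the `gene_table` function, the `is_amg` column name, and the `amg_ids` set are hypothetical, and the sketch assumes the caller already knows which gene IDs count as auxiliary metabolic genes (it does not decide that itself).

```python
import csv
import io


def gene_table(gff_text, amg_ids):
    """Build one row per CDS feature with a flag for auxiliary metabolic genes.

    `amg_ids` is a hypothetical, user-supplied set of gene IDs to flag.
    """
    rows = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue                         # skip blank and comment lines
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue                         # keep only well-formed CDS rows
        # Column 9 holds key=value pairs separated by semicolons
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID", "")
        rows.append({
            "contig": cols[0],
            "gene_id": gene_id,
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "is_amg": gene_id in amg_ids,
        })
    return rows


# Toy input with made-up gene IDs
gff = "\n".join([
    "##gff-version 3",
    "\t".join(["vContig_1", "example", "CDS", "10", "400", ".", "+", "0", "ID=gene_1"]),
    "\t".join(["vContig_1", "example", "CDS", "450", "900", ".", "-", "0", "ID=gene_2"]),
])
table = gene_table(gff, amg_ids={"gene_2"})

# Write the table as tab-separated text
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(table[0]), delimiter="\t")
writer.writeheader()
writer.writerows(table)
print(out.getvalue())
```

The resulting tab-separated text can be read straight into R or pandas as a data frame.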