Re-run with subset of samples #54 (nmquijada/tormes)

shlomobl commented 2 years ago

Hi,

Is there a way to run part of the pipeline on a subset of the initial genomes, without running it all over again?

For example, if there are some "outliers" that I want to remove from the final analysis/report/trees, can I re-run only the core-SNP (etc.) parts?

Thanks!

nmquijada commented 2 years ago

Hi @shlomobl

I see... You can definitely do that, but it requires some expertise in editing the R Markdown files generated by tormes (if you want those samples not to appear in the interactive report). In any case, some analyses must be re-done (such as the trees), as they are strongly affected by each sample, and samples cannot simply be removed from the output.

Maybe the easiest way for now is to remove those samples from the metadata file and speed up the tormes run by:

marking all samples as GENOME in the metadata file (since you already have the assemblies if this is your second run)
using the option only_gene_prediction to skip annotation (you should already have the annotation results from the first run, if you need them)
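The metadata edit described above can be scripted. Below is a minimal Python sketch, assuming the tab-separated TORMES metadata layout with Samples, Read1 and Read2 columns, where an already assembled genome is declared by writing GENOME in Read1 and the assembly path in Read2 (please check this against the TORMES documentation for your version). The helper subset_metadata and the run1/genomes/ path are hypothetical illustrations, not part of tormes.

```python
import csv
import io


def subset_metadata(metadata_text, exclude, assembly_path_for):
    """Drop excluded samples and declare every remaining sample as GENOME.

    Assumes a tab-separated table whose header includes the columns
    Samples, Read1 and Read2 (TORMES-style metadata, an assumption).
    Any other columns, such as Description, are copied through unchanged.
    """
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        sample = row["Samples"]
        if sample in exclude:
            continue  # outlier: leave it out of the second run entirely
        if row["Read1"] != "GENOME":
            # Reuse the assembly from the first run instead of re-assembling reads.
            row["Read1"] = "GENOME"
            row["Read2"] = assembly_path_for(sample)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    original = (
        "Samples\tRead1\tRead2\tDescription\n"
        "S1\treads/S1_R1.fastq.gz\treads/S1_R2.fastq.gz\tfarm A\n"
        "S2\tGENOME\tassemblies/S2.fasta\tfarm B\n"
        "S3\treads/S3_R1.fastq.gz\treads/S3_R2.fastq.gz\toutlier\n"
    )
    # Hypothetical location of the assemblies produced by the first run.
    new = subset_metadata(original, {"S3"}, lambda s: f"run1/genomes/{s}.fasta")
    print(new, end="")
```

Samples that are already declared as GENOME keep their existing path. The result can be saved as a new metadata file for the second tormes run, combined with the only_gene_prediction option mentioned above.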

As I told you in https://github.com/nmquijada/tormes/issues/56, we are developing a major release of the pipeline, and this is one of the improvements we are aiming for, as other users have requested similar features.

Sorry for not being more helpful at this time! I will update you as soon as we release the new version!

Best, Narciso

shlomobl commented 2 years ago

Hi, good news about the major release! I will try both options you suggested. Thanks! S.


--
Dr. Shlomo Blum, DVM PhD
KSVM Bacteriology and Mycology Lecturer
Head of Dept. of Bacteriology and Mycology
Kimron Veterinary Institute
POB 12 Bet Dagan, 50250 Israel
Tel.: +972-3-9681680
Mob.: +972-50-6241862
