microbial-pangenomes-lab / microGWAS

Bacterial GWAS analysis
https://microgwas.readthedocs.io
MIT License

File of files paradigm #17

Closed: mgalardini closed this 5 months ago

mgalardini commented 5 months ago

This PR makes the pipeline able to handle an arbitrarily large number of samples, because input files are no longer passed one by one on the command line, so the limit on command-line length is never hit. It also lets users keep their input fasta and gff files wherever they like.
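The "file of files" idea can be sketched as follows: instead of expanding every assembly path into the command, the paths are written one per line to a text file, and only that file's path is passed to the tool. This is a minimal illustration, not the pipeline's actual code; the function names `write_fofn`/`read_fofn` and the sample paths are hypothetical.

```python
import tempfile
from pathlib import Path


def write_fofn(paths, fofn_path):
    """Write one input path per line to a 'file of files' (fofn).

    A downstream command then receives only `fofn_path` as an argument,
    so the command line stays short however many samples there are.
    """
    fofn_path = Path(fofn_path)
    with fofn_path.open("w") as fh:
        for p in paths:
            fh.write(f"{p}\n")
    return fofn_path


def read_fofn(fofn_path):
    """Read the paths back from a fofn, skipping blank lines."""
    with Path(fofn_path).open() as fh:
        return [line.strip() for line in fh if line.strip()]


if __name__ == "__main__":
    # Hypothetical paths: 10,000 of them would make a very long argv,
    # but the fofn itself is a single short argument.
    assemblies = [f"/data/assemblies/sample_{i}.fasta" for i in range(10_000)]
    with tempfile.TemporaryDirectory() as tmp:
        fofn = write_fofn(assemblies, Path(tmp) / "assemblies.txt")
        print(len(read_fofn(fofn)))
```

The trade-off is that the downstream tool (or a small wrapper) must know how to read a list of paths from a file rather than from its arguments.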

With these changes, the input data.tsv file MUST contain two new columns (fasta and gff) holding the relative (or, better, absolute) paths to the assemblies and annotations. The test/data.tsv file is a good example of what the file should look like.
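A quick way to check a sample sheet before launching the pipeline is sketched below. It assumes data.tsv is tab-separated with a header row and requires only the `fasta` and `gff` columns named above; the `strain` column in the demo is a placeholder (see test/data.tsv for the real layout), and resolving relative paths against the launch directory is an assumption, not confirmed pipeline behaviour.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_COLUMNS = ("fasta", "gff")  # the two columns this PR makes mandatory


def load_samples(tsv_path, base_dir="."):
    """Read a tab-separated sample sheet; return rows with absolute fasta/gff paths.

    Relative paths are resolved against `base_dir` (assumed here to be the
    directory the pipeline is launched from). Raises ValueError if a
    required column is missing.
    """
    base = Path(base_dir).resolve()
    with open(tsv_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"{tsv_path} is missing column(s): {', '.join(missing)}")
        rows = []
        for row in reader:
            for col in REQUIRED_COLUMNS:
                p = Path(row[col])
                # Absolute paths are kept as-is; relative ones are anchored at base.
                row[col] = str(p if p.is_absolute() else (base / p).resolve())
            rows.append(row)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sheet mixing a relative and an absolute path.
        sheet = Path(tmp) / "data.tsv"
        sheet.write_text("strain\tfasta\tgff\nS1\tgenomes/S1.fasta\t/data/S1.gff\n")
        for row in load_samples(sheet, base_dir=tmp):
            print(row["strain"], row["fasta"], row["gff"])
```

Converting everything to absolute paths up front is one way to follow the "better, absolute" advice: the rules then behave the same regardless of the working directory they run in.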

mgalardini commented 5 months ago

Many thanks for testing this and adding the last commit!

I realized the other day that the rule that runs abritamr is not currently included in the suggested snakemake command, and that the rule needs to be changed to accommodate the new "file of files" paradigm. It should be an easy and quick fix, though, which can be done in a new branch.