@jmtsuji I noticed that you use a symlinking paradigm throughout the pipeline. For example:
```python
# Conditional based on whether short read polishing was performed
rule pre_coverage_filter:
    input:
        "{sample}/polish/medaka/{sample}_consensus.fasta" if POLISH_WITH_SHORT_READS == False else "{sample}/polish/polca/{sample}_polca.fasta"
    output:
        temp("{sample}/polish/cov_filter/{sample}_pre_filtered.fasta")
    run:
        source_relpath = os.path.relpath(str(input), os.path.dirname(str(output)))
        os.symlink(source_relpath, str(output))
```
This was quite troublesome when I made the output a temp() file, because Snakemake would start deleting the files that the symlinks pointed to. I have found that Snakemake doesn't handle symlinking well because it relies on the presence and absence of files.
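A minimal sketch of why a deleted target is a problem for symlinks, written in plain Python rather than Snakemake: it builds a relative symlink the same way the rule's `run:` block does, makes an ordinary copy for comparison, then deletes the source file to stand in for Snakemake cleaning up a file marked `temp()`. The directory layout, file names, and helper names are illustrative only and are not taken from the pipeline.

```python
import os
import shutil
import tempfile


def link_relative(source: str, dest: str) -> None:
    """Create `dest` as a relative symlink to `source` (mirrors the rule's run: block)."""
    source_relpath = os.path.relpath(source, os.path.dirname(dest))
    os.symlink(source_relpath, dest)


def copy_file(source: str, dest: str) -> None:
    """Alternative: stage `dest` as an independent copy of `source`."""
    shutil.copyfile(source, dest)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout loosely modelled on the rule above.
    src_dir = os.path.join(root, "polish", "medaka")
    dst_dir = os.path.join(root, "polish", "cov_filter")
    os.makedirs(src_dir)
    os.makedirs(dst_dir)

    source = os.path.join(src_dir, "consensus.fasta")
    with open(source, "w") as handle:
        handle.write(">contig_1\nACGT\n")

    linked = os.path.join(dst_dir, "linked.fasta")
    copied = os.path.join(dst_dir, "copied.fasta")
    link_relative(source, linked)
    copy_file(source, copied)

    # Simulate the source being cleaned up (e.g. because it was marked temp()).
    os.remove(source)

    link_still_present = os.path.islink(linked)  # the link itself survives...
    link_resolves = os.path.exists(linked)       # ...but exists() follows it and finds nothing
    copy_resolves = os.path.exists(copied)       # the copy does not depend on the source

print(link_still_present, link_resolves, copy_resolves)
```

On a POSIX system this prints `True False True`: the symlink is left dangling once its target is removed, whereas the copy is unaffected, at the cost of duplicating the file on disk.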
Perhaps the rebuilding issues that came up above are related to the symlinking; for example, Snakemake may be having trouble tracking which files are used by which rules. I'm not sure. I was going to bring up this symlinking issue later, but I wanted to mention it here now so that you're aware of it.
Initial response (@jmtsuji)
I've also run into symlinking issues while editing the pipeline and agree that it would be better to streamline these parts where possible. Reducing the number of steps in the analysis this way might help make DAG construction more consistent. (Just for future reference: for the polish steps, I think it's necessary to either copy or symlink the files before running the polish rules so that those rules are compatible with inputs from multiple modules. I don't think this is the case for other rules, so the symlinking steps can probably be streamlined in many other places.) ...
From #132 @LeeBergstrand