aruginkgo closed this issue 3 years ago.
For what it's worth, I threw in
from pathlib import Path
os.makedirs(temp_dir / Path(seq_name).parent, exist_ok=True)
in make_mash_sketches, just after fasta_pos and fasta_neg, and it finished the distance matrix part, then crashed again at clustering with a similar issue:
cluster/cluster_001/1_contigs:
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/trycycler", line 11, in <module>
    load_entry_point('Trycycler==0.4.1', 'console_scripts', 'trycycler')()
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/__main__.py", line 40, in main
    cluster(args)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 42, in cluster
    cluster_numbers = complete_linkage(seqs, seq_names, depths, matrix, args.distance, args.out_dir)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 325, in complete_linkage
    with open(seq_fasta, 'wt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'cluster/cluster_001/1_contigs/A_assemblies/canu_0.fasta'
The 1_contigs directory was created but left empty.
Edit: with the hacky "fix" (big air quotes)
os.makedirs(cluster_dir / pathlib.Path(name).parent, exist_ok=True)
seq_fasta = cluster_dir / f'{pathlib.Path(name).stem}.fasta'
in cluster.py, around line 324 inside the loop, it no longer crashes and creates the final cluster_001/1_contigs/*_0.fasta files, but I'm not sure why it's looking for the A_assemblies directory to begin with.
trycycler reconcile worked after that as well.
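The two workarounds in this comment handle the slash differently: the first creates the missing parent directory, and the second keeps only Path(name).stem, which drops the directory part of the name. One caveat with .stem is that two contigs differing only in that prefix end up with the same filename. A minimal sketch of a third option, flattening the name by replacing the separator; safe_fasta_name is a hypothetical helper, not part of Trycycler:

```python
from pathlib import Path


def safe_fasta_name(contig_name: str, suffix: str = '') -> str:
    """Hypothetical helper: map a contig name to a single, flat filename.

    Replaces '/' (and '\\', the Windows separator) with '_' so the
    result can never point into a subdirectory.
    """
    safe = contig_name.replace('/', '_').replace('\\', '_')
    return f'{safe}{suffix}.fasta'


names = ['A_assemblies/canu_0', 'B_assemblies/canu_0']

# .stem keeps only the last path component, so two distinct contigs collide.
print([Path(n).stem for n in names])  # ['canu_0', 'canu_0']

# Replacing the separator keeps them distinct and in one directory.
print([safe_fasta_name(n, '_pos') for n in names])
# ['A_assemblies_canu_0_pos.fasta', 'B_assemblies_canu_0_pos.fasta']
```

The replacement can still collide with a contig that is literally named A_assemblies_canu_0, which is one argument for rejecting slash-containing names outright instead.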
Thanks for spotting this bug! If I understand correctly, one of your input assemblies has a contig named assemblies/canu_0. The slash is causing the problem, because the Trycycler cluster command saves contigs to a temporary file using their contig name as a filename. So it was trying to save /tmp/tmp6f7zakqj/A_assemblies/canu_0_pos.fasta, but the /tmp/tmp6f7zakqj/A_assemblies/ directory didn't exist: the intent was a file named A_assemblies/canu_0_pos.fasta directly in the temp directory, and the slash turned that name into a path through a subdirectory that was never created.
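A minimal sketch of the failure described above, assuming the temporary filename is built as temp_dir / f'{name}_pos.fasta'; write_pos_fasta is a hypothetical reconstruction for illustration, not Trycycler's actual code. pathlib treats everything before the slash as a directory, and open() does not create missing directories:

```python
import tempfile
from pathlib import Path


def write_pos_fasta(temp_dir: Path, name: str, seq: str) -> Path:
    # Hypothetical helper: the contig name is used directly as the filename.
    seq_fasta = temp_dir / f'{name}_pos.fasta'
    with open(seq_fasta, 'wt') as f:
        f.write(f'>{name}\n{seq}\n')
    return seq_fasta


with tempfile.TemporaryDirectory() as d:
    temp_dir = Path(d)

    # A slash-free name: the file lands directly in temp_dir.
    ok = write_pos_fasta(temp_dir, 'A_contig_1', 'ACGT')
    print(ok.parent == temp_dir)  # True

    # With a slash, pathlib sees a subdirectory plus a filename.
    bad = temp_dir / 'A_assemblies/canu_0_pos.fasta'
    print(bad.parent.name, bad.name)  # A_assemblies canu_0_pos.fasta

    # The A_assemblies subdirectory was never created, so open() fails.
    try:
        write_pos_fasta(temp_dir, 'A_assemblies/canu_0', 'ACGT')
    except FileNotFoundError as e:
        print(type(e).__name__, e.errno)  # FileNotFoundError 2
```

Calling os.makedirs(temp_dir / Path(name).parent, exist_ok=True) before open(), as in the workaround earlier in the thread, makes the second call succeed by creating A_assemblies/ first.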
I've taken the easy way out of this one and just made Trycycler check for slashes in contig names and quit with an error if they are there. That was easier than ensuring slash-containing contig names don't cause a crash 😄
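A minimal sketch of that kind of up-front check, assuming the contigs are held in a dict mapping name to sequence; check_contig_names and its error message are hypothetical, not Trycycler's actual code:

```python
import sys


def check_contig_names(seqs: dict) -> None:
    """Hypothetical check: quit if any contig name contains a slash."""
    bad = [name for name in seqs if '/' in name]
    if bad:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit('Error: contig names cannot contain "/": ' + ', '.join(bad))


check_contig_names({'A_contig_1': 'ACGT'})  # passes silently

try:
    check_contig_names({'A_assemblies/canu_0': 'ACGT'})
except SystemExit as e:
    print(e.code)  # Error: contig names cannot contain "/": A_assemblies/canu_0
```

Running the check before any temporary files are written means the user gets a clear message instead of a FileNotFoundError deep inside the distance matrix step.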
Also, thanks for pointing out the version number discrepancy! I've made a new version with the fix (v0.4.3), and now both GitHub and the code agree.
For some reason I get
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpz1qeqdno/A_assemblies/canu_0_pos.fasta'
when running trycycler cluster, during the distance matrix part. It seems like the temp directory is being made, but not the A_assemblies directory inside it. I think I am using the latest version of Trycycler (that is to say, I ran python3 setup.py install in a directory called Trycycler-0.4.2, but the version.py in it still says 0.4.1). I was able to Trycycle a different set of assemblies, so it might be something on my end. I can't share the sequences, unfortunately, but I can try to get a reproducible example going.