rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

ValueError when creating the Newick file #21

Closed by hkaspersen 2 years ago

hkaspersen commented 3 years ago

Hello Ryan, thank you for creating this awesome tool!

I installed Trycycler version 0.5.0 through conda on the HPC cluster we are working on here. I am testing the workflow now, and I encountered this error when I ran trycycler cluster: (I have shortened the paths to make it a bit easier to read)

Traceback (most recent call last):
  File "src/miniconda/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/__main__.py", line 41, in main
    cluster(args)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 43, in cluster
    build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 262, in build_tree
    tree_script, newick = create_tree_script(temp_dir, phylip)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())
  File "src/miniconda/envs/trycycler/lib/python3.9/pathlib.py", line 939, in relative_to
    raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: 'contigs.newick' is not in the subpath of 'assemblies_subsets' OR one path is relative and the other is absolute.

The newick file is the only file missing in the output folder. Not sure how to interpret this error! This was the command I used:

trycycler cluster --assemblies ${subset}/*fasta --reads $longread --out_dir ${out_loc}/${subset_id} --threads 16

Thanks in advance!

rrwick commented 3 years ago

Coincidentally, I pushed a fix for that bug to Trycycler's main branch only a couple weeks ago! I just now made a new release of Trycycler which contains the fix. It won't be in conda yet, but if you install/upgrade Trycycler manually in your conda environment to v0.5.1, that should fix the problem.

Ryan

rrwick commented 3 years ago

I just made a pull request for bioconda (https://github.com/bioconda/bioconda-recipes/pull/30970) with the new version of Trycycler, so that will hopefully be merged soon and then you'll be able to install the newest version of Trycycler via conda.

rrwick commented 3 years ago

Looks like the PR merged, so Trycycler v0.5.1 on conda is good to go!

hkaspersen commented 3 years ago

Hello again Ryan! I updated Trycycler to 0.5.1 with conda, but unfortunately I still get the same error above. Any suggestions?

rrwick commented 3 years ago

That's no good! Is it the exact same error? Specifically, I'm interested in this part:

  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())

In Trycycler v0.5.1, line 281 of the cluster.py file no longer looks like that. So if you see that in your error message, then something has gone wrong with the upgrade, i.e. I think you're still running Trycycler v0.5.0.

If you see something a bit different in your error message, could you please post it here and I'll check it out?
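
For anyone trying to interpret the ValueError itself, here is a minimal sketch of how pathlib behaves when a relative path is compared against the current working directory. The helper name relative_to_cwd is hypothetical and only for illustration; it is not Trycycler's code, and it is not necessarily how the fix in v0.5.1 was written.

```python
import pathlib


def relative_to_cwd(path_str):
    """Return path_str relative to the current directory, or None if pathlib refuses."""
    try:
        return pathlib.Path(path_str).relative_to(pathlib.Path.cwd())
    except ValueError:
        return None


# A bare file name is a relative path, while Path.cwd() is always absolute,
# so relative_to() raises ValueError and the helper returns None.
print(relative_to_cwd("contigs.newick"))

# An absolute path located inside the current directory can be made relative.
absolute = pathlib.Path.cwd() / "contigs.newick"
print(relative_to_cwd(str(absolute)))
```

The wording of the ValueError message differs between Python versions, which is why the Python 3.7 and Python 3.9 tracebacks in this thread do not read the same even though the failing call is identical.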

hkaspersen commented 3 years ago

You are correct, it was slightly different from the above:

saving distance matrix: contigs.phylip
Traceback (most recent call last):
  File "/src/miniconda/envs/trycycler2/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/__main__.py", line 41, in main
    cluster(args)
  File "/src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 43, in cluster
    build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
  File "/src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 262, in build_tree
    tree_script, newick = create_tree_script(temp_dir, phylip)
  File "/src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())
  File "/src/miniconda/envs/trycycler2/lib/python3.7/pathlib.py", line 900, in relative_to
    .format(str(self), str(formatted)))
ValueError: 'contigs.newick' does not start with 'trycycler/assemblies_subsets'

I double-checked with the conda version, and I should be running version 0.5.1.

rrwick commented 3 years ago

That's interesting. The error is indeed subtly different, but I think that's because you now seem to be using Python 3.7 whereas it was Python 3.9 before. That shouldn't matter for Trycycler.

However, this part:

  File "/src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())

shows me that for some reason, you're not running Trycycler v0.5.1.

So I do think something has gone off the rails with your upgrade. I'm not sure why this has happened, but can you delete the conda environment entirely and make a new one from scratch? After you do, check the trycycler version:

trycycler --version
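
As a complementary check, the sketch below reports which version of a package the active Python environment sees and where it is installed, which can help spot a stale copy. The function name describe_install is hypothetical. It assumes Python 3.8 or later for importlib.metadata (on Python 3.7 the importlib_metadata backport would be needed) and that the distribution name matches the importable module name.

```python
import importlib.metadata
import importlib.util


def describe_install(package):
    """Report the installed version and file location of a package, if present."""
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return f"{package}: not installed in this environment"
    # Locate the module file so a copy from an unexpected environment stands out.
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec and spec.origin else "unknown location"
    return f"{package} {version} at {location}"


print(describe_install("trycycler"))
```

If the reported location is outside the conda environment you expect, the command on your PATH may be picking up a different installation.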

hkaspersen commented 2 years ago

Hello again Ryan, I am sorry for the late reply! I removed the conda environment, re-created it from scratch, and installed trycycler again as the newest version. However when I run it I still get this error:

saving distance matrix: contigs.phylip
Traceback (most recent call last):
  File "src/miniconda/envs/trycycler2/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/__main__.py", line 41, in main
    cluster(args)
  File "src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 43, in cluster
    build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
  File "src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 262, in build_tree
    tree_script, newick = create_tree_script(temp_dir, phylip)
  File "src/miniconda/envs/trycycler2/lib/python3.7/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())
  File "src/miniconda/envs/trycycler2/lib/python3.7/pathlib.py", line 900, in relative_to
    .format(str(self), str(formatted)))
ValueError: 'contigs.newick' does not start with 'assemblies_subsets'

Any thoughts on what it may be? I am still lacking the newick files in the output as before.

hkaspersen commented 2 years ago

I figured it out! There was an issue with shared libraries with the local installation of R on our HPC cluster. I removed the link to this library, and it ran smoothly.

rrwick commented 2 years ago

Excellent - glad to hear you've got it working!