rhysnewell / aviary

A hybrid assembly and MAG recovery pipeline (and more!)
GNU General Public License v3.0
76 stars 11 forks source link

Adding binning-only flag, simplify recipes #192

Closed rhysnewell closed 4 months ago

rhysnewell commented 6 months ago

Todo:

extra binners flags splits the skip-binners flag in two for ease of understanding. Default binners are now metabat, vamb, semibin, rosella whilst users have to explicitly specify that they want to run concoct and maxbin (via --extra-binners)

Reasoning is that maxbin and concoct often slow down to a single thread for days at a time. To be fair, rosella used to do that too but is fixed now :) probably

Version bumped to 0.9.0 due to breaking changes to skip-binners argument, version bumped but not released

rhysnewell commented 6 months ago

Yep, testing on your end. I just slapped together a rework for the singlem script as well. I think the read container class in that script would be useful to include in other scripts as it simplifies a lot of the ifelse statements that those scripts use. If you could check on your end specifically checking if #181 and #182 are fixed?

AroneyS commented 6 months ago

Most of the tests succeed here. Just that the diversity symlink has changed. diversity now symlinks to data/singlem_out/singlem_appraisal.tsv instead of linking diversity/singlem_out to data/singlem_out.

Should we set the symlink diversity to data/singlem_out?

AroneyS commented 6 months ago

I don't have the data to test #181 and #182. @wwood?

wwood commented 6 months ago

I don't have any test data for these, but could run them on the datasets where the issues arose.

Do we have a multi-sample short read dataset we use for testing? I guess not?

rhysnewell commented 6 months ago

Most of the tests succeed here. Just that the diversity symlink has changed. diversity now symlinks to data/singlem_out/singlem_appraisal.tsv instead of linking diversity/singlem_out to data/singlem_out.

Should we set the symlink diversity to data/singlem_out?

Oh that shouldn't happen, that would mean the gtdbtk link is incorrect too. Let me fix

AroneyS commented 6 months ago

This fixes the test paths (since it is now aviary_out/diversity rather than aviary_out/diversity/singlem_out. But aviary_out/diversity/metagenome.combined_otu_table.csv is empty.

rhysnewell commented 4 months ago

@AroneyS @wwood I think this is good to go, I've tested on some datasets and it ran without issue. Also fixes #195

AroneyS commented 4 months ago

I get an error trying to install Rosella. It feel familiar, did this happen previously?

Creating conda environment /mnt/hpccs01/home/aroneys/src/aviary/aviary/modules/binning/envs/rosella.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /mnt/hpccs01/home/aroneys/src/aviary/aviary/modules/binning/envs/rosella.yaml:
Command:
mamba env create --quiet --file "/mnt/hpccs01/home/aroneys/src/aviary/test/data/.conda/1b8c6d6ecf0fbb96b64bba5cfbf3362c_.yaml" --prefix "/mnt/hpccs01/home/aroneys/src/aviary/test/data/.conda/1b8c6d6ecf0fbb96b64bba5cfbf3362c_"
Output:
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
AroneyS commented 4 months ago

I realised that aviary_out/diversity/metagenome.combined_otu_table.csv is empty due to The SINGLEM_METAPACKAGE_PATH environment variable, which points to the default data directory, is not set.. Are we still ok letting SingleM fail silently for errors like that?

rhysnewell commented 4 months ago

Ah, no I'll fix that up

rhysnewell commented 4 months ago

@wwood I'm getting this error with a fresh install of singlem + db. What database versions are compatible and how do ensure the correct version is downloaded?

03/10/2024 11:30:43 PM INFO: SingleM v0.16.0
03/10/2024 11:30:43 PM INFO: Retrieval successful. Location of backpack is: /scratch/dbs/singlem
Traceback (most recent call last):
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/bin/singlem", line 658, in <module>
    singlem.pipe.SearchPipe().run(
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 63, in run
    metapackage = self._parse_packages_or_metapackage(**kwargs)
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 105, in _parse_packages_or_metapackage
    return Metapackage.acquire_default()
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/metapackage.py", line 71, in acquire_default
    backpack = zenodo_backpack.acquire(env_var_name=DATA_ENVIRONMENT_VARIABLE, version=DATA_DEFAULT_VERSION)
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 121, in acquire
    if version != zb.data_version_string():
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 71, in data_version_string
    return self.contents[DATA_VERSION]
KeyError: 'data_version'
wwood commented 4 months ago

That error is usually because the payload_directory, not the base directory, of the backpack. I'll write some code so it'll work either way, but maybe worth fixing in Aviary? Or is this an error specific to your install?

rhysnewell commented 4 months ago

This is the aviary install, but I guess I'm not quite sure what you are asking. Should the SINGLEM_METAPACKAGE_PATH variable be pointing to the .zb directory or the extracted contents of this folder? Currently, aviary moves payload_directory to a user specified location, so is using the latter

wwood commented 4 months ago

It should point to the .zb directory, because the version is recorded in the CONTENTS.json file within that, rather than the one in the payload directory.

rhysnewell commented 4 months ago

@AroneyS Okay, it should error properly and install the singlem db appropriately now

AroneyS commented 4 months ago

I still get the env variable error. Are you running the test/test_integration.py tests?

rhysnewell commented 4 months ago

Yep, integration tests are running without issue. SingleM is producing a sensible OTU table. You may need to redownload the singlem DB and reset the aviary configuration?

AroneyS commented 4 months ago

singlem_pipe_reads is working for me but not singlem_appraise I guess its due to using bash -c?

I also get SingleM Errored, please check data/singlem_out/singlem_log.txt but no error.

rhysnewell commented 4 months ago

True, I hadn't caught that singlem_appraise was also busted but silently. I've reworked it, tested on my end and it works as expected

AroneyS commented 4 months ago

Tests work on my end now