Closed rhysnewell closed 4 months ago
Yep, testing on your end. I just slapped together a rework for the singlem script as well. I think the read container class in that script would be useful to include in other scripts as it simplifies a lot of the ifelse statements that those scripts use. If you could check on your end specifically checking if #181 and #182 are fixed?
Most of the tests succeed here. Just that the diversity symlink has changed. diversity
now symlinks to data/singlem_out/singlem_appraisal.tsv
instead of linking diversity/singlem_out
to data/singlem_out
.
Should we set the symlink diversity
to data/singlem_out
?
I don't have the data to test #181 and #182. @wwood?
I don't have any test data for these, but could run them on the datasets where the issues arose.
Do we have a multi-sample short read dataset we use for testing? I guess not?
Most of the tests succeed here. Just that the diversity symlink has changed.
diversity
now symlinks todata/singlem_out/singlem_appraisal.tsv
instead of linkingdiversity/singlem_out
todata/singlem_out
.Should we set the symlink
diversity
todata/singlem_out
?
Oh that shouldn't happen, that would mean the gtdbtk link is incorrect too. Let me fix
This fixes the test paths (since it is now aviary_out/diversity
rather than aviary_out/diversity/singlem_out
.
But aviary_out/diversity/metagenome.combined_otu_table.csv
is empty.
@AroneyS @wwood I think this is good to go, I've tested on some datasets and it ran without issue. Also fixes #195
I get an error trying to install Rosella. It feel familiar, did this happen previously?
Creating conda environment /mnt/hpccs01/home/aroneys/src/aviary/aviary/modules/binning/envs/rosella.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /mnt/hpccs01/home/aroneys/src/aviary/aviary/modules/binning/envs/rosella.yaml:
Command:
mamba env create --quiet --file "/mnt/hpccs01/home/aroneys/src/aviary/test/data/.conda/1b8c6d6ecf0fbb96b64bba5cfbf3362c_.yaml" --prefix "/mnt/hpccs01/home/aroneys/src/aviary/test/data/.conda/1b8c6d6ecf0fbb96b64bba5cfbf3362c_"
Output:
warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
I realised that aviary_out/diversity/metagenome.combined_otu_table.csv
is empty due to The SINGLEM_METAPACKAGE_PATH environment variable, which points to the default data directory, is not set.
. Are we still ok letting SingleM fail silently for errors like that?
Ah, no I'll fix that up
@wwood I'm getting this error with a fresh install of singlem + db. What database versions are compatible and how do ensure the correct version is downloaded?
03/10/2024 11:30:43 PM INFO: SingleM v0.16.0
03/10/2024 11:30:43 PM INFO: Retrieval successful. Location of backpack is: /scratch/dbs/singlem
Traceback (most recent call last):
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/bin/singlem", line 658, in <module>
singlem.pipe.SearchPipe().run(
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 63, in run
metapackage = self._parse_packages_or_metapackage(**kwargs)
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 105, in _parse_packages_or_metapackage
return Metapackage.acquire_default()
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/metapackage.py", line 71, in acquire_default
backpack = zenodo_backpack.acquire(env_var_name=DATA_ENVIRONMENT_VARIABLE, version=DATA_DEFAULT_VERSION)
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 121, in acquire
if version != zb.data_version_string():
File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 71, in data_version_string
return self.contents[DATA_VERSION]
KeyError: 'data_version'
That error is usually because the payload_directory, not the base directory, of the backpack. I'll write some code so it'll work either way, but maybe worth fixing in Aviary? Or is this an error specific to your install?
This is the aviary install, but I guess I'm not quite sure what you are asking. Should the SINGLEM_METAPACKAGE_PATH
variable be pointing to the .zb
directory or the extracted contents of this folder? Currently, aviary moves payload_directory
to a user specified location, so is using the latter
It should point to the .zb directory, because the version is recorded in the CONTENTS.json file within that, rather than the one in the payload directory.
@AroneyS Okay, it should error properly and install the singlem db appropriately now
I still get the env variable error. Are you running the test/test_integration.py
tests?
Yep, integration tests are running without issue. SingleM is producing a sensible OTU table. You may need to redownload the singlem DB and reset the aviary configuration?
singlem_pipe_reads is working for me but not singlem_appraise
I guess its due to using bash -c
?
I also get SingleM Errored, please check data/singlem_out/singlem_log.txt
but no error.
True, I hadn't caught that singlem_appraise
was also busted but silently. I've reworked it, tested on my end and it works as expected
Tests work on my end now
Todo:
extra binners flags splits the skip-binners flag in two for ease of understanding. Default binners are now metabat, vamb, semibin, rosella whilst users have to explicitly specify that they want to run concoct and maxbin (via
--extra-binners
)Reasoning is that maxbin and concoct often slow down to a single thread for days at a time. To be fair, rosella used to do that too but is fixed now :) probably
Version bumped to 0.9.0 due to breaking changes to skip-binners argument, version bumped but not released