merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Incompatibility of master's anvi-dereplicate-genomes branch with latest fastANI #1382

Closed: UriaMorP closed this issue 4 years ago

UriaMorP commented 4 years ago
Anvi'o version ...............................: esther (v6.1-master)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

Operating system: Linux.

Running anvi-dereplicate-genomes with --program fastANI results in a fastANI ConfigError.

Log:

# DATE: 19 Mar 20 16:06:20
# CMD LINE: fastANI --ql /home/labs/elinav/uria/tmp/tmptxn3z0h0/fasta_paths.txt --rl /home/labs/elinav/uria/tmp/tmptxn3z0h0/fasta_paths.txt -k 16 --fragLen 3000 --minFrag 50 -t 20 -o /home/labs/elinav/uria/tmp/tmptxn3z0h0/output
Unknown option: 'minFrag'

The error occurs with fastANI version 1.3; with version 1.2 all is good.

meren commented 4 years ago

Anvi'o requires/suggests PyANI 0.2.10 (see the most recent requirements.txt), as it is the only PyANI version on conda :)

One can run

pip install -r requirements.txt

in their anvi'o virtual environment to make sure the versions of dependencies match :)

Thanks,

UriaMorP commented 4 years ago

To make it clear, I'm talking about fastANI, for which there is neither an offered nor a suggested version.

meren commented 4 years ago

Pfft fastANI not pyANI :( I'm very sorry and thank you for pointing it out.

Have you been installing fastANI through conda?

UriaMorP commented 4 years ago

All good :)

Have you been installing fastANI through conda?

No, I downloaded binary from their releases: https://github.com/ParBLiSS/FastANI/releases

meren commented 4 years ago

Thanks, I see it is also in bioconda.

@ekiefl, either we can enforce 1.2 in the anvio conda recipe, or we can fix the minFrag option in the codebase and leave the recipe as is. What say you?

UriaMorP commented 4 years ago

Two cents: I found it very hard to query fastANI for its version, so doing version-dependent branching will be a pain.

meren commented 4 years ago

When we set the version in the conda recipe, in theory it does it for the user so no one needs to do any manual work :)

UriaMorP commented 4 years ago

By branching, I meant writing version-specific wrappers, which might be a pain if a piece of software is reluctant to tell you its version :) ... Until someone sharpens his/her C code writing skills, enforcing V1.2 might be a better idea...

meren commented 4 years ago

Yes, that's what I mean. The conda package sticks with one version, with which the anvi'o codebase is in full agreement. The user doesn't even need to know whether they installed 1.2 or 1.3. So in the case of conda installations and Docker solutions, we have control. If they install a version we don't support, we can help them solve their issues by pointing them in the right direction.

I know. There is no end to writing wrappers otherwise, and it is not quite sustainable.

ekiefl commented 4 years ago

Re: meren

@ekiefl, either we can enforce 1.2 in the anvio conda recipe, or we can fix the minFrag option in the codebase and leave the recipe as is. What say you?

I think for now, I would prefer to enforce 1.2 in the conda recipe because it works. The proper solution is to transition to 1.3, change --minFrag to --minFraction everywhere it occurs, and verify that 1.3 works the way we intend. Check the release log for 1.3 here: https://github.com/ParBLiSS/FastANI/releases
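A minimal sketch of the flag switch described above, written as a hypothetical shell helper (`fastani_args` is an illustrative name, not part of anvi'o or fastANI). It reuses the options from the failing command in the original report and changes only the fragment filter depending on the version passed in. The two options are assumed not to be one-to-one: `--minFrag` takes a minimum number of matched fragments, while `--minFraction` takes a fraction of the genome, so `0.2` below is an assumed value (believed to be fastANI 1.3's default; check `fastANI -h`), not a conversion of `50`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fastANI arguments for a given fastANI version.
# Only the fragment filter differs between <= 1.2 and >= 1.3.
fastani_args() {
    local version="$1" query_list="$2" ref_list="$3" threads="$4" output="$5"
    local args=(--ql "$query_list" --rl "$ref_list" -k 16 --fragLen 3000)
    case "$version" in
        1.[012])
            # fastANI <= 1.2: minimum number of matched fragments
            args+=(--minFrag 50) ;;
        *)
            # fastANI >= 1.3: minimum shared fraction of the genome.
            # 0.2 is an ASSUMED value, not a translation of --minFrag 50.
            args+=(--minFraction 0.2) ;;
    esac
    args+=(-t "$threads" -o "$output")
    # Join the array with spaces and print it on one line
    printf '%s\n' "${args[*]}"
}

fastani_args 1.3 fasta_paths.txt fasta_paths.txt 20 output
```

To actually run fastANI from a script, it is safer to build the array in the caller and invoke `fastANI "${args[@]}"` directly, so paths containing spaces are not split when the printed string is reused.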


There is no end to writing wrappers otherwise, and it is not quite sustainable

I completely agree.


Re: UriaMorP

Thank you for the report.

Until someone sharpens his/her C code writing skills, enforcing V1.2 might be a better idea...

@cjain7's C skills are clearly very good to have written such a nice piece of software.

meren commented 4 years ago

Until someone sharpens his/her C code writing skills, enforcing V1.2 might be a better idea...

@cjain7's C skills are clearly very good to have written such a nice piece of software.

I fully agree, although until v1.3 the following was the case, so the frustration is understandable:

$ fastANI -v
Unknown option: 'v'
$ fastANI --version
Unknown option: 'version'
$ fastANI
Required option missing: '-o, --output'
$ fastANI -h
-----------------
fastANI is a fast alignment-free implementation for computing whole-genome
Average Nucleotide Identity (ANI) between genomes
-----------------
Example usage:
$ fastANI -q genome1.fa -r genome2.fa -o output.txt
$ fastANI -q genome1.fa --rl genome_list.txt -o output.txt

Available options
-----------------
-h, --help
    Print this help page

-r <value>, --ref <value>

(...) No version information (...)

It reminds me of ed a little :p
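Since `fastANI -h` prints its option list even in builds that have no version flag, one way around version-specific wrappers is to ask the binary which option it accepts. A minimal sketch, assuming the help text lists `--minFraction` in v1.3 (per the release notes linked earlier in this thread) and `--minFrag` before that; `fastani_fragment_flag` is a hypothetical helper name. It reads the help text from standard input so it can be exercised without fastANI installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which fragment-filter option a fastANI build
# accepts by scanning its help text, read from stdin.
fastani_fragment_flag() {
    local help_text
    help_text="$(cat)"
    # Check the newer option first; '--minFraction' does not contain '--minFrag'
    if grep -q -- '--minFraction' <<< "$help_text"; then
        echo "--minFraction"
    elif grep -q -- '--minFrag' <<< "$help_text"; then
        echo "--minFrag"
    else
        echo "unknown fastANI option set" >&2
        return 1
    fi
}

# Usage with a real binary (stderr is merged in case help goes there):
# fastANI -h 2>&1 | fastani_fragment_flag
```

This checks capabilities rather than version numbers, which sidesteps the missing `--version` output shown above, though it still depends on the wording of the help page staying stable.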

meren commented 4 years ago

I decided that we should update it to v1.3. Open-source software developers go through hell to make new releases, so we should ensure we follow their advice as often as we can.

I installed and tested v1.3. Some changes were necessary, and they are coming as a commit in a second. Basically, I used the infant gut tutorial dataset:

cd additional-files/pangenomics

# with v1.2
anvi-dereplicate-genomes -e external-genomes.txt --similarity-threshold 0.90 -o FASTANI_1.2 -T 5 --program fastANI

# after installing v1.3
anvi-dereplicate-genomes -e external-genomes.txt --similarity-threshold 0.90 -o FASTANI_1.3 -T 5 --program fastANI

tar -zcf FASTANI_1.2.tar.gz FASTANI_1.2
tar -zcf FASTANI_1.3.tar.gz FASTANI_1.3

The output files are attached (after I removed the GENOMES directory). To me they looked virtually identical, but please feel free to compare:

FASTANI_1.2.tar.gz FASTANI_1.3.tar.gz

UriaMorP commented 4 years ago

Thanks a lot @meren, that's brilliant! I had already compared the raw output of equivalent calls to fastANI using our data, and I also saw that the results are identical (I ran diff on each pair of files from the two outputs).
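The pairwise `diff` check described above can be scripted roughly as follows. A minimal sketch, assuming the two output directories from the commands earlier in the thread (`FASTANI_1.2` and `FASTANI_1.3`) use the same file names; `compare_runs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff every file under dir_a against the file with the
# same relative path under dir_b; return 1 if any pair differs or is missing.
compare_runs() {
    local dir_a="$1" dir_b="$2" status=0 file rel
    while IFS= read -r -d '' file; do
        rel="${file#"$dir_a"/}"            # path relative to dir_a
        if [ ! -f "$dir_b/$rel" ]; then
            echo "missing in $dir_b: $rel"
            status=1
        elif ! diff -q "$file" "$dir_b/$rel" > /dev/null; then
            echo "differs: $rel"
            status=1
        fi
    done < <(find "$dir_a" -type f -print0)
    return "$status"
}

# compare_runs FASTANI_1.2 FASTANI_1.3 && echo "outputs are identical"
```

The loop only walks files present in the first directory; `diff -r FASTANI_1.2 FASTANI_1.3` does the comparison in one call and also reports files that exist only in the second directory.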