Open sidneymbell opened 2 years ago
Hi @sidneymbell,
I recently saw a similar error in a completely different pipeline that came from pd.DataFrame.to_dict.
Within augur export v2, this error would be coming from reading the metadata. read_metadata uses either strain or name as the index column depending on which one is available within your metadata file. Could you verify that there are no duplicate values in those columns?
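For anyone wanting to reproduce this outside augur, here is a minimal sketch, assuming pandas is installed. The sample IDs and the genotype column are made up for illustration; only the strain column name and the to_dict call come from the discussion above.

```python
import pandas as pd

# Hypothetical metadata with a repeated value in the "strain" column.
metadata = pd.DataFrame(
    {
        "strain": ["sample_A", "sample_B", "sample_A"],
        "genotype": ["1a", "1b", "3a"],
    }
).set_index("strain")

print(metadata.index.is_unique)  # False

# Select every row whose index value occurs more than once.
duplicates = metadata[metadata.index.duplicated(keep=False)]
print(sorted(set(duplicates.index)))  # ['sample_A']

# Converting with orient="index" needs unique index values, so this raises.
message = None
try:
    metadata.to_dict(orient="index")
except ValueError as error:
    message = str(error)
    print(message)
```

Listing the duplicated IDs this way is usually enough to find the offending rows in the metadata file.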
That'll do it 🤦‍♀️. Thanks, @joverlee521! (I love when it's user error -- always the easiest to fix!)
Hmm, I'm going to re-open this since it's an unhandled error and a better error message would be nice.
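As a rough idea of what a friendlier message could look like, here is a sketch of a pre-check that names the duplicated IDs. The helper ensure_unique_ids and its message wording are hypothetical and not part of augur; it uses only the Python standard library.

```python
from collections import Counter


def ensure_unique_ids(ids, id_column):
    """Raise a descriptive ValueError if any ID occurs more than once.

    Hypothetical helper for illustration; not part of augur.
    """
    counts = Counter(ids)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    if duplicates:
        # Show at most five duplicated values to keep the message short.
        shown = ", ".join(repr(value) for value in duplicates[:5])
        more = len(duplicates) - 5
        suffix = f" (and {more} more)" if more > 0 else ""
        raise ValueError(
            f"Metadata column {id_column!r} contains duplicate values: "
            f"{shown}{suffix}. Each row needs a unique ID."
        )


try:
    ensure_unique_ids(["sample_A", "sample_B", "sample_A"], "strain")
except ValueError as error:
    print(error)
```

Running such a check before the metadata is converted would let the error point at the ID column and the offending values rather than at the DataFrame index.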
Re: installation notes
I'm currently operating in a virtualenv with everything pip or brew installed.
This may work now, but it'll be hard to keep versions of various packages up to date. A couple questions:
brew?
pip or brew.

even on a completely clean, fresh conda installation, mamba install fails due to an inability to solve the environment (even overnight)
This doesn't sound right! I think I might've seen this a while ago but forgot how it got resolved. Could you paste the outputs of conda --version and mamba --version?
@victorlin -- Yeah for sure. Conda version is 4.12.0; it's brand new after a full anaconda-clean and reinstallation of anaconda from dmg.
No mamba installation -- failed after many attempts, including letting it try to solve the environment and inspect dependency conflicts overnight. (No clue why it would even have any on a completely fresh conda installation).
Ended up doing
brew tap brewsci/bio
brew install mafft iqtree raxml fasttree
(per docs, which are beautiful btw)
I've got auspice installed from source from a while ago, although I actually just use auspice.us 99% of the time.
I see. So Anaconda comes with a lot of bloat, which is why we use Miniconda in the install docs. I don't use Anaconda personally, but maybe it is trying to search in too many channels. You could try --override-channels since conda-forge alone is sufficient for Mamba installation:
conda install -n base -c conda-forge --override-channels mamba --yes
Also, 4.12.0 is not the latest version of Conda – look at the sidebar of the release notes page:
Just hitting the same error, would be great if it contained better debugging tips for the end user:
rule export:
input: results/tree.nwk, results/vmr.tsv
output: auspice/taxonomy.json
jobid: 0
reason: Missing output files: auspice/taxonomy.json
resources: tmpdir=/var/folders/qf/4kkcfypx0gbfb0t9336522_r0000gn/T
augur export v2 --tree results/tree.nwk --output auspice/taxonomy.json --color-by-metadata "Genome composition" "Host source" --metadata results/vmr.tsv --metadata-id-column "Virus name(s)"
ERROR: DataFrame index must be unique for orient='index'.
[Mon Oct 9 06:29:56 2023]
Error in rule export:
jobid: 0
input: results/tree.nwk, results/vmr.tsv
output: auspice/taxonomy.json
shell:
augur export v2 --tree results/tree.nwk --output auspice/taxonomy.json --color-by-metadata "Genome composition" "Host source" --metadata results/vmr.tsv --metadata-id-column "Virus name(s)"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-10-09T062955.159249.snakemake.log
Current Behavior
Initial googling suggests that this might be related to pd.DataFrame.to_json, but I actually don't see any instances of this in the augur codebase. Just to be safe, I validated that my metadata file does have a unique index (per df.index.unique == True).
All other files are outputs of other augur subcommands, roughly following the Zika tutorial, adapted for HCV. One note on the data is that for whatever reason many HCV samples in NCBI Virus don't have collection dates, so I opted to skip date metadata filtering, timetree inference and clock rate IQR filtering.
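A caveat on the index uniqueness check mentioned above: in pandas the boolean property is Index.is_unique (Index.unique is a method that returns the distinct values). Also, a DataFrame read without an explicit index column gets a default integer index, which is always unique, so the check may pass even when the ID column contains duplicates. A minimal sketch with made-up inline data, assuming pandas is installed:

```python
import io

import pandas as pd

# Hypothetical two-column TSV with a repeated ID, inlined for illustration.
tsv = "strain\tgenotype\nsample_A\t1a\nsample_B\t1b\nsample_A\t3a\n"

# Default integer index: unique even though "strain" has a duplicate.
df_default = pd.read_csv(io.StringIO(tsv), sep="\t")
print(df_default.index.is_unique)  # True

# Index set to the ID column: the duplicate now shows up.
df_by_id = pd.read_csv(io.StringIO(tsv), sep="\t", index_col="strain")
print(df_by_id.index.is_unique)  # False

# The column can also be checked directly without changing the index.
print(df_default["strain"].is_unique)  # False
```

Checking the column that augur will actually use as the ID (strain, name, or whatever is passed via --metadata-id-column) is therefore the more reliable test.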
Ultimate goal is really just to get a tree JSON with HCV genotype labels to use for lineage calling in Nextclade.
Happy to keep debugging, but I'm hoping that someone's encountered this before?
Your environment: if running Nextstrain locally
mac
augur v 18.0.0
installation / env notes: I ended up needing to nuke my nextstrain conda env, only to later discover that even on a completely clean, fresh conda installation, mamba install fails due to an inability to solve the environment (even overnight). So I'm currently operating in a virtualenv with everything pip or brew installed.
Thanks, y'all!