theosanderson / taxonium

A tool for exploring very large trees in the browser
http://taxonium.org
GNU General Public License v3.0

usher_to_taxonium: ValueError: DataFrame index must be unique for orient='index'. #520

Closed AngieHinrichs closed 10 months ago

AngieHinrichs commented 10 months ago

Hi Theo! For some reason usher_to_taxonium doesn't like my metadata file today. It fails pretty quickly, with this output:

Loading metadata file..
Traceback (most recent call last):
  File "/cluster/home/angie/miniconda3/bin/usher_to_taxonium", line 8, in <module>
    sys.exit(usher_to_taxonium.main())
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/usher_to_taxonium.py", line 307, in main
    do_processing(args.input,
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/usher_to_taxonium.py", line 42, in do_processing
    metadata_dict, metadata_cols = utils.read_metadata(metadata_file, columns,
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/utils.py", line 31, in read_metadata
    metadata_dict = metadata.to_dict("index")
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 2061, in to_dict
    raise ValueError("DataFrame index must be unique for orient='index'.")
ValueError: DataFrame index must be unique for orient='index'.

The metadata file includes GISAID data so I can't share it here. Any ideas? I did a quick check that all lines of the file have the same number of tab-separated columns.

theosanderson commented 10 months ago

Hi Angie, best guess for a starting point would be a duplicated node_id in the metadata - definitely not that? (I haven't dived into the line number yet, doing that now)

theosanderson commented 10 months ago

Looking in more detail, it does look like that: the "key column" has a duplicate value. Obviously it would be great to give a more useful error message here.
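
For anyone hitting the same error, a minimal sketch of how the duplicated keys could be located with pandas before running usher_to_taxonium. The helper name `find_duplicate_keys`, the column name `strain` and the tiny in-memory table are all assumptions for illustration; substitute the real metadata path and whichever column is used as the key.

```python
import io

import pandas as pd


def find_duplicate_keys(metadata: pd.DataFrame, key_column: str) -> pd.DataFrame:
    """Return every row whose key value occurs more than once (hypothetical helper)."""
    # keep=False marks all occurrences of a repeated key, not just the later ones
    mask = metadata[key_column].duplicated(keep=False)
    return metadata[mask]


# Made-up tab-separated metadata; "strain" as the key column is an assumption
example_tsv = "strain\tcountry\nA\tUK\nB\tUSA\nA\tFrance\n"
metadata = pd.read_csv(io.StringIO(example_tsv), sep="\t")

duplicates = find_duplicate_keys(metadata, "strain")
print(duplicates)
```

If this returns any rows, setting that column as the index and calling `to_dict("index")` raises the ValueError shown in the traceback, so removing or renaming the repeated keys should let the conversion proceed.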

AngieHinrichs commented 10 months ago

Ah, will check rn.

AngieHinrichs commented 10 months ago

That was it! My bad, thanks Theo!

theosanderson commented 10 months ago

Np, thanks for raising