psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

AIRR output crashes trying to call get_droplet_id #317

Closed ressy closed 2 years ago

ressy commented 2 years ago

I'm able to use the parse-output.py script to convert a partition.yaml to FASTA, but when I try with --airr-output it crashes. For example, with one of the test files, this works:

$ parse-output.py test/paired/ref-results/partition-new-simu/partition-igh.yaml test.fa
  found 13 clusters in best partition
    taking all 13 clusters
  writing 63 sequences to test.fa

But this doesn't:

$ parse-output.py --airr-output test/paired/ref-results/partition-new-simu/partition-igh.yaml test.tsv
  found 13 clusters in best partition
    taking all 13 clusters
  writing 13 sequences to test.tsv
   writing airr annotations to test.tsv
Traceback (most recent call last):
  File "/home/jesse/opt/partis/bin/parse-output.py", line 179, in <module>
    utils.write_airr_output(args.outfile, annotation_list, cpath=cpath, extra_columns=args.extra_columns, skip_columns=args.skip_columns)
  File "/data/home/jesse/opt/partis/python/utils.py", line 1649, in write_airr_output
    aline = get_airr_line(line, iseq, partition=None if cpath is None else cpath.partitions[cpath.i_best], extra_columns=extra_columns, skip_columns=skip_columns, debug=debug)
  File "/data/home/jesse/opt/partis/python/utils.py", line 1566, in get_airr_line
    aline[akey] = get_droplet_id(pline['unique_ids'][iseq])
TypeError: get_droplet_id() takes at least 3 arguments (1 given)

(It looks to me like the traceback is right: get_droplet_id does expect at least three arguments, not the one it gets called with.)
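To illustrate the kind of mismatch the traceback reports, here is a minimal sketch. The function `droplet_id_sketch` and its `separators` and `indices` parameters are hypothetical stand-ins, not the actual partis signature; the only point is that a call site still passing a single argument fails once a function has gained extra required arguments.

```python
# Hypothetical stand-in for a helper that gained two required arguments;
# this is NOT the real partis get_droplet_id() signature.
def droplet_id_sketch(uid, separators, indices):
    """Split uid on each character in separators and join the chosen pieces."""
    pieces = [uid]
    for sep in separators:
        pieces = [part for piece in pieces for part in piece.split(sep)]
    return '_'.join(pieces[i] for i in indices)


# A stale call site that still passes only the uid raises TypeError.
try:
    droplet_id_sketch('AAACCTG-1_contig_1')
    stale_call_failed = False
except TypeError:
    stale_call_failed = True

# Passing all three arguments works.
droplet = droplet_id_sketch('AAACCTG-1_contig_1', '_', [0])
```

The exact wording of the TypeError differs between Python 2 and Python 3, but the cause is the same in both.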

psathyrella commented 2 years ago

whoops, I should've merged to master sooner. That got fixed here: https://github.com/psathyrella/partis/commit/4a847e269175dba8c1a74a3e140b5f7996da3f40

i just pushed to master, and i'm rebuilding to push to docker, which should be done in a half hour or so.

ressy commented 2 years ago

Oh oops, I should have checked for that first myself but I forgot I was working with the master branch within my install and not dev. Thanks!

ressy commented 2 years ago

Great, AIRR output works for me now in the master branch.

On a side note, ./test/test.py now crashes on the annotate step, though. I'm wondering if the Dockerfile's test command just misses this since it only uses --quick? Assuming it's not just me and my conda setup.

$ ./test/test.py --run-all --print-width 0
run ./test/test.py --print-width 0
cache-parameters-simu            /data/home/jesse/dev/examples/example-partis/partis/bin/partis cache-parameters --dont-write-git-info --infname test/ref-results/test/simu.yaml --parameter-dir test/new-results/test/parameters/simu --sw-cachefname test/new-results/test/parameters/simu/sw-cache.yaml --is-simu --random-seed 1 --n-procs 10
annotate-new-simu                /data/home/jesse/dev/examples/example-partis/partis/bin/partis annotate --dont-write-git-info --infname test/ref-results/test/simu.yaml --parameter-dir test/new-results/test/parameters/simu --sw-cachefname test/new-results/test/parameters/simu/sw-cache.yaml --plot-annotation-performance --is-simu --plotdir test/new-results/annotate-new-simu-annotation-performance --only-csv-plots --random-seed 1 --n-procs 10 --outfname test/new-results/annotate-new-simu.yaml
  log tail:
          plotting performance 
            warning skipped annotation performance evaluation on 2 queries with different true and inferred net shm indel lengths: 7326781271496498789 4651803758159415231
        0 0.000000 2.000000
        1 0.000000 0.000000
        2 0.000000 0.000000
        3 0.000000 0.000000
        4 0.000000 0.000000
        5 0.000000 0.000000
        6 0.000000 0.000000
        7 0.000000 2.000000

Traceback (most recent call last):
  File "./test/test.py", line 881, in <module>
    utils.simplerun(cmd_str, dryrun=args.dry_run)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/utils.py", line 4324, in simplerun
    subprocess.check_call(cmd_str if shell else cmd_str.split(), env=os.environ, shell=shell)
  File "/home/jesse/miniconda3/envs/example-partis/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['./test/test.py', '--print-width', '0']' returned non-zero exit status 1

The stderr from that command complains about something in the plotting code:

$ /data/home/jesse/dev/examples/example-partis/partis/bin/partis annotate --dont-write-git-info --infname test/ref-results/test/simu.yaml --parameter-dir test/new-results/test/parameters/simu --sw-cachefname test/new-results/test/parameters/simu/sw-cache.yaml --plot-annotation-performance --is-simu --plotdir test/new-results/annotate-new-simu-annotation-performance --only-csv-plots --random-seed 1 --n-procs 10 --outfname test/new-results/annotate-new-simu.yaml
annotating    (with test/new-results/test/parameters/simu/hmm)
smith-waterman
  vsearch: 40 / 40 v annotations (0 failed) with 7 v genes in 0.1 sec
        reading sw results from test/new-results/test/parameters/simu/sw-cache.yaml
      info for 40 / 40 = 1.000   (removed: 0 failed, 0 duplicates)
      kept 19 (0.475) unproductive
  plotting performance 
    warning skipped annotation performance evaluation on 2 queries with different true and inferred net shm indel lengths: 7326781271496498789 4651803758159415231
0 0.000000 2.000000
1 0.000000 0.000000
2 0.000000 0.000000
3 0.000000 0.000000
4 0.000000 0.000000
5 0.000000 0.000000
6 0.000000 0.000000
7 0.000000 2.000000
Traceback (most recent call last):
  File "/data/home/jesse/dev/examples/example-partis/partis/bin/partis", line 1471, in <module>
    args.func(args)
  File "/data/home/jesse/dev/examples/example-partis/partis/bin/partis", line 315, in run_partitiondriver
    parter.run(actions)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/partitiondriver.py", line 127, in run
    self.action_fcns[tmpaction]()
  File "/data/home/jesse/dev/examples/example-partis/partis/python/partitiondriver.py", line 307, in annotate
    self.run_waterer(look_for_cachefile=not self.args.write_sw_cachefile, write_cachefile=self.args.write_sw_cachefile, count_parameters=self.args.count_parameters)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/partitiondriver.py", line 217, in run_waterer
    waterer.read_cachefile(cachefname)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/waterer.py", line 172, in read_cachefile
    self.finalize(cachefname=None, just_read_cachefile=True, ignore_seed_unique_id=ignore_seed_unique_id, quiet=quiet)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/waterer.py", line 269, in finalize
    perfplotter.plot(self.args.plotdir + '/sw', only_csv=self.args.only_csv_plots)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/performanceplotter.py", line 249, in plot
    hist = hutils.make_hist_from_dict_of_counts(self.values[column], 'int', self.name + '-' + column)
  File "/data/home/jesse/dev/examples/example-partis/partis/python/hutils.py", line 136, in make_hist_from_dict_of_counts
    raise Exception('overflows in ' + hist_label)
Exception: overflows in sw-d_5p_del
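Judging only by the message, the exception seems to come from a check that refuses to build a histogram when some value falls outside the bin range. A rough sketch of that kind of check follows; `hist_from_counts`, its parameters, and the bin layout are assumptions for illustration, not the code in hutils.py.

```python
# Hypothetical sketch of an overflow check; not the actual hutils.py code.
def hist_from_counts(counts, xmin, xmax, label):
    """Build per-integer bin contents for xmin..xmax from a {value: count} dict.

    Raises if any value lands outside [xmin, xmax], i.e. in the underflow
    or overflow bin.
    """
    n_bins = xmax - xmin + 1
    # index 0 is the underflow bin, index n_bins + 1 is the overflow bin
    contents = [0] * (n_bins + 2)
    for value, count in counts.items():
        if value < xmin:
            ibin = 0
        elif value > xmax:
            ibin = n_bins + 1
        else:
            ibin = value - xmin + 1
        contents[ibin] += count
    if contents[0] != 0 or contents[-1] != 0:
        raise Exception('overflows in ' + label)
    return contents[1:-1]


# All values inside the range: returns the in-range bin contents.
in_range = hist_from_counts({0: 5, 2: 1}, xmin=0, xmax=3, label='demo')

# A value above xmax triggers the exception.
try:
    hist_from_counts({0: 5, 7: 2}, xmin=0, xmax=3, label='demo')
    overflow_message = None
except Exception as err:
    overflow_message = str(err)
```

Under that reading, a failure like the one above would mean at least one value was outside the range the histogram was built for.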

I swear I'm almost done breaking things and I'll move on to regular usage very soon now :)

psathyrella commented 2 years ago

breaking things is the best, how else will i ever find all the bugs?

fixed here https://github.com/psathyrella/partis/commit/79b09aabee073c47b1d71071023f6dbd0465c6cf

psathyrella commented 2 years ago

and yeah, the full test run (which is actually test.py --run-all) seems too slow to add to the docker build, although it's true this would've been caught sooner that way. I think this is about the first time i've forgotten to run it before merging to master, tho.