psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

partis ignoring --extra-annotation-columns #276

Closed scharch closed 5 years ago

scharch commented 5 years ago

I must be doing something really stupid here, because I know this was working a couple of months ago and I can't think of anything I've changed since then. Anything you can think of?

$ partis annotate --extra-annotation-columns invalid --extra-annotation-columns in_frames --extra-annotation-columns stops --extra-annotation-columns cdr3_seqs --extra-annotation-columns v_gl_seq --extra-annotation-columns v_qr_seqs --n-procs 6 --infname foo.fa --outfname foo.csv --parameter-dir partis
  note: --outfname uses deprecated file format .csv. This will still work fine, but the new default .yaml format is much cleaner, and includes annotations, partitions, and germline info in the same file.
annotating
smith-waterman
  vsearch: 499 / 500 v annotations (1 failed) with 3 v genes in 0.3 sec
    running 6 procs for 500 seqs
    running 8 procs for 156 seqs
      info for 500 / 500 = 1.000   (0 failed)
      kept 240 (0.480) unproductive
    water time: 2.7  (ig-sw 0.2  processing 0.7)
hmm
    prepare_for_hmm: (0.1 sec)
    running 6 procs
                    calcd:         vtb 500        fwd   0
             min-max time:  2.4 - 2.8 sec
    read output
        processed 500 hmm output lines with 500 sequences in 500 events  (0 failures)
         infra time: 0.9
      hmm step time: 3.7
      total time: 6.9

$ head -1 foo.csv
unique_ids,invalid,v_gene,d_gene,j_gene,cdr3_length,mut_freqs,n_mutations,input_seqs,indel_reversed_seqs,has_shm_indels,qr_gap_seqs,gl_gap_seqs,naive_seq,duplicates,v_per_gene_support,d_per_gene_support,j_per_gene_support,v_3p_del,d_5p_del,d_3p_del,j_5p_del,v_5p_del,j_3p_del,vd_insertion,dj_insertion,fv_insertion,jf_insertion,mutated_invariants,in_frames,stops,codon_positions,v_qr_seqs

Note that cdr3_seqs and v_gl_seq are not columns in the output. They are not being output if I use yaml, either.

scharch commented 5 years ago

Still using commit 00421b4158455e924d0a610ca0015c22465f4681, btw

psathyrella commented 5 years ago

oh, it's because --extra-annotation-columns is supposed to be specified as a single colon-separated list, like

--extra-annotation-columns invalid:in_frames:stops:cdr3_seqs:v_gl_seq:v_qr_seqs

I think when the flag is repeated, the way you ran it, it only takes the last one that's specified. I think there is an argparse option that does behave the way you're expecting, but I'm not sure what it is.

At least this seems to fix it when I run that ^ command here.
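
For anyone landing here later, the difference can be reproduced with plain argparse. This is only a minimal sketch, not partis's actual argument definitions: the parser, the `--appended-columns` flag, and the `parse_colon_list` helper are all made up for illustration. It assumes the real option is a plain string option that gets split on colons afterwards, which matches the behavior described above. With argparse's default `store` action a repeated flag keeps only the last value, while `action='append'` is the standard-library way to collect every repeat.

```python
import argparse


def build_parser():
    # Hypothetical stand-in parser; NOT partis's real argument definitions.
    parser = argparse.ArgumentParser()
    # Default 'store' action: each repeat of the flag overwrites the previous value.
    parser.add_argument('--extra-annotation-columns')
    # 'append' action: each repeat is added to a list.
    # (--appended-columns is an invented flag, used only to show the contrast.)
    parser.add_argument('--appended-columns', action='append', default=[])
    return parser


def parse_colon_list(value):
    # Hypothetical helper: split 'a:b:c' into ['a', 'b', 'c'].
    return [] if value is None else value.split(':')


parser = build_parser()

# Repeating a 'store' flag: only the last value survives.
repeated = parser.parse_args([
    '--extra-annotation-columns', 'cdr3_seqs',
    '--extra-annotation-columns', 'v_qr_seqs',
])
last_only = parse_colon_list(repeated.extra_annotation_columns)

# One colon-separated value: every column is kept.
colon = parser.parse_args(['--extra-annotation-columns', 'cdr3_seqs:v_gl_seq:v_qr_seqs'])
all_cols = parse_colon_list(colon.extra_annotation_columns)

# Repeating an 'append' flag: every value is kept.
appended = parser.parse_args([
    '--appended-columns', 'cdr3_seqs',
    '--appended-columns', 'v_qr_seqs',
])

print(last_only)
print(all_cols)
print(appended.appended_columns)
```

That would be consistent with the header above, where v_qr_seqs (the last flag given) is the only requested column that was added.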

scharch commented 5 years ago

I could swear it worked the other way before, but maybe I'm just losing my mind. Will test next week and let you know.