psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Warnings after adding custom germline database #214

Closed: krdav closed this issue 7 years ago

krdav commented 7 years ago

As the title suggests: everything looks normal until after the sw (Smith-Waterman) step, and then the complaints start:

.
.
.
smith-waterman 
  ESC[91mwarningESC[0m 126 genes in glfo that don't have yamels in _output/_home_projects_cu_10049_data_KD_sandbox_saras_data_partis_VH_nl_partis_inp_part/hmm
        reading sw results from _output/_home_projects_cu_10049_data_KD_sandbox_saras_data_partis_VH_nl_partis_inp_part/sw-cache-7819920073823357470.csv
hmm
    writing input
  ESC[91mwarningESC[0m no hmm files for glfo genes IGHV2-5*02 IGHV2-5*03 IGHV2-5*01 IGHV3-30-3*02 IGHV2-5*04 IGHV2-5*05 IGHV2-5*08 IGHV2-5*09 IGHD4-11*01 IGHV4-34*07 IGHV4-34*06 IGHV4-34*05 IGHV4-34*04 IGHV4-34*03 IGHV4-34*01 IGHV4-34*09 IGHV4-34*08 IGHV3-30*17 IGHV3-30*16 IGHV3-30*15 IGHV3-30*14 IGHV3-30*13 IGHV3-30*12 IGHV3-30*11 IGHV3-30*10 IGHV3-30*19 IGHV1-2*04 IGHV1-2*05 IGHV1-2*02 IGHV1-2*03 IGHV1-2*01 IGHV3-33*05 IGHV3-33*04 IGHV3-33*01 IGHV3-33*03 IGHV3-33*02 IGHV4-28*02 IGHV4-28*03 IGHJ2*01 IGHV4-28*06 IGHV4-28*07 IGHV3-35*01 IGHV4-31*10 IGHV6-1*02 IGHV4-34*10 IGHV4-34*11 IGHV4-34*12 IGHV4-34*13 IGHV2-5*06 IGHV3-30*04 IGHV3-30*05 IGHV3-30*06 IGHV3-30*07 IGHV3-30*01 IGHV3-30*02 IGHV3-30-3*01 IGHV3-30*08 IGHV3-30*09 IGHV3-38-3*01 IGHV4-28*04 IGHV4-28*05 IGHV4-28*01 IGHV4-31*08 IGHV4-31*09 IGHV4-31*04 IGHV4-31*05 IGHV4-31*06 IGHV4-31*07 IGHV4-31*01 IGHV4-31*02 IGHJ5*02 IGHJ5*01 IGHV3-7*03 IGHV3-7*02 IGHV3-7*01 IGHV3-7*05 IGHD1-1*01 IGHD4-17*01 IGHD2-8*02 IGHD2-8*01 IGHJ1*01 IGHV3-11*05 IGHV3-11*04 IGHV3-11*06 IGHV3-11*03 IGHV3-38*01 IGHV3-38*02 IGHV3-38*03 IGHD2-21*01 IGHD2-21*02 IGHJ3*01 IGHJ3*02 IGHV3-23D*01 IGHV3-23D*02 IGHV2-26*01 IGHV1-3*01 IGHV1-3*02 IGHV1-8*01 IGHV1-8*02 IGHD2-2*01 IGHD2-2*02 IGHD5-24*01 IGHJ6*04 IGHJ6*01 IGHD3-3*01 IGHV4-39*04 IGHV4-39*05 IGHV4-39*03 IGHV3-9*01 IGHV3-9*03 IGHV3-9*02 IGHD6-25*01 IGHV3-23*05 IGHV3-23*02 IGHV3-23*03 IGHJ4*01 IGHD1-7*01 IGHV4-4*07 IGHV4-4*06 IGHV4-4*05 IGHV4-4*04 IGHV4-4*03 IGHV4-4*01 IGHD1-14*01 IGHV4-4*08 IGHD3-16*01
    running 6 procs
.
.
.

FYI: The custom germline database is just a subset of the standard human VH database that comes with partis.

psathyrella commented 7 years ago

Yeah, sorry, I should reword that warning. It doesn't necessarily mean anything is wrong; it's just telling you that something has happened that shouldn't happen under typical circumstances. With the defaults, if everything is working, there should be a one-to-one correspondence between the genes in the internal germline info (glfo) and the genes that have hmm yaml files. Using --initial-germline-dir at various steps can definitely break that correspondence.
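To make that one-to-one check concrete, here is a minimal Python sketch: given the gene names in the germline info and an hmm directory, it lists the genes that have no yaml file. This is not partis code. `sanitize_gene_name` and `genes_missing_yamls` are hypothetical helpers, and the filename mangling (replacing `*` with `_star_` and `/` with `_slash_`) is an assumption you should check against the files actually present in your hmm directory.

```python
import os
import tempfile


def sanitize_gene_name(gene):
    # Hypothetical mangling (assumption): '*' and '/' are awkward in
    # filenames, so they are replaced. Adjust to match your hmm dir.
    return gene.replace('*', '_star_').replace('/', '_slash_')


def genes_missing_yamls(glfo_genes, hmm_dir):
    """Return the sorted germline genes with no <name>.yaml file in hmm_dir."""
    existing = {
        fname[:-len('.yaml')]
        for fname in os.listdir(hmm_dir)
        if fname.endswith('.yaml')
    }
    return sorted(g for g in glfo_genes if sanitize_gene_name(g) not in existing)


# Usage: a throwaway hmm dir in which only one of three genes has a yaml file.
with tempfile.TemporaryDirectory() as hmm_dir:
    with open(os.path.join(hmm_dir, sanitize_gene_name('IGHV3-23*01') + '.yaml'), 'w'):
        pass
    missing = genes_missing_yamls(['IGHV3-23*01', 'IGHJ4*02', 'IGHD3-3*01'], hmm_dir)
    print(missing)  # ['IGHD3-3*01', 'IGHJ4*02']
```

A nonempty result corresponds to the situation the warning reports: genes in the germline info with no matching hmm file.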

But I assume you inferred parameters with the custom germline database?

psathyrella commented 7 years ago

P.S. In emacs, M-x display-ansi-colors fixes the ANSI codes. I don't know how other editors handle them.

psathyrella commented 7 years ago

Actually, can you just post the commands you ran?

krdav commented 7 years ago

I just made a subset database and deleted all the cached parameters from my previous run. Here is the command I used:

python2.7 ./bin/partis partition --print-cluster-annotations --initial-germline-dir /home/projects/cu_10049/apps/partis_old/data/germlines/Omni_Rat_germline_set --n-procs 10 --workdir /tmp/partis --infname /home/projects/cu_10049/data/KD_sandbox/saras_data/partis_VH/nl/partis_inp_part.fasta --outfname /home/projects/cu_10049/data/KD_sandbox/saras_data/partis_VH/nl/lnico_partition2.csv

psathyrella commented 7 years ago

Ah, OK, great. If you're just running that one command, then the problem's on my end. I'll see if I can reset args.initial_germline_dir internally after the parameter-caching step, but at the very least I'll reword the warning.

I'm guessing you'd prefer to have an output file with the cluster annotations in it? I had that implemented at some point, but I must have taken it out; I think I decided it was too many options. Internally, that file actually already exists (it's how the hmm passes the annotations to the python), so it'd be easy to save it somewhere else.

krdav commented 7 years ago

You mean the annotation for each sequence individually, like the file you get from run-viterbi? Sure, that would be useful, but most of the information I care about is at the clone level, and that's printed nicely when using --print-cluster-annotations.

psathyrella commented 7 years ago

Ah, no, I mean the annotation for each clone.

And I knew I'd implemented this; I just forgot to document it. If you specify --outfname and --print-cluster-annotations at the same time, the cluster annotations are automatically written to the --outfname path with -cluster-annotations tacked on the end. It's documented now.
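If you want to locate that file from a script, a small helper can derive its name from --outfname. This is a hypothetical sketch, not partis code: it assumes the -cluster-annotations suffix is inserted just before the file extension. If your version literally appends it after the extension, change the function accordingly.

```python
import os


def cluster_annotation_fname(outfname, suffix='-cluster-annotations'):
    # Assumption: the suffix goes before the extension,
    # e.g. partition.csv -> partition-cluster-annotations.csv
    base, ext = os.path.splitext(outfname)
    return base + suffix + ext


outfname = '/home/projects/cu_10049/data/KD_sandbox/saras_data/partis_VH/nl/lnico_partition2.csv'
print(cluster_annotation_fname(outfname))
# /home/projects/cu_10049/data/KD_sandbox/saras_data/partis_VH/nl/lnico_partition2-cluster-annotations.csv
```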

krdav commented 7 years ago

Oh, that's what you meant. I'd already discovered that, and you're exactly right: this is the output I need most from partitioning.

psathyrella commented 7 years ago

Re ANSI colors: I just discovered the -r option to less, which is much more useful than my suggestion above.
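For viewers with no equivalent of less -r, or when you want plain-text logs you can grep, the color codes can be stripped instead. The following is a minimal Python sketch, not part of partis. It removes ANSI CSI escape sequences such as the red `\x1b[91m ... \x1b[0m` wrapped around "warning" in the log above.

```python
import re

# ANSI CSI sequence: ESC '[' + parameter bytes (0x30-0x3F)
# + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E).
ANSI_CSI = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def strip_ansi(text):
    """Remove ANSI color/formatting codes so logs read cleanly in any editor."""
    return ANSI_CSI.sub('', text)


line = "  \x1b[91mwarning\x1b[0m 126 genes in glfo that don't have yamels"
print(strip_ansi(line))  #   warning 126 genes in glfo that don't have yamels
```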

krdav commented 7 years ago

Uhh, this is so cool. Thank you!