cswarth closed this 7 years ago
process_partis also crashes on these files, although for various reasons.
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-Vh/Hs-LN1-5RACE-IgG-new-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-Vh/Hs-LN1-5RACE-IgG-new.csv --cluster_base cluster --output_dir postpartis_out/QB850.001-Vh/Hs-LN1-5RACE-IgG-new --separate --chain "h"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.016-VL/Hs-LN3-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.016-VL/Hs-LN3-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.016-VL/Hs-LN3-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.016-VL/Hs-LN2-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.016-VL/Hs-LN2-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.016-VL/Hs-LN2-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.026-VL/Hs-LN1-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.026-VL/Hs-LN1-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.026-VL/Hs-LN1-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.026-VL/Hs-LN4-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.026-VL/Hs-LN4-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.026-VL/Hs-LN4-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.417-VL/Hs-LN1-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.417-VL/Hs-LN1-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.417-VL/Hs-LN1-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.417-VL/Hs-LN4-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.417-VL/Hs-LN4-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.417-VL/Hs-LN4-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-VL/Hs-LN1-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-VL/Hs-LN1-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.001-VL/Hs-LN1-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-VL/Hs-LN4-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-VL/Hs-LN4-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QB850.001-VL/Hs-LN4-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.067-VL/Hs-LN3-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.067-VL/Hs-LN3-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.067-VL/Hs-LN3-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.067-VL/Hs-LN2-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.067-VL/Hs-LN2-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.067-VL/Hs-LN2-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.006-VL/Hs-LN3-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.006-VL/Hs-LN3-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.006-VL/Hs-LN3-5RACE-IgL-100k --separate --chain "L"
process_partis.py --annotations /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.006-VL/Hs-LN2-5RACE-IgL-100k-cluster-annotations.csv --partition /fh/fast/matsen_e/dshaw/cft/data/seeds/QA255.006-VL/Hs-LN2-5RACE-IgL-100k.csv --cluster_base cluster --output_dir postpartis_out/QA255.006-VL/Hs-LN2-5RACE-IgL-100k --separate --chain "L"
For the first problem file, the partis utils only search through known V genes in the IMGT database, and IGHV1-18*1m is not one of these. We had this issue before with a V gene of the form IGHV1-18+C1M or some such, which the line
line['v_gene'] = line['v_gene'].split('+')[0]
is used to guard against, but perhaps we need a better way to work around the cases where partis infers a gene that does not have an exact match in the IMGT database. For example, I assume IGHV1-18*1m should be changed to IGHV1-18*01, but is there some way to predict all the cases in which partis will modify a gene's IMGT tag?
Posting for posterity, but I'll look into it.
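One way to make the guard above more defensive is to check the stripped name against the known gene set and report a miss explicitly rather than guessing a replacement allele. This is only a sketch: `resolve_v_gene`, the `known_genes` set, and the example names passed to it are hypothetical and not part of process_partis.py or partis.

```python
def resolve_v_gene(name, known_genes):
    """Return a gene name found in known_genes, or None if there is no exact match."""
    # Same guard as the existing line: drop any '+...' suffix.
    base = name.split('+')[0]
    if base in known_genes:
        return base
    # No exact match: do not guess (e.g. do not map 'IGHV1-18*1m' to 'IGHV1-18*01').
    return None


# Hypothetical gene set and names, for illustration only.
known = {'IGHV1-18*01', 'IGHV1-2*02'}
stripped = resolve_v_gene('IGHV1-18*01+C35A', known)
missing = resolve_v_gene('IGHV1-18*1m', known)
```

Returning None leaves it to the caller to decide whether to skip the sequence or stop with a message naming the unmatched gene.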
@dawahs this could be partis-inferred genes, which should be stored as part of the output.
cc @psathyrella
I can't see the whole context, but in general the problem is probably this: if you use --parameter-dir to cache parameters, you must use it during all subsequent steps, since partis gets the germline set from the parameter dir. In this particular case, 1m is a caprisa germline, if that helps.
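If the germline set used for the run is available as a FASTA file, the downstream script could build its list of valid gene names from that file instead of from IMGT alone. The sketch below assumes that; the layout of the parameter dir is not shown in this thread, so the helper `read_gene_names` and the example sequences are hypothetical.

```python
import io


def read_gene_names(fasta_handle):
    """Collect sequence names (header text up to the first whitespace) from a FASTA stream."""
    names = set()
    for line in fasta_handle:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


# Stand-in for a germline FASTA file; names and sequences are made up.
example_fasta = io.StringIO(">IGHV1-18*01\nCAGGTT\n>IGHV1-18*1m\nCAGGTC\n")
germline_names = read_gene_names(example_fasta)
```

With a set built this way, a name such as IGHV1-18*1m is accepted whenever it is present in the germline set that was actually used.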
Yes, that is helpful. Thanks.
Attempting to process partis output file in /fh/fast/matsen_e/dshaw/cft/data/seeds/QB850.001-Vh/
Some additional debugging statements added.