griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Duplicate TSV index #87

Closed yang-yangfeng closed 6 years ago

yang-yangfeng commented 6 years ago

working directory: /gscmnt/gc2547/griffithlab/yafeng/PRAD

command: pvacfuse run --net-chop-method cterm --netmhc-stab --iedb-install-directory /gscmnt/gc2502/griffithlab/yafeng -e 8,9,10,11 TCGA-KK-A6E1-01/fusion_antigen_out/TCGA-KK-A6E1-01.bedpe.annot sample HLA-A*02:03,HLA-A*24:02,HLA-B*42:01,HLA-C*07:01 NNalign NetMHC NetMHCIIpan NetMHCcons NetMHCpan PickPocket SMM SMMPMBEC SMMalign TCGA-KK-A6E1-01/pvacfuse_output

output:

```
Converting .bedpe to TSV
Completed
Splitting TSV into smaller chunks
Splitting TSV into smaller chunks - Entries 1-22
Completed
Generating Variant Peptide FASTA and Key Files
Generating Variant Peptide FASTA and Key Files - Entries 1-44
Wildtype sequence length is shorter than desired peptide sequence length at position (5 / 1, 138566625 / -1, -1 / 117050334). Using wildtype sequence length (10) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (19 / 7, -1 / 157216866, 803636 / -1). Using wildtype sequence length (7) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (19 / 7, -1 / 157216866, 804208 / -1). Using wildtype sequence length (8) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (19 / 7, -1 / 157216866, 804438 / -1). Using wildtype sequence length (19) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (X / 5, -1 / -1, 48539920 / 138561789). Using wildtype sequence length (19) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (X / 5, -1 / -1, 48541442 / 138561789). Using wildtype sequence length (3) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (21 / 21, 41507949 / -1, -1 / 38445621). Using wildtype sequence length (20) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (21 / 21, -1 / 36146075, 32413013 / -1). Using wildtype sequence length (4) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (21 / 21, -1 / 36146075, 32453508 / -1). Using wildtype sequence length (9) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (3 / 3, -1 / 146923601, 129469417 / -1). Using wildtype sequence length (6) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (3 / 3, -1 / -1, 138274544 / 108191643). Using wildtype sequence length (5) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (1 / 1, 83904078 / 84325636, -1 / -1). Using wildtype sequence length (17) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (3 / 3, -1 / -1, 143038960 / 122537073). Using wildtype sequence length (2) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (16 / 16, 52546636 / -1, -1 / 52058428). Using wildtype sequence length (6) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (21 / 21, -1 / 38810034, 36137932 / -1). Using wildtype sequence length (6) instead.
Wildtype sequence length is shorter than desired peptide sequence length at position (22 / 22, 19447377 / -1, -1 / 19456918). Using wildtype sequence length (15) instead.
Completed
Processing entries for Allele HLA-A*02:03 and Epitope Length 8 - Entries 1-44
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method NetMHC - Entries 1-44
Completed
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method NetMHCcons - Entries 1-44
Completed
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method NetMHCpan - Entries 1-44
Completed
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method PickPocket - Entries 1-44
Completed
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method SMM - Entries 1-44
Completed
Running IEDB on Allele HLA-A*02:03 and Epitope Length 8 with Method SMMPMBEC - Entries 1-44
Completed
Parsing IEDB Output for Allele HLA-A*02:03 and Epitope Length 8 - Entries 1-44
Duplicate TSV indexes
```

susannasiebert commented 6 years ago

The INTEGRATE-Neo file contains the following two entries:

21__41507949____-1__21__-1__38423561____TMPRSS2>>ERG____3___-___-___1___1___1___WTPEALAAMESRSVGKVSSASREASGNLKSWWKVKGKQAHLTWPEQEQERGRRCHILLNNQISX____18__1___ENST00000398585(3124);;ENST00000288319|ENST00000398897|ENST00000398905|ENST00000398907|ENST00000398910|ENST00000398911|ENST00000398919|ENST00000417133|ENST00000442448|ENST00000453032|ENST00000468474|ENST00000473107|ENST00000481609|ENST00000492833__0
21__41507949____-1__21__-1__38445621____TMPRSS2>>ERG____3___-___-___4___1___1___WTPEALAAMESRSVGKVSSX____18__1___ENST00000398585(3124);;ENST00000288319|ENST00000398905|ENST00000398907|ENST00000398910|ENST00000398911|ENST00000398919|ENST00000417133|ENST00000442448|ENST00000468474|ENST00000473107|ENST00000481609|ENST00000492833__0

They are very similar but differ slightly in their transcript lists and the resulting protein sequences. Right now we don't use the transcript list when generating each line's index, so these two entries end up with the same index; maybe we should include it. Alternatively, we could switch to a numbered index instead of the descriptive one we currently construct.
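To make the two options concrete, here is a minimal Python sketch of how a descriptive key that ignores the transcript list collides for the two TMPRSS2>>ERG entries, and how either fix avoids the collision. This is not pVACtools' actual code: the record fields, the `descriptive_index` / `transcript_aware_index` key formats, and the `build_lookup` / `build_numbered_lookup` helpers are all hypothetical, and the transcript lists are abbreviated from the entries quoted above.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical, simplified fusion records modeled on the two INTEGRATE-Neo
# entries above. Field names are illustrative assumptions (not real pVACtools
# TSV columns) and the transcript lists are abbreviated.
ENTRIES: List[Record] = [
    {"chromosome": "21", "gene_pair": "TMPRSS2>>ERG",
     "transcripts": "ENST00000288319|ENST00000398897|ENST00000398905"},
    {"chromosome": "21", "gene_pair": "TMPRSS2>>ERG",
     "transcripts": "ENST00000288319|ENST00000398905|ENST00000398907"},
]


def descriptive_index(entry: Record) -> str:
    """Hypothetical descriptive key that ignores the transcript list."""
    return f"{entry['chromosome']}.{entry['gene_pair']}"


def transcript_aware_index(entry: Record) -> str:
    """Hypothetical descriptive key that also includes the transcript list."""
    return f"{descriptive_index(entry)}.{entry['transcripts']}"


def build_lookup(entries: List[Record],
                 index_fn: Callable[[Record], str]) -> Dict[str, Record]:
    """Key each entry by index_fn, raising on the first collision."""
    lookup: Dict[str, Record] = {}
    for entry in entries:
        key = index_fn(entry)
        if key in lookup:
            raise ValueError(f"Duplicate TSV indexes: {key}")
        lookup[key] = entry
    return lookup


def build_numbered_lookup(entries: List[Record]) -> Dict[str, Record]:
    """Alternative: prefix a running number so keys are unique by construction."""
    return {f"{i}.{descriptive_index(e)}": e for i, e in enumerate(entries, start=1)}


if __name__ == "__main__":
    try:
        build_lookup(ENTRIES, descriptive_index)
    except ValueError as err:
        print(err)  # Duplicate TSV indexes: 21.TMPRSS2>>ERG
    print(len(build_lookup(ENTRIES, transcript_aware_index)))  # 2
    print(sorted(build_numbered_lookup(ENTRIES)))  # ['1.21.TMPRSS2>>ERG', '2.21.TMPRSS2>>ERG']
```

The trade-off in this sketch: transcript-aware keys stay self-describing but can grow very long (the real transcript lists above contain a dozen or more IDs), while numbered keys are short and always unique but no longer identify the fusion on their own.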

susannasiebert commented 6 years ago

This should have been fixed in version 1.0.2. I'm closing this issue, but feel free to reopen it if the problem persists.