manulera closed this issue 1 year ago
I've now applied these changes to Canto:
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPAC19G12.06c PMID:20661445 MOD:00696 S127 to S128
08562810fce65760: changing SPCC622.08c PMID:20661445 MOD:00696 S128 to S129
28333b01f58bc586: changing SPBC4F6.12 PMID:34133210 MOD:00696 S3, S24, S31, T55, T64, S67, S97, S136, T214 to S3,S24,S31,T55,T64,S67,S97,S136,T214
521475f7c063d784: changing SPBC16G5.15c PMID:18235227 MOD:00046 S321A to S321
5339c3839d6a7634: changing SPAC1834.04 PMID:20299449 MOD:00723 K4 to K5
5339c3839d6a7634: changing SPAC1834.04 PMID:20299449 MOD:00723 K4 to K5
5339c3839d6a7634: changing SPAC1834.04 PMID:20299449 MOD:00723 K9 to K10
5339c3839d6a7634: changing SPAC1834.04 PMID:20299449 MOD:00723 K56 to K57
536dc2e074eee139: changing SPCC4E9.01c PMID:25993311 MOD:00696 S10|S22|S43|S150|S439|S496 to S10,S22,S43,S150,S439,S496
536dc2e074eee139: changing SPCC4E9.01c PMID:25993311 MOD:00696 T60|T70 to T60,T70
767451d8f8ef6abe: changing SPAC6G9.08 PMID:21182284 MOD:00046 S129 to S130
767451d8f8ef6abe: changing SPAC6G9.08 PMID:21182284 MOD:00046 S133 to S134
767451d8f8ef6abe: changing SPAC6G9.08 PMID:21182284 MOD:00046 S359 to S360
767451d8f8ef6abe: changing SPAC6G9.08 PMID:21182284 MOD:00047 T143 to T144
7bf1fc1e6f06a613: changing SPCC338.08 PMID:33836577 MOD:00047 T89, T154, T155 to T89,T154,T155
7bf1fc1e6f06a613: changing SPCC338.08 PMID:33836577 MOD:00046 S77, S151 to S77,S151
884c35ae47e3fec8: changing SPBC1A4.03c PMID:30635402 MOD:00046 S1363, S1364 to S1363,S1364
99f58cdf989ca814: changing SPCC622.08c PMID:19965387 MOD:00046 S121 to S122
99f58cdf989ca814: changing SPAC19G12.06c PMID:19965387 MOD:00046 S121 to S122
9b5edbe6f0efcb45: changing SPAC1834.04 PMID:17369611 MOD:00723 K56 to K57
9b5edbe6f0efcb45: changing SPBC8D2.04 PMID:17369611 MOD:00723 K56 to K57
9b5edbe6f0efcb45: changing SPBC1105.11c PMID:17369611 MOD:00723 K56 to K57
9d9a265db15a87cd: changing SPBP23A10.10 PMID:27191590 MOD:00696 S630, S632 to S630,S632
a09af17a2956146d: changing SPAC1834.04 PMID:31468675 MOD:01148 K14 to K15
a09af17a2956146d: changing SPBC8D2.04 PMID:31468675 MOD:01148 K14 to K15
a09af17a2956146d: changing SPBC1105.11c PMID:31468675 MOD:01148 K14 to K15
b2ae716b0ad7c3cb: changing SPCC338.17c PMID:28438891 MOD:00046 S163, S164,S165, S174, S209, S216, S219,S223, S226, S444, S507, S544, S545, S553 to S163,S164,S165,S174,S209,S216,S219,S223,S226,S444,S507,S544,S545,S553
c0af69aa51ff9eff: changing SPAC1834.04 PMID:11792803 MOD:00046 S10 to S11
c0af69aa51ff9eff: changing SPBC8D2.04 PMID:11792803 MOD:00046 S10 to S11
c0af69aa51ff9eff: changing SPBC1105.11c PMID:11792803 MOD:00046 S10 to S11
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T10A to T10
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T46A to T46
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T60A to T60
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T104A to T104
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T134A to T134
d62844597282017d: changing SPBC14C8.07c PMID:9353247 MOD:00047 T374A to T374
db3533d819cff33d: changing SPCC162.07 PMID:23297348 MOD:00046 S220 to S216
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 S202A to S202
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 S229A to S229
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 S244A to S244
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 S278A to S278
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 S294A to S294
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 T393A to T393
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 T831A to T831
dd0b314b0bd84119: changing SPCC962.02c PMID:20739936 MOD:00696 T908A to T908
e68d23abf86a3c7c: changing SPAC17G8.10c PMID:34674264 MOD:00046 S4, S20, S166, S251, S266 to S4,S20,S166,S251,S266
e865b65eeb6f06b0: changing SPAC1834.04 PMID:29136238 MOD:00696 Y41 to Y42
e865b65eeb6f06b0: changing SPBC8D2.04 PMID:29136238 MOD:00696 Y41 to Y42
e865b65eeb6f06b0: changing SPBC1105.11c PMID:29136238 MOD:00696 Y41 to Y42
f30149c5fcc7f553: changing SPAC17G8.10c PMID:29975113 MOD:01148 K3, K26, K54, K82, K124, K164, K174, K237, K262 to K3,K26,K54,K82,K124,K164,K174,K237,K262
f45b7c9c20201a38: changing SPAC20H4.06c PMID:36361590 MOD:00046 S239, S308, S312 to S239,S308,S312
f45b7c9c20201a38: changing SPCC188.11 PMID:36361590 MOD:00046 S228, S236 to S228,S236
f45b7c9c20201a38: changing SPAC4D7.03 PMID:36361590 MOD:00047 T657, T666, T669 to T657,T666,T669
f7e6c33889ea1fa0: changing SPAC11E3.03 PMID:20935472 MOD:00696 S47 to S87
fe6e8e353ea78411: changing SPAP8A3.08 PMID:10364209 MOD:00046 S2A to S2
fe6e8e353ea78411: changing SPAP8A3.08 PMID:10364209 MOD:00046 S6A to S6
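Several of the changes in the log above only touch the formatting of sequence_position. Spaces or "|" between positions become a plain comma-separated list (S10|S22 to S10,S22), and a trailing "A" after the residue number is dropped (S321A to S321). The sketch below reproduces those formatting changes. It is a hypothetical helper inferred from the log, not the pipeline's actual code, and `normalize_positions` is an illustrative name.

```python
import re


def normalize_positions(value: str) -> str:
    """Normalise a sequence_position string as the log above suggests.

    Assumptions (inferred from the log, not from the pipeline source):
    - positions may be separated by ", ", "," or "|"; output uses ","
    - a trailing "A" after the residue number (e.g. "S321A") is dropped
    """
    # Split on commas or pipes and discard surrounding whitespace
    parts = [p.strip() for p in re.split(r"[,|]", value) if p.strip()]
    cleaned = []
    for part in parts:
        # Residue letter + number, optionally followed by an alanine "A"
        match = re.fullmatch(r"([A-Z])(\d+)A?", part)
        cleaned.append(f"{match.group(1)}{match.group(2)}" if match else part)
    return ",".join(cleaned)
```

The numeric shifts in the same log (for example S128 to S129, or K4 to K5) are separate position corrections and are not covered by this sketch.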
I'll apply the fixes to the modifications in SVN next.
That's done too now. I'll check Chado after tomorrow's load.
Edit - these changes were made:
skipping change where new position is unknown: SPAC57A7.12 MOD:00046 S12->? PMID:21712547
external_data/modification_files/PMID_21712547_modifications.tsv: changing SPAC57A7.12 MOD:00046 S500 to S484
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S200 to S184
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S202 to S186
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S212 to S196
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S224 to S208
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S229 to S213
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S232 to S216
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S237 to S221
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S239 to S223
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S309 to S293
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S316 to S300
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S337 to S321
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S345 to S329
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S354 to S338
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S376 to S360
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S381 to S365
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S383 to S367
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S409 to S393
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S411 to S395
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S419 to S403
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S426 to S410
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S445 to S429
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S447 to S431
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00046 S455 to S439
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00047 T129 to T113
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00047 T257 to T241
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00047 T352 to T336
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00047 T375 to T359
external_data/modification_files/PMID_29996109_modifications.tsv: changing SPBC25B2.07c MOD:00047 T439 to T423
external_data/modification_files/PMID_30726745_modifications.tsv: changing SPAC3H1.05 MOD:00046 S440 to S410
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S202 to S186
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S212 to S196
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S239 to S223
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S309 to S293
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S337 to S321
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S345 to S329
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S354 to S338
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S381 to S365
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S383 to S367
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00046 S434 to S418
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00047 T129 to T113
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00047 T257 to T241
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00047 T352 to T336
external_data/modification_files/PMID_33823663_modifications.tsv: changing SPBC25B2.07c MOD:00047 T375 to T359
Hi @kimrutherford, it seems like most of them went through, except for a few. The one with the "?" was expected, but some histone_fix ones were not applied either.
https://github.com/pombase/allele_qc/blob/master/results/protein_modification_auto_fix.tsv
| systematic_id | primary_name | modification | evidence | sequence_position | annotation_extension | reference | taxon | date | sequence_error | change_sequence_position_to | auto_fix_comment | solution_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPAC1834.03c | hhf1 | MOD:00663 | Inferred from Sequence or Structural Similarity | K20 | | PB_REF:0000001 | 4896 | 2010-03-11 | K20 | K21 | histone_fix | |
| SPAC1834.04 | hht1 | MOD:00663 | | K14 | present_during(GO:0031508) | PMID:14561399 | 4896 | 2010-03-11 | K14 | K15 | histone_fix | |
| SPAC1834.04 | hht1 | MOD:00663 | Inferred from Sequence or Structural Similarity | K4 | | PB_REF:0000001 | 4896 | 2010-03-11 | K4 | K5 | histone_fix | |
| SPAC1834.04 | hht1 | MOD:00663 | | K9 | present_during(GO:0031508) | PMID:14561399 | 4896 | 2010-03-11 | K9 | K10 | histone_fix | |
| SPAC57A7.12 | ssz1 | MOD:00046 | experimental evidence | S12 | present_during(GO:0000087) | PMID:21712547 | 4896 | 2011-06-28 | S12 | ? | old_coords_fix, revision 8148: complement(join(1515089..1516663,1516789..1516914)) | |
| SPBC1105.11c | hht3 | MOD:00427 | Inferred from Sequence or Structural Similarity | K4 | | PB_REF:0000001 | 4896 | 2010-03-11 | K4 | K5 | histone_fix | |
| SPBC1105.11c | hht3 | MOD:00427 | Inferred from Sequence or Structural Similarity | K9 | | PB_REF:0000001 | 4896 | 2010-03-11 | K9 | K10 | histone_fix | |
| SPBC1105.12 | hhf3 | MOD:00427 | Inferred from Sequence or Structural Similarity | K20 | | PB_REF:0000001 | 4896 | 2010-03-11 | K20 | K21 | histone_fix | |
| SPBC8D2.03c | hhf2 | MOD:00427 | Inferred from Sequence or Structural Similarity | K20 | | PB_REF:0000001 | 4896 | 2010-03-11 | K20 | K21 | histone_fix | |
| SPBC8D2.04 | hht2 | MOD:00427 | Inferred from Sequence or Structural Similarity | K4 | | PB_REF:0000001 | 4896 | 2010-03-11 | K4 | K5 | histone_fix | |
| SPBC8D2.04 | hht2 | MOD:00427 | Inferred from Sequence or Structural Similarity | K9 | | PB_REF:0000001 | 4896 | 2010-03-11 | K9 | K10 | histone_fix | |
| SPCC622.09 | htb1 | MOD:01148 | Inferred from Direct Assay | K119 | | PMID:17374714 | 4896 | 2007-07-16 | K119 | K120 | histone_fix | |
I think I see why: these rows have either PB_REF:0000001 as a reference (don't know what that is) or no evidence value. Neither applies to this one, though:
SPCC622.09 htb1 MOD:01148 Inferred from Direct Assay K119 PMID:17374714 4896 2007-07-16 K119 K120 histone_fix
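The diagnosis above can be checked mechanically against the rows of protein_modification_auto_fix.tsv. This is a minimal sketch that assumes the two suspected causes are a reference starting with "PB_REF:" and an empty evidence field. The function names are hypothetical, and the row dictionaries simply mirror the file's column names.

```python
from typing import Dict, List


def suspected_skip_reason(row: Dict[str, str]) -> str:
    """Return the suspected reason a fix was not applied, or '' if none fits."""
    # Assumption: PB_REF:* references were not matched when applying fixes
    if row.get("reference", "").startswith("PB_REF:"):
        return "PB_REF reference"
    # Assumption: rows with no evidence value were not matched either
    if not row.get("evidence", "").strip():
        return "empty evidence"
    return ""


def unexplained(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rows that match neither suspected cause (e.g. the htb1 K119 row)."""
    return [row for row in rows if not suspected_skip_reason(row)]
```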
Hi @kimrutherford, as I mentioned today, some new alleles have appeared in the allele list that did not exist before, for instance:
SPAC13C5.03 D543->stop tht1 tht1-D543* nonsense mutation PMID:9442101
It did not appear before because this allele has no annotations in Canto, so it was dropped and not run through the previous pipeline. I'm not sure how we want to handle that; maybe you can filter the list before exporting it. I'm fairly sure there is a lot of garbage among alleles without annotations.
Also, the mysterious unfixed modification K119 might be related to the ones mentioned in https://github.com/pombase/allele_qc/issues/83.
Hi Manu. The Canto allele export file has an "annotation_count" column. Could you ignore alleles where that column is zero?
Yes, I can use that to filter them out.
Hi @kimrutherford, I did this in https://github.com/pombase/allele_qc/commit/3ef4c14acd29ccfed893cdf5a39073c1ebe31f38
I am not simply removing the alleles that have zero annotations in the Canto file, in case an allele has no annotations in Canto but does have annotations in the PHAF files. See the script below to check that it makes sense:
https://github.com/pombase/allele_qc/blob/master/filter_alleles_pombase.py
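The rule described above drops an allele only if it has no annotations in Canto and none in the PHAF files. The sketch below shows one way to express it. It is an illustration, not filter_alleles_pombase.py itself. The annotation_count column comes from the Canto export. The systematic_id/allele_name key columns and the phaf_allele_keys set are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def filter_alleles(
    canto_alleles: Iterable[Dict[str, str]],
    phaf_allele_keys: Set[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Keep Canto alleles that have annotations in Canto or in the PHAF files."""
    kept = []
    for allele in canto_alleles:
        # Hypothetical key columns; the real export may identify alleles differently
        key = (allele["systematic_id"], allele["allele_name"])
        has_canto_annotations = int(allele.get("annotation_count") or 0) > 0
        if has_canto_annotations or key in phaf_allele_keys:
            kept.append(allele)
    return kept
```

Checking the PHAF keys as well as annotation_count is what keeps an allele that is unannotated in Canto but still used in a PHAF file.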
I think we can close this one
Hi @kimrutherford, just summarising what we discussed in the call today. The pipeline generates files for protein modifications similar to those for alleles, and they can be used to fix the modifications in the PHAF files and in Canto.
They are in this folder: https://github.com/pombase/allele_qc/tree/master/results
How to use the proposed fixes
For the fixing, the unique identifier of a fix is the combination of systematic_id, sequence_position and reference (reference can probably be omitted, but let's use it just in case, since that's what the script uses as the unique identifier).
Important exceptions
When you write the script that applies the changes from protein_modification_auto_fix.tsv, note that the file has a solution_index column, explained in https://github.com/pombase/canto/issues/2689#issue-1563394806. If this column has a value, the pipeline found two possible solutions and a decision has to be made.
TLDR: if there is a value in solution_index, do not apply the fix.
Another special case to take into account is described in #62. Someone may have reported a modification on a residue that no longer exists in the current gene structure (probably assigned by a high-throughput pipeline). For those cases, I have set the value of the column change_sequence_position_to to "?". Right now we only have one example, but more are likely to appear in the future. These can either be deleted, or kept knowing that they have a sequence error.
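The sketch below shows how a script applying protein_modification_auto_fix.tsv might follow these rules. Fixes are keyed by (systematic_id, sequence_position, reference). Rows with any solution_index value are skipped. Rows whose change_sequence_position_to is "?" are set aside so a curator can delete or keep them. The function name is hypothetical, while the column names are the ones listed in the file's header. Other columns are ignored.

```python
import csv
from typing import Dict, Iterable, Tuple

# (systematic_id, sequence_position, reference)
FixKey = Tuple[str, str, str]


def load_auto_fixes(lines: Iterable[str]) -> Tuple[Dict[FixKey, str], Dict[FixKey, str]]:
    """Split auto-fix rows into (apply, unknown_position).

    - apply: key -> new sequence_position, safe to apply automatically
    - unknown_position: key -> auto_fix_comment, for rows whose new position is "?"
    Rows with any value in solution_index are skipped: a curator must decide.
    """
    apply: Dict[FixKey, str] = {}
    unknown: Dict[FixKey, str] = {}
    # QUOTE_NONE: treat the file as plain TSV, no special handling of quotes
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if (row.get("solution_index") or "").strip():
            continue  # more than one possible solution: do not apply
        key = (row["systematic_id"], row["sequence_position"], row["reference"])
        new_position = (row.get("change_sequence_position_to") or "").strip()
        if new_position == "?":
            unknown[key] = row.get("auto_fix_comment") or ""
        elif new_position:
            apply[key] = new_position
    return apply, unknown
```

Usage, for example: `with open("protein_modification_auto_fix.tsv", newline="") as handle: apply, unknown = load_auto_fixes(handle)`. Passing an iterable of lines rather than a path keeps the function easy to test.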
Related to #63