pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Use the allele_qc api to suggest fixes #2690

Open manulera opened 1 year ago

manulera commented 1 year ago

Related to #2688

There are three API entry points that can be used to suggest fixes for errors.

They work for both allele descriptions and protein modification coordinates, but the values have to be comma-separated with no spaces, so the request should contain the syntax-corrected descriptions. For example, "V123A, P124A" will not work; it has to be "V123A,P124A".
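As a rough illustration of the comma-separation requirement, the sketch below strips whitespace around commas before a request is built. The function name `normalise_targets` is hypothetical and not part of the allele_qc API; the real syntax correction may do considerably more than this.

```python
def normalise_targets(raw: str) -> str:
    """Return the targets joined by bare commas, e.g. 'V123A, P124A' -> 'V123A,P124A'."""
    # Split on commas, trim each piece, and drop empty pieces left by stray commas.
    parts = [piece.strip() for piece in raw.split(",")]
    return ",".join(piece for piece in parts if piece)


print(normalise_targets("V123A, P124A"))
```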

/old_coords_fix

Checks whether the coordinates provided match an old gene structure, and if so returns the updated coordinates in the values field. It also returns the revision in which the old coordinates were removed (revision), as well as the old coordinates themselves (location).

Example request:

{
  "systematic_id": "SPBC1706.01",
  "targets": "P170A,V223A,F225A"
}

Example response (a list of possible solutions that is empty if no solution exists):

[
  {
    "values": "P182A,V235A,F237A",
    "revision": "20110324",
    "location": "588765..591194"
  }
]
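A minimal sketch of how a client might build this request and read the response, using only the Python standard library. The base URL is hypothetical, and sending the JSON body with POST is an assumption; the issue states neither.

```python
import json
import urllib.request

# Hypothetical base URL: the issue does not say where the allele_qc API is hosted.
BASE_URL = "http://localhost:8000"


def build_fix_request(endpoint: str, systematic_id: str, targets: str) -> urllib.request.Request:
    """Build (but do not send) a request for one of the fix endpoints."""
    payload = {"systematic_id": systematic_id, "targets": targets}
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not given in the issue
    )


def parse_solutions(body: bytes) -> list:
    """Decode the response body: a list of solutions, empty if none exists."""
    solutions = json.loads(body)
    if not isinstance(solutions, list):
        raise ValueError("expected a JSON list of solutions")
    return solutions


request = build_fix_request("old_coords_fix", "SPBC1706.01", "P170A,V223A,F225A")
# To send it: parse_solutions(urllib.request.urlopen(request).read())
```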

/histone_fix

If the systematic_id corresponds to a histone, checks whether increasing all indexes by 1 fixes the problem. This is because the first methionine is often not counted in histones.

Example request:

{
  "systematic_id": "SPAC1834.04",
  "targets": "K9A,K14R,K14A"
}

Example response (a list of possible solutions that is empty if no solution exists):

[
  {
    "values": "K10A,K15R,K15A"
  }
]
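The index shift in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: `shift_targets` is a hypothetical helper, and unlike the API it does not check the shifted positions against the protein sequence.

```python
import re


def shift_targets(targets: str, offset: int) -> str:
    """Add `offset` to every number in a comma-separated target string."""
    def bump(match: re.Match) -> str:
        return str(int(match.group()) + offset)

    return ",".join(re.sub(r"\d+", bump, target) for target in targets.split(","))


print(shift_targets("K9A,K14R,K14A", 1))
```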

/multi_shift_fix

Checks whether increasing all indexes by a fixed amount fixes the problem. It only tries to find a solution if 3 or more sequence positions are indicated.

Example request:

{
  "systematic_id": "SPAPB1A10.09",
  "targets": "S123,A124,N125"
}

Example response (a list of possible solutions that is empty if no solution exists):

[
  {
    "values": "S372,A373,N374"
  },
  {
    "values": "S571,A572,N573"
  }
]
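To make the idea concrete, here is a sketch of a shift search over a toy sequence. It is not the allele_qc implementation: `find_shifts` is a hypothetical name, targets are assumed to have the simple form residue plus position (as in the example request), and only positive shifts are tried, following the "increasing" wording above.

```python
import re


def parse_target(target: str) -> tuple:
    """Split 'S123' into ('S', 123)."""
    match = re.fullmatch(r"([A-Z])(\d+)", target)
    if match is None:
        raise ValueError(f"unsupported target: {target}")
    return match.group(1), int(match.group(2))


def find_shifts(sequence: str, targets: str, min_positions: int = 3) -> list:
    """Return every shifted target string whose residues all match `sequence`."""
    parsed = [parse_target(t) for t in targets.split(",")]
    if len(parsed) < min_positions:
        return []  # too few positions to trust a coincidental match
    highest = max(pos for _, pos in parsed)
    solutions = []
    # Positions are 1-based; stop once the highest position would run off the end.
    for shift in range(1, len(sequence) - highest + 1):
        if all(sequence[pos + shift - 1] == res for res, pos in parsed):
            solutions.append(",".join(f"{res}{pos + shift}" for res, pos in parsed))
    return solutions


toy_sequence = "MSANKKSANQ"  # made-up sequence, not a real protein
print(find_shifts(toy_sequence, "S1,A2,N3"))
```

As in the example response, more than one shift can match, which is why the result is a list.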

Priority

In the allele fixing pipeline, the highest priority is old_coords_fix, then histone_fix, then multi_shift_fix, so only the first one that gives a solution should be kept.
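One way to express this priority is a loop that stops at the first endpoint returning a non-empty list. This is a sketch: `first_fix` and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever code actually calls the API.

```python
from typing import Callable, List, Optional, Tuple

# Endpoints in priority order, highest first.
FIX_ORDER = ("old_coords_fix", "histone_fix", "multi_shift_fix")


def first_fix(fetch: Callable[[str], List[dict]]) -> Optional[Tuple[str, List[dict]]]:
    """Try each endpoint in priority order and keep the first non-empty result."""
    for endpoint in FIX_ORDER:
        solutions = fetch(endpoint)
        if solutions:
            return endpoint, solutions
    return None  # no endpoint could suggest a fix


# Usage with canned responses instead of real API calls.
canned = {
    "old_coords_fix": [],
    "histone_fix": [{"values": "K10A,K15R,K15A"}],
    "multi_shift_fix": [{"values": "K30A,K35R,K35A"}],
}
print(first_fix(lambda endpoint: canned[endpoint]))
```

Because the loop returns early, lower-priority endpoints are never queried once a solution is found.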

How to display the possible solution to the user

`
The sequence positions provided (${targets}) do not match the current gene structure, but match an old gene structure (position ${location} at revision ${revision}). If you think this old gene structure was used, please change to ${values}.
`
`
The sequence positions provided (${targets}) do not match the current gene structure. However, ${systematic_id} is a histone, and in histones often the first methionine is not counted. If you think this could be the case, please change to ${values}.
`
`
The sequence positions provided (${targets}) do not match the current gene structure. However, by shifting all indexes provided by a fixed amount, we found that ${values} matches the gene structure. If you think the index might have been shifted, please change to ${values}.
`
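The `${...}` placeholders above happen to match the syntax of Python's `string.Template`, so filling them in can be sketched as below. The `render_message` helper and the idea of keying templates by endpoint name are assumptions for illustration; Canto itself may build the messages differently.

```python
from string import Template

# One template per endpoint, using the wording proposed in this issue.
MESSAGES = {
    "old_coords_fix": Template(
        "The sequence positions provided (${targets}) do not match the current gene "
        "structure, but match an old gene structure (position ${location} at revision "
        "${revision}). If you think this old gene structure was used, please change to ${values}."
    ),
    "histone_fix": Template(
        "The sequence positions provided (${targets}) do not match the current gene "
        "structure. However, ${systematic_id} is a histone, and in histones often the "
        "first methionine is not counted. If you think this could be the case, please "
        "change to ${values}."
    ),
    "multi_shift_fix": Template(
        "The sequence positions provided (${targets}) do not match the current gene "
        "structure. However, by shifting all indexes provided by a fixed amount, we found "
        "that ${values} matches the gene structure. If you think the index might have "
        "been shifted, please change to ${values}."
    ),
}


def render_message(endpoint: str, systematic_id: str, targets: str, solution: dict) -> str:
    """Fill the template for `endpoint` from the request fields and one solution."""
    fields = {"systematic_id": systematic_id, "targets": targets, **solution}
    # substitute() ignores unused fields and raises KeyError if one is missing.
    return MESSAGES[endpoint].substitute(fields)


print(render_message("histone_fix", "SPAC1834.04", "K9A,K14R,K14A", {"values": "K10A,K15R,K15A"}))
```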
manulera commented 1 year ago

hi @kimrutherford, I have made the allele name optional in the API as we discussed yesterday.

kimrutherford commented 1 year ago

> I have made the allele name optional in the API as we discussed yesterday.

Thanks. I've changed Canto so that the "Check" button works even if the allele name is blank.

I'm still working on removing the Check button.

manulera commented 1 year ago

By the way, the allele types that can be checked are those that contain either "nucleotide" or "amino" in them.
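In code, that rule could look like the sketch below. The function name `is_checkable` is hypothetical, and the allele type strings in the usage lines are illustrative examples rather than a confirmed list of Canto types.

```python
def is_checkable(allele_type: str) -> bool:
    """Return True if the allele type name contains 'nucleotide' or 'amino'."""
    return "nucleotide" in allele_type or "amino" in allele_type


# Illustrative type names only.
for allele_type in ("amino_acid_mutation", "nucleotide_mutation", "deletion"):
    print(allele_type, is_checkable(allele_type))
```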

ValWood commented 9 months ago

Is this ticket still required?

kimrutherford commented 9 months ago

Yep, I haven't implemented this yet.

I'll work on this after finishing: