Closed ValWood closed 3 years ago
It wouldn't be too much work but I think the menu on the advanced search page is getting too long. Could we change the UniProt accessions tab to be "External identifiers" or something? Then allow UniProt, human and SGD IDs on that page?
I forgot we had the UniProt accessions tab! Yes, we could make people select the appropriate 'type' from the box.
Or we could do all from here
Default to pombe, then have toggles for UniProt, HGNC and S. cerevisiae locus?
UniProt will only work for pombe so that should be clear too?
Also, we can't really do this within the query builder, because of the many-to-one and one-to-many mappings.
People would really need to know what mapped to what before proceeding to load the output into the query builder.
More thinking required!
People would really need to know what mapped to what before proceeding to load the output into the query builder.
Good point. Perhaps we need a separate ID mapping tool. It could have a link to the query builder once the user has the pombe IDs they're after.
Yes OK this is a 'longer term' project ;)
While we're talking about this, do you think we should change "Gene IDs" to "Gene names and IDs" to be explicit about where to upload gene names?
This would (probably) be better
We probably put IDs because IDs are preferred as more stable and unambiguous.
It's now "Gene name and IDs". It's easy to change back if we don't like it.
For looking up using orthologs we need to decide what to do if a non-pombe ID matches more than one pombe gene. Should we keep all matching genes in that case, or allow the user to select the genes to keep?
To see how InterMine handles these cases, follow this link: https://www.flymine.org/flymine/bag.do?subtab=upload then click "(click to see an example)" and then "Create list"
This is a bit different from the InterMine situation, I think, where people may be using synonyms etc.
I think in this case users will want both paralogs. But they may be confused that their output list is a different length from the input list.
We should provide a message, "The following genes in your list mapped to multiple genes (paralogs) in fission yeast. All paralogs were included.", in the place where you usually get a message if identifiers are not matched.
Then the user can, if necessary, delete these (but I can't think of a reason that they would only want to select one of the fission yeast copies, so this would be a reasonable default behaviour).
If the user has duplicates in their input list for some reason, only the official name would be recognised so they would get the 'not found' message for one of their IDs...
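To make the proposed default concrete, here is a minimal Python sketch of a lookup that keeps every paralog and builds the two messages discussed above. It is not the PomBase implementation: the function name `lookup_orthologs`, the `ortholog_table` dictionary and the placeholder IDs are all hypothetical, and dropping repeated input IDs is an assumption made only for this illustration.

```python
def lookup_orthologs(input_ids, ortholog_table):
    """Map external IDs to pombe genes, keeping all paralogs.

    ortholog_table is a hypothetical dict: external ID -> list of pombe genes.
    Returns (mapped, not_found, messages).
    """
    mapped = {}
    not_found = []
    # Assumption: repeated input IDs are collapsed, preserving first-seen order.
    for ext_id in dict.fromkeys(input_ids):
        pombe_genes = ortholog_table.get(ext_id, [])
        if pombe_genes:
            mapped[ext_id] = list(pombe_genes)
        else:
            not_found.append(ext_id)

    # IDs whose output is longer than one gene explain why the output list
    # can be longer than the input list.
    multi = [ext_id for ext_id, genes in mapped.items() if len(genes) > 1]

    messages = []
    if multi:
        messages.append(
            "The following IDs in your list mapped to multiple genes "
            "(paralogs) in fission yeast. All paralogs were included: "
            + ", ".join(multi)
        )
    if not_found:
        messages.append("Not found: " + ", ".join(not_found))
    return mapped, not_found, messages
```

Called with a table in which `EXT_B` has two pombe paralogs and `EXT_C` is absent, this keeps both paralogs for `EXT_B` and reports `EXT_C` in the not-found message.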
This is a bit different from the InterMine situation, I think, where people may be using synonyms etc.
Good point.
We should provide a message, "The following genes in your list mapped to multiple genes (paralogs) in fission yeast. All paralogs were included.", in the place where you usually get a message if identifiers are not matched.
That makes sense.
I've done a bit of work on this and it won't take too much more to finish.
There is a prototype here: https://www.pombase.org/identifier-mapper
It definitely needs work:
But it's a start.
nice start.
Here is a test list that includes 1:1, 1:many, many:1 and many:many.
YPR191W YPR121W YPL258C YPR165W YOL055C YPR141C YPR159W YBR104W YPR058W
Maybe we want to have 4 sections, to make it clear? We also need to decide how to deal with redundancy. We can chat about this on a call if you like.
many to many.
I'm not sure that's needed. Maybe 1:many and many:1 would be clear enough?
yes, you are right.
I did a little bit on this today as a break from Canto.
Maybe 1:many and many:1 would be clear enough?
I've added a many-to-one section but I'll need help with the wording:
Looks good to me!
Val spotted that the "Lookup" button remains disabled if you upload an ID list from a file:
https://github.com/pombase/curation/issues/2905#issuecomment-747984756
That's fixed now.
Still to do:
I'm working on the "back" button problem. I need help (in January) with the other items.
I forgot to say, it's live here but there is no link to it: https://www.pombase.org/identifier-mapper
OK, all seems to be working fine.
fix the browser "back" button to do the right thing
That's done and works quite smoothly now: https://www.pombase.org/identifier-mapper
I've also changed the link to the advanced search to be a button to make it more obvious.
improve the look of things overall
I was wondering if it would be better to not show the matches initially or to have the three result types (1-1, 1-many, many-1) in different tabs. I'm worried that in the current implementation, users will not scroll down and see all three sections.
Here's what it looks like if we only show headings and counts:
From pombase/curation#2905:
Is this ID form acceptable? HGNC:10001 (should specify in the drop down)
I've done that. The drop down is quite wide now though. Let me know if you'd like it changed.
I think it is OK, best to be explicit...
I was wondering if it would be better to not show the matches initially or to have the three result types (1-1, 1-many, many-1) in different tabs. I'm worried that in the current implementation, users will not scroll down and see all three sections.
What do you think about this change?
Here's an example of how it could look:
I don't have a strong preference. Just let me know when it's ready for documentation, and I'll write some.
I agree this is clearer.
I've changed the code to initially hide the results. It will be live in the morning.
If there is only a single result in the one-one or one-many section, it shows it straight away.
Once the new ID mapper is in production, the "UniProt accessions" tab on the Advanced Search will be a bit redundant. Should we remove it?
I suggest we add the ID mapper to the "Search" menu and we could also have a link to it from the default tab in the Advanced Search like this:
From the Zoom call:
Anything else?
That's all I remember.
Tiny picky copy-editing thing: it's e.g.
i.e. with dots after both e and g.
From the Zoom call:
I've made those changes and changed to using "e.g."
I haven't made the changes in this screenshot yet: https://github.com/pombase/website/issues/1539#issuecomment-759837388 And I haven't changed the menu.
It's still only available via a direct link: https://www.pombase.org/identifier-mapper
Midori, I'll email next week so we can pick a day to release all the changes in the test tool for doing screenshots and documentation.
Sounds good. At the moment next week looks relatively sane, so any day will probably do.
I'll email next week so we can pick a day to release all the changes in the test tool for doing screenshots and documentation.
Sorry Midori, I forgot about this last week.
I've just added two sentences at the top of the main ID mapper page as a placeholder: https://www.pombase.org/identifier-mapper
Would you be able to have a look and make any edits you think it needs? The file is here: https://github.com/pombase/website/blob/master/src/app/identifier-mapper/identifier-mapper.component.html#L7
Thanks!
And if there are any changes to make or text to add on the mapper results page, please let me know.
Thanks - I committed a couple of small edits, and the tests passed. When that goes live I'll write up some documentation, and ping again when it's done.
Thanks for the edits. That's now live: https://www.pombase.org/identifier-mapper
I haven't committed the changes to add links to the ID mapper, but it's ready to go. Here's what I've got. Please let me know what to edit. Rather than removing the "UniProt accessions" tab from the advanced search I thought it would be better to add a link to the ID mapper for users that have used that tab before. Is that sensible? We could remove it at some point in the future.
I like all of that :)
I'll make a start on the documentation.
Great. Those changes are now in the dev site: http://dev.pombase.kmr.nz/
Thanks!
I have committed some documentation, and the checks look OK.
Thanks! I've added a link to the documentation from the ID mapper. It's all now deployed on the main site.
I think the second screenshot on the documentation page might need some trimming. There is a bit of white space at the top and bottom.
Great stuff. I'm sure this will be v. popular.
Ooh one more thing, sorry! (can still be announced)
Could we make it clearer if all the genes mapped, maybe with a message at the top: "Input list of 12 human IDs, 12 IDs mapped to 11 pombe proteins, 0 not found" (or something similar)?
Because, with a list like this, where all IDs map, this is not completely obvious:
PEX12 RAD52 PSTPIP1 PSTPIP2 NSFL1C UBXN2B UBXN2A STX8 HLTF RAD1 MAP3K2 MAP3K3
Once you introduce unmapped IDs this becomes more obvious: CEP85L CEP104 CEP95
But it is difficult to figure out because the multi-multi section does not have numbers, and users might not have the size of their input list in their head.
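A summary line like the one suggested above could be computed along these lines. This is only a sketch in Python: the function name `summarise`, the shape of `mapping` (external ID to list of pombe genes) and the exact wording are assumptions, not what the ID mapper actually does.

```python
def summarise(input_ids, mapping):
    """Build a one-line summary of a mapping result.

    mapping is a hypothetical dict: external ID -> list of pombe genes.
    """
    # Count each input ID once, even if it was pasted twice.
    unique_ids = list(dict.fromkeys(input_ids))
    mapped_ids = [ext_id for ext_id in unique_ids if mapping.get(ext_id)]
    # Distinct pombe genes reached by any mapped ID.
    pombe_genes = {gene for ext_id in mapped_ids for gene in mapping[ext_id]}
    not_found = len(unique_ids) - len(mapped_ids)
    return (
        f"Input list of {len(unique_ids)} IDs, "
        f"{len(mapped_ids)} IDs mapped to {len(pombe_genes)} pombe genes, "
        f"{not_found} not found"
    )
```

Because the counts are taken over distinct IDs and distinct pombe genes, a many-to-one case shows up as more mapped IDs than pombe genes, which is the situation that is currently hard to see.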
Oh, hang on, I found a bug!
PSTPIP1 and PSTPIP2 appear in many-to-one and many-to-many
We have one-to-one, one-to-many and many-to-one, but not many-to-many.
This might be a bit confusing because "many to many" appears in both "one to many" and "many to one".
"Many to many" should be in a separate section (because they aren't 1:many or many:1) Sorry I didn't spot this before.
Oh, hang on, I found a bug! PSTPIP1 and PSTPIP2 appear in many-to-one and many-to-many
That's working as I'd expect. They are both one-many and many-one.
"Many to many" should be in a separate section (because they aren't 1:many or many:1) Sorry I didn't spot this before.
We discussed many-many and decided against it: https://github.com/pombase/website/issues/1539#issuecomment-719159557
Should we add it?
I think the second screenshot on the documentation page might need some trimming.
Thanks - I've committed an update that should crop off the extra.
Should we add it?
Sorry I didn't think that through. The problem is that in orthology jargon many-to-many is a separate class from many-to-one and one-to-many.
It is useful to see many-to-many separately because they tend to be different types of gene products. Many-to-one and one-to-many are often a single duplication in one species, with the gene otherwise usually in single copy. For example pombe has 2 copies of the proteasome complex subunit rpn5 (rpn501 and rpn502) but all other species have one copy and all other proteasome subunits are in one copy. These single copy duplicates pop up but are often removed by selection (although sometimes retained).
The many to many class either
These aren't hard and fast rules, but there are definitely biological differences between these lists. I can see that people might want only 1:1, 1:many and many:1 but might want to exclude many:many.
I can see that people might want only 1:1, 1:many and many:1 but might want to exclude many:many.
Do we need an option to exclude many to many matches from the results?
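One way to treat many-to-many as its own class is to group the matches into connected sets of IDs and classify each set by how many IDs it has on each side. The Python sketch below illustrates that idea under stated assumptions: the function names, the `(external_id, pombe_gene)` pair format and the convention that the first word of each label refers to the input species are all hypothetical, and this is not how the current mapper sections are computed.

```python
from collections import defaultdict


def classify_groups(pairs):
    """Group (external_id, pombe_gene) pairs and label each group.

    Labels read input side first: "one-to-many" means one external ID
    matching several pombe genes (an assumed convention).
    """
    neighbours = defaultdict(set)
    for ext_id, pombe_gene in pairs:
        ext_node = ("ext", ext_id)
        pombe_node = ("pombe", pombe_gene)
        neighbours[ext_node].add(pombe_node)
        neighbours[pombe_node].add(ext_node)

    groups = {"one-to-one": [], "one-to-many": [],
              "many-to-one": [], "many-to-many": []}
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk to collect everything linked to this ID.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        ext_ids = sorted(name for kind, name in component if kind == "ext")
        genes = sorted(name for kind, name in component if kind == "pombe")
        if len(ext_ids) == 1 and len(genes) == 1:
            label = "one-to-one"
        elif len(ext_ids) == 1:
            label = "one-to-many"
        elif len(genes) == 1:
            label = "many-to-one"
        else:
            label = "many-to-many"
        groups[label].append((ext_ids, genes))
    return groups


def pombe_genes_for(groups, exclude_many_to_many=False):
    """Flatten the groups into a sorted pombe gene list."""
    result = set()
    for label, members in groups.items():
        if exclude_many_to_many and label == "many-to-many":
            continue
        for _ext_ids, genes in members:
            result.update(genes)
    return sorted(result)
```

With this grouping each match lands in exactly one section, so an "exclude many-to-many" option reduces to skipping one label when building the gene list passed on to the query builder.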
Since we can already search on SGD locus and HGNC IDs, how easy would it be to add them to the list upload? (Clearly we would need a toggle to say which species, because of name overlaps.)
(Have we discussed this before?)