Allow retrieval of pombe genes based on cerevisiae and human gene lists

ValWood commented 4 years ago

Since we can already search on SGD locus and HGNC IDs, how easy would it be to add them to the list upload? (We would clearly need a toggle to say which species, because of name overlaps.)

(Have we discussed this before?)

kimrutherford commented 4 years ago

It wouldn't be too much work but I think the menu on the advanced search page is getting too long. Could we change the "UniProt accessions" tab to be "External identifiers" or something? Then allow UniProt, human and SGD IDs on that page?

ValWood commented 4 years ago

I forgot we had the UniProt accessions tab! Yes, we could make people select the appropriate 'type' from the box.

Or we could do all from here

Default to pombe, then have toggles for UniProt, HGNC and S.c. locus?

UniProt will only work for pombe so that should be clear too?

ValWood commented 4 years ago

Also, we can't really do this within the query builder, because of the many-to-one and one-to-many mappings.

People would really need to know what mapped to what before proceeding to load the output into the query builder.

More thinking required!

kimrutherford commented 4 years ago

People would really need to know what mapped to what before proceeding to load the output into the query builder.

Good point. Perhaps we need a separate ID mapping tool. It could have a link to the query builder once the user has the pombe IDs they're after.

ValWood commented 4 years ago

Yes OK this is a 'longer term' project ;)

kimrutherford commented 4 years ago

While we're talking about this, do you think we should change "Gene IDs" to "Gene names and IDs" to be explicit about where to upload gene names?

ValWood commented 4 years ago

This would (probably) be better

We probably put IDs because IDs are preferred, being more stable and unambiguous.

kimrutherford commented 4 years ago

It's now "Gene name and IDs". It's easy to change back if we don't like it.

kimrutherford commented 4 years ago

For looking up using orthologs we need to decide what to do if a non-pombe ID matches more than one pombe gene. Should we keep all matching genes in that case, or allow the user to select the genes to keep?

To see how InterMine handles these cases, follow this link: https://www.flymine.org/flymine/bag.do?subtab=upload then click "(click to see an example)" and then "Create list"

ValWood commented 4 years ago

This is a bit different from the intermine situation, I think, when people may be using synonyms etc.

I think in this case users will want both paralogs. But they may be confused that their output list is a different length from the input list.

We should provide a message ("The following genes in your list mapped to multiple genes (paralogs) in fission yeast. All paralogs were included.") in the place where you usually get a message if identifiers are not matched.

Then the user can, if necessary, delete these (but I can't think of a reason that they would only want to select one of the fission yeast copies, so this would be a reasonable default behaviour).

If the user has duplicates in their input list for some reason, only the official name would be recognised so they would get the 'not found' message for one of their IDs...
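The default behaviour described above (keep every paralog, and report which input IDs mapped to more than one pombe gene next to the usual "not found" list) could be sketched roughly as follows. This is only an illustration: the `ORTHOLOGS` table, its placeholder IDs, and the function names are hypothetical, and collapsing exact repeats in the input is an assumption rather than something decided in this thread.

```python
# Hypothetical ortholog table: external ID -> pombe gene IDs.
# These are placeholder IDs, not real orthology data.
ORTHOLOGS = {
    "EXT1": ["pombe_a"],
    "EXT2": ["pombe_b", "pombe_c"],  # two paralogs in fission yeast
}


def map_ids(input_ids, orthologs=ORTHOLOGS):
    """Map external IDs to pombe genes, keeping every paralog."""
    mapped = []     # pombe genes in first-seen order, without repeats
    multi = {}      # input ID -> all paralogs it mapped to
    not_found = []  # input IDs with no entry in the table
    # dict.fromkeys drops exact repeats while keeping input order (assumption)
    for ext_id in dict.fromkeys(input_ids):
        genes = orthologs.get(ext_id, [])
        if not genes:
            not_found.append(ext_id)
            continue
        if len(genes) > 1:
            multi[ext_id] = list(genes)
        for gene in genes:
            if gene not in mapped:
                mapped.append(gene)
    return mapped, multi, not_found


def paralog_message(multi):
    """Build the notice shown alongside the 'not found' message."""
    if not multi:
        return ""
    details = "; ".join(f"{ext} -> {', '.join(genes)}" for ext, genes in multi.items())
    return ("The following genes in your list mapped to multiple genes "
            "(paralogs) in fission yeast. All paralogs were included: " + details)
```

With this shape the output list can be longer than the input list, which is exactly the case the message is meant to explain.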

kimrutherford commented 4 years ago

This is a bit different from the intermine situation, I think, when people may be using synonyms etc.

Good point.

We should provide a message ("The following genes in your list mapped to multiple genes (paralogs) in fission yeast. All paralogs were included.") in the place where you usually get a message if identifiers are not matched.

That makes sense.

I've done a bit of work on this and it won't take too much more to finish.

kimrutherford commented 4 years ago

I've done a bit of work on this and it won't take too much more to finish.

There is a prototype here: https://www.pombase.org/identifier-mapper

It definitely needs work:

[ ] improve the look of things overall
[ ] improve the labels and headers
[ ] a short blurb at the top
[ ] fix the browser "back" button to do the right thing
[ ] documentation

But it's a start.

ValWood commented 4 years ago

nice start.

Here is a test list that includes 1:1, 1:many, many:1 and many:many.

YPR191W YPR121W YPL258C YPR165W YOL055C YPR141C YPR159W YBR104W YPR058W

Maybe we want to have 4 sections to make it clear? And we need to decide how to deal with redundancy. We can chat about this on a call if you like.

kimrutherford commented 4 years ago

many to many.

I'm not sure that's needed. Maybe 1:many and many:1 would be clear enough?

ValWood commented 4 years ago

yes, you are right.
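For reference, a three-section split of this kind might look like the sketch below. It assumes the lookup result is available as a dict from external ID to a list of pombe gene IDs (a hypothetical shape, not necessarily what the site uses). Note that an ID involved in a many-to-many relationship is not given its own section here; it simply appears in both the one-to-many and the many-to-one results.

```python
from collections import defaultdict


def split_sections(matches):
    """Split matches (external ID -> list of pombe gene IDs) into three sections."""
    # Invert the mapping so we can see which pombe genes are hit by several IDs.
    by_pombe = defaultdict(list)
    for ext_id, genes in matches.items():
        for gene in genes:
            by_pombe[gene].append(ext_id)

    # 1:1 = the ID has one pombe gene, and no other input ID hits that gene.
    one_to_one = {
        ext_id: genes[0]
        for ext_id, genes in matches.items()
        if len(genes) == 1 and len(by_pombe[genes[0]]) == 1
    }
    # 1:many = one input ID with several pombe genes.
    one_to_many = {ext_id: genes for ext_id, genes in matches.items() if len(genes) > 1}
    # many:1 = one pombe gene hit by several input IDs.
    many_to_one = {gene: ids for gene, ids in by_pombe.items() if len(ids) > 1}
    return one_to_one, one_to_many, many_to_one
```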

kimrutherford commented 4 years ago

I did a little bit on this today as a break from Canto.

Maybe 1:many and many:1 would be clear enough?

I've added a many-to-one section but I'll need help with the wording:

id-mapper-2

ValWood commented 4 years ago

Looks good to me!

kimrutherford commented 3 years ago

Val spotted that the "Lookup" button remains disabled if you upload an ID list from a file:

https://github.com/pombase/curation/issues/2905#issuecomment-747984756

kimrutherford commented 3 years ago

Val spotted that the "Lookup" button remains disabled if you upload an ID list from a file:

That's fixed now.

kimrutherford commented 3 years ago

Still to do:

[ ] improve the look of things overall
[ ] improve the labels and headers
[ ] a short blurb at the top
[ ] documentation
[x] fix the browser "back" button to do the right thing

I'm working on the "back" button problem. I need help (in January) with the other items.

kimrutherford commented 3 years ago

I forgot to say, it's live here but there is no link to it: https://www.pombase.org/identifier-mapper

ValWood commented 3 years ago

OK, all seems to be working fine.

kimrutherford commented 3 years ago

fix the browser "back" button to do the right thing

That's done and works quite smoothly now: https://www.pombase.org/identifier-mapper

I've also changed the link to the advanced search to be a button to make it more obvious.

improve the look of things overall

I was wondering if it would be better to not show the matches initially or to have the three result types (1-1, 1-many, many-1) in different tabs. I'm worried that in the current implementation, users will not scroll down and see all three sections.

Here's what it looks like if we only show headings and counts:

Screenshot from 2020-12-30 11-16-55

kimrutherford commented 3 years ago

From pombase/curation#2905:

Is this ID form acceptable? HGNC:10001 (should specify in the drop down)

I've done that. The drop down is quite wide now though. Let me know if you'd like it changed.

id-type-1

ValWood commented 3 years ago

I think it is OK, best to be explicit...
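Being explicit about the ID type also makes it possible to flag entries that do not look like the selected type before doing the lookup. The sketch below is only an illustration of that idea: the type keys and the function name are hypothetical, and the patterns reflect the usual shape of HGNC IDs and S. cerevisiae systematic ORF names, not the tool's actual validation (which also has to accept gene names).

```python
import re

# Assumed shapes: "HGNC:" followed by digits, and systematic ORF names such as YPR191W.
ID_PATTERNS = {
    "hgnc": re.compile(r"HGNC:\d+"),
    "sgd_systematic": re.compile(r"Y[A-P][LR]\d{3}[WC](?:-[A-Z])?"),
}


def split_by_format(ids, id_type):
    """Return (well_formed, malformed) IDs for the selected ID type."""
    pattern = ID_PATTERNS[id_type]
    well_formed, malformed = [], []
    for raw in ids:
        token = raw.strip()
        # fullmatch requires the whole token to fit the pattern
        if pattern.fullmatch(token):
            well_formed.append(token)
        else:
            malformed.append(token)
    return well_formed, malformed
```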

kimrutherford commented 3 years ago

I was wondering if it would be better to not show the matches initially or to have the three result types (1-1, 1-many, many-1) in different tabs. I'm worried that in the current implementation, users will not scroll down and see all three sections.

What do you think about this change?

Here's an example of how it could look:

Screenshot from 2020-12-30 11-16-55

mah11 commented 3 years ago

I don't have a strong preference. Just let me know when it's ready for documentation, and I'll write some.

ValWood commented 3 years ago

I agree this is clearer.

kimrutherford commented 3 years ago

I've changed the code to initially hide the results. It will be live in the morning.

If there is only a single result in the one-one or one-many section, it shows it straight away.

kimrutherford commented 3 years ago

Once the new ID mapper is in production, the "UniProt accessions" tab in the advanced search will be a bit redundant. Should we remove it?

I suggest we add the ID mapper to the "Search" menu and we could also have a link to it from the default tab in the Advanced Search like this:

id-mapper-in-advanced-search-1

kimrutherford commented 3 years ago

From the Zoom call:

change the text for UniProt in the type selector to: "S. pombe UniProt accessions (eg. ...)"
add "Hide ..." or "Hide matches ..." to do the opposite of "Show matches ..."

Anything else?

mah11 commented 3 years ago

That's all I remember.

Tiny picky copy-editing thing: it's "e.g." (like "i.e."), with dots after both e and g.

kimrutherford commented 3 years ago

From the Zoom call:

I've made those changes and changed to using "e.g."

I haven't made the changes in this screenshot yet: https://github.com/pombase/website/issues/1539#issuecomment-759837388 And I haven't changed the menu.

It's still only available via a direct link: https://www.pombase.org/identifier-mapper

Midori, I'll email next week so we can pick a day to release all the changes in the test tool for doing screenshots and documentation.

mah11 commented 3 years ago

Midori, I'll email next week so we can pick a day to release all the changes in the test tool for doing screenshots and documentation.

Sounds good. At the moment next week looks relatively sane, so any day will probably do.

kimrutherford commented 3 years ago

I'll email next week so we can pick a day to release all the changes in the test tool for doing screenshots and documentation.

Sorry Midori, I forgot about this last week.

I've just added two sentences at the top of the main ID mapper page as a placeholder: https://www.pombase.org/identifier-mapper

Would you be able to have a look and make any edits you think it needs? The file is here: https://github.com/pombase/website/blob/master/src/app/identifier-mapper/identifier-mapper.component.html#L7

Thanks!

And if there are any changes to make or text to add on the mapper results page, please let me know.

mah11 commented 3 years ago

Thanks - I committed a couple of small edits, and the tests passed. When that goes live I'll write up some documentation, and ping again when it's done.

kimrutherford commented 3 years ago

Thanks for the edits. That's now live: https://www.pombase.org/identifier-mapper

I haven't committed the changes to add links to the ID mapper, but it's ready to go. Here's what I've got. Please let me know what to edit. Rather than removing the "UniProt accessions" tab from the advanced search, I thought it would be better to add a link to the ID mapper for users that have used that tab before. Is that sensible? We could remove it at some point in the future.

id-mapper-0 id-mapper-1 id-mapper-2

mah11 commented 3 years ago

I like all of that :)

I'll make a start on the documentation.

kimrutherford commented 3 years ago

I like all of that :)

Great. Those changes are now in the dev site: http://dev.pombase.kmr.nz/

I'll make a start on the documentation.

Thanks!

mah11 commented 3 years ago

I have committed some documentation, and the checks look OK.

kimrutherford commented 3 years ago

Thanks! I've added a link to the documentation from the ID mapper. It's all now deployed on the main site.

I think the second screenshot on the documentation page might need some trimming. There is a bit of white space at the top and bottom.

ValWood commented 3 years ago

Great stuff. I'm sure this will be v. popular.

ValWood commented 3 years ago

Ooh one more thing, sorry! (can still be announced)

Could we make it clearer if all the genes mapped, maybe with a message at the top: "Input list of 12 human IDs, 12 IDs mapped to 11 pombe proteins, 0 not found" (or something similar)?

Because, with a list like this, where all IDs map, this is not completely obvious:

PEX12 RAD52 PSTPIP1 PSTPIP2 NSFL1C UBXN2B UBXN2A STX8 HLTF RAD1 MAP3K2 MAP3K3
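A summary line like the one suggested above could be derived from the input list and the lookup result along these lines. This is a sketch under assumptions: `matches` is a hypothetical dict from input ID to a list of pombe gene IDs, repeated input IDs are counted once, and the wording simply follows the example message.

```python
def summary_line(input_ids, matches, species="human"):
    """Summarise how many input IDs mapped and to how many pombe genes."""
    unique_ids = list(dict.fromkeys(input_ids))  # count repeated IDs once
    found = [ext_id for ext_id in unique_ids if matches.get(ext_id)]
    # Several input IDs can share a pombe gene, so count distinct genes.
    pombe_genes = {gene for ext_id in found for gene in matches[ext_id]}
    missing = len(unique_ids) - len(found)
    return (f"Input list of {len(unique_ids)} {species} IDs, "
            f"{len(found)} IDs mapped to {len(pombe_genes)} pombe proteins, "
            f"{missing} not found")
```

Because the number of distinct pombe genes is counted separately from the number of mapped IDs, the line stays informative when the mapping is not one-to-one.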

ValWood commented 3 years ago

Once you introduce unmapped IDs this becomes more obvious CEP85L CEP104 CEP95

but it is difficult to figure out because the multi-multi does not have numbers, and users might not have the size of their input list in their head.

ValWood commented 3 years ago

Oh, hang on, I found a bug!

PSTPIP1 and PSTPIP2 appear in many-to-one and many-to-many

ValWood commented 3 years ago

We have one to one, one to many and many to one, but not many to many.

This might be a bit confusing because "many to many" appears in both "one to many" and "many to one".

"Many to many" should be in a separate section (because they aren't 1:many or many:1) Sorry I didn't spot this before.

kimrutherford commented 3 years ago

Oh, hang on, I found a bug! PSTPIP1 and PSTPIP2 appear in many-to-one and many-to-many

That's working as I'd expect. They are both one-many and many-one.

"Many to many" should be in a separate section (because they aren't 1:many or many:1) Sorry I didn't spot this before.

We discussed many-many and decided against it: https://github.com/pombase/website/issues/1539#issuecomment-719159557

Should we add it?

mah11 commented 3 years ago

I think the second screenshot on the documentation page might need some trimming.

Thanks - I've committed an update that should crop off the extra.

ValWood commented 3 years ago

Should we add it?

Sorry I didn't think that through. The problem is that in orthology jargon, many-to-many is a separate class from many-to-one and one-to-many.

It is useful to see many-to-many separately because they tend to be different types of gene products. Many-to-one and one-to-many are often a single duplication in one species, but are otherwise often in single copy. For example, pombe has 2 copies of the proteasome complex subunit rpn5 (rpn501 and rpn502) but all other species have one copy, and all other proteasome subunits are in one copy. These single copy duplicates pop up but are often removed by selection (although sometimes retained).

The many-to-many class is either:

Highly conserved copies of gene products that are needed in high copy number (mainly histones, ribosomal proteins) or
Proteins that are not necessarily conserved members of complexes, transcription factors, signalling molecules and transporters.

These aren't hard and fast rules, but there are definitely biological differences between these lists. I can see that people might want only 1:1, 1:many and many:1 but might want to exclude many:many.
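One way to get four non-overlapping classes in this stricter sense is to group the matches into connected sets of IDs and label each group by how many external IDs and how many pombe genes it contains. The sketch below assumes the matches are a dict from external ID to a list of pombe gene IDs; that shape and the function name are hypothetical, and this is not how the current mapper computes its sections.

```python
from collections import defaultdict


def orthology_classes(matches):
    """Group matches into 1:1, 1:many, many:1 and many:many classes."""
    # Build an undirected bipartite graph; tag nodes so the two ID spaces cannot clash.
    graph = defaultdict(set)
    for ext_id, genes in matches.items():
        for gene in genes:
            graph[("ext", ext_id)].add(("pombe", gene))
            graph[("pombe", gene)].add(("ext", ext_id))

    classes = defaultdict(list)
    seen = set()
    for start in graph:
        if start in seen:
            continue
        # Collect one connected component with an iterative depth-first walk.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        ext = sorted(name for kind, name in component if kind == "ext")
        pombe = sorted(name for kind, name in component if kind == "pombe")
        # Label is "<external side>-to-<pombe side>".
        label = ("one" if len(ext) == 1 else "many") + "-to-" + \
                ("one" if len(pombe) == 1 else "many")
        classes[label].append((ext, pombe))
    return dict(classes)
```

With the groups labelled this way, excluding many-to-many matches from the results amounts to dropping the `"many-to-many"` entry.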

kimrutherford commented 3 years ago

I can see that people might want only 1:1, 1:many and many:1 but might want to exclude many:many.

Do we need an option to exclude many to many matches from the results?
