pombase / website

PomBase website v2
MIT License
6 stars 1 forks source link

Display diploid genotypes on the website #1328

Closed kimrutherford closed 3 years ago

kimrutherford commented 5 years ago

... once we have some in Chado (pombase/pombase-chado#734).

I think this needs a short discussion.

This might be related: #1327

ValWood commented 4 years ago

How do they look right now?

kimrutherford commented 4 years ago

How do they look right now?

Like normal multi-allele genotypes. The diploids are kind-of flattened out.

ValWood commented 3 years ago

We should discuss this. Do we currently display diploids at all? (I can't remember). We have quite a few now I think? I was just curating some now, which made me wonder.

@kimrutherford Could you let us know how many are curated. This will give us an idea about how we should prioritise.

kimrutherford commented 3 years ago

Do we currently display diploids at all?

I haven't worked out a good way to store the diploids in Chado: pombase/pombase-chado#734

I need to get back to that.

Could you let us know how many are curated. This will give us an idea about how we should prioritise.

There are 554 diploids spread over 104 sessions.

kimrutherford commented 3 years ago

I haven't worked out a good way to store the diploids in Chado: pombase/pombase-chado#734

Sorry I've double checked and I had this wrong. I thought we weren't displaying diploids at all. Actually we're showing them but in a naff way.

We're showing the two alleles of the diploid but in a way that makes it look like it's a genotype with two loci. Here's an example: https://curation.pombase.org/pombe/curs/1cfea3361eac50c/ro/ https://www.pombase.org/reference/PMID:2792737

I was right that I haven't thought of a good way to store the diploids in Chado.

Once Chado and the web code can cope with diploids how should the genotype page look when it's a diploid genotype? Currently it looks like: https://www.pombase.org/genotype/map1-mutant-unknown-unknown-expression-not_assayed__map1-mutant-unknown-unknown-expression-not_assayed which isn't great.

kimrutherford commented 3 years ago

FlyBase store genotypes and phenotype in a very different way to us that would make other things more difficult.

ValWood commented 3 years ago

Can it be something very simple like a flag? So we would just have a diploid flag on a single or multi allele to mark them to be displayed in a separate section. Or is it much more complicated?

kimrutherford commented 3 years ago

Can it be something very simple like a flag?

I think that's how we'll have to do it. Maybe a property on each allele that is part of a diploid so we know it's part of a diploid. I'll sleep on it and see what I can do tomorrow.

ValWood commented 3 years ago

It's not a huge hurry, since they are displayed people can see them and will know what they mean even without their own section.

One simple improvement would be to make this slightly more obvious at first glance by using the standard diploid notation

map1-mutant (unknown) / map1-mutant (unknown) instead of

map1-mutant (unknown) map1-mutant (unknown) (which could be interpreted as 2 copies of a mutant allele in a haploid, especially for heterozygotes)

kimrutherford commented 3 years ago

One simple improvement would be to make this slightly more obvious at first glance by using the standard diploid notation

Unfortunately that's not possible yet. The website code can't tell which alleles are part of diploids because that information isn't stored in Chado. The web code just knows which alleles are part of which genotype. We need to store the diploid details in Chado first.

kimrutherford commented 3 years ago

Maybe a property on each allele that is part of a diploid so we know it's part of a diploid.

Now I've looked at this properly I remember the fiddly bit. Each allele can be part of multiple genotypes and could be a diploid in some case but not others. That can be represented in Chado but as I say it's a bit fiddly.

ValWood commented 3 years ago

So it isn't possible to just add a diploid flag to diploid genotypes? or is that over-simplistic?

kimrutherford commented 3 years ago

So it isn't possible to just add a diploid flag to diploid genotypes?

That would work if all the diploid genotypes had a single locus. If there are 2 or more loci we need to record in Chado which allele is part of which locus. I think that's possible. I'm having a think about how to implement it in the least hacky way.

ValWood commented 3 years ago

Yes - sometimes they are multi allele diploids.

kimrutherford commented 3 years ago

Should we rename the "Single allele phenotype" section of the gene pages to "Single locus phenotype"?

kimrutherford commented 3 years ago

Or should all diploid genotypes go into the "Multi-allele phenotype" section?

ValWood commented 3 years ago

Should we rename the "Single allele phenotype" section of the gene pages to "Single locus phenotype"?

I think this makes most sense. At least, I would prefer to see single locus diploids in the same table as single alleles.

@mah11 ?

kimrutherford commented 3 years ago

We need to decide how to format the allele table at the top of genotype pages when the genotype is diploid.

Here's an easy example from: https://curation.pombase.org/pombe/curs/1534156512e84afb/ro/

ecl1Δ / ecl1Δ
ecl2Δ / ecl2Δ
ecl3Δ / ecl3Δ

Currently displayed as: https://www.pombase.org/genotype/ecl1delta__ecl1delta__ecl2delta__ecl2delta__ecl3delta__ecl3delta but there's an obvious we to format it nicely.

This is a trickier one from: https://curation.pombase.org/pombe/curs/1ebe4a1548de4aff/feature/genotype/view/14/ro

https://www.pombase.org/genotype/dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed__dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed__nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown__nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown

ValWood commented 3 years ago

This one could be:

ecl1Δ /ecl1Δ, ecl2Δ/ecl2Δ ,ecl3Δ /ecl3Δ

Gene Allele name Description Type Expression
ecl1 ecl1Δ   deletion Null
ecl2 ecl2Δ   deletion Null
ecl3 ecl3Δ   deletion Null

Perhaps - no need to repeat the dupicated alleles in the list?

For the second example:

dot2-439(Q33->stop)/dot2-439(Q33->stop), nmt81-pcp1(nmt81-pcp1+)[Knockdown]/nmt81-pcp1(nmt81-pcp1+)[Knockdown]

(so if there is line wrapping, split on the diploid allelle pair )

kimrutherford commented 3 years ago

Do we need to display the type ("nonsense_mutation"/"other")?

mah11 commented 3 years ago

on annotation table names:

Should we rename the "Single allele phenotype" section of the gene pages to "Single locus phenotype"?

OK

Or should all diploid genotypes go into the "Multi-allele phenotype" section?

No; I'd prefer to rename the "multi" table to "Multi-locus phenotypes".

I think it's sensible to split the display into two tables (single- & multi-locus, each with haploids and diploids included). It would not make sense to have a table just for single-locus haploids unless it was one of four tables (a separate one for each combinatorial possibility) and I don't see any enthusiasm for that.

mah11 commented 3 years ago

genotype page displays:

ecl1Δ/ecl1Δ, ecl2Δ/ecl2Δ, ecl3Δ/ecl3Δ dot2-439(Q33->stop)/dot2-439(Q33->stop), nmt81-pcp1(nmt81-pcp1+)[Knockdown]/nmt81-pcp1(nmt81-pcp1+)[Knockdown]

Yes, the header should include the / between alleles of each locus (without spaces).

Perhaps - no need to repeat the duplicated alleles in the list?

I agree, it looks a bit silly, especially since the table doesn't group the two alleles of any locus visually. For heterozygous diploids there'll still be two entries because the alleles will be different. As long as the table is just meant to be a list of the alleles present in the genotype that'll do. (I'm thinking about whether it's worth explicitly saying "homozygous" or "heterozygous" somewhere but I don't think it's hugely important.)

Do we need to display the type ("nonsense_mutation"/"other")?

In the header, no. In the table, yes -- we should show it for info, and because we include it on the pages for haploid genotypes. Consistency, ya know.

kimrutherford commented 3 years ago

Could one of you make a mock-up of how the table should look in the diploid case? Thanks.

kimrutherford commented 3 years ago

The change to the genotype display names is now live. eg. https://www.pombase.org/genotype/nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown_nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown__dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed_dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed

I need to tweak the URLs for the genotype pages so if you get a 404 error from that link try from the publication page: https://www.pombase.org/reference/PMID:15992541

For now all the diploid genotype go into the "Multi-allele phenotype" sections.

kimrutherford commented 3 years ago

Here are some more examples: https://www.pombase.org/reference/PMID:19911044

https://www.pombase.org/genotype/mre11-D65N-D65N-amino_acid_mutation-expression-not_assayed_mre11-D65N-D65N-amino_acid_mutation-expression-not_assayed__mek1delta_mek1delta__rad50S-K81I-amino_acid_mutation-expression-not_assayed_rad50S-K81I-amino_acid_mutation-expression-not_assayed

mah11 commented 3 years ago

Could one of you make a mock-up of how the table should look in the diploid case?

Eurgh, here's a go, using an example based on this one, with a change to include a heterozygous locus in the mock-up...

One other idea: maybe leave some details out of the header? e.g.

mek1Δ/mek1Δ mre11-D65N/mre11-D65N rad50S/rad50+

or, with everything (as they currently look), and including a table:

mek1Δ/mek1Δ mre11-D65N(aa)/mre11-D65N(aa) rad50S(K81I aa)/rad50+


Diploid genotype

Locus Allele name Description Type Expression
mek1 mek1Δ deletion Null
mek1Δ deletion Null
mre11 mre11-D65N D65N aa amino_acid_mutation Not assayed
mre11-D65N D65N aa amino_acid_mutation Not assayed
rad50 rad50S K81I aa amino_acid_mutation Not assayed
rad50+ wild_type Wild type product level
mah11 commented 3 years ago

another take on the table, done in a word processor & saved as .png:

diploid_table_mockup1

to group the alleles of one locus more clearly than I could do in markdown

mah11 commented 3 years ago

It seems less bad to show two copies of the same allele if it's clear that they're included as the two copies of one locus.

ValWood commented 3 years ago

Yes, the shading helps.

kimrutherford commented 3 years ago

Thanks Midori. I'll have a go at that.

Just to be clear, should we keep the table the same as now in the haploid case?

mah11 commented 3 years ago

should we keep the table the same as now in the haploid case?

I suppose so ... it does the trick and nobody complains about it.

kimrutherford commented 3 years ago

another take on the table, done in a word processor & saved as .png:

I've done that. The change will be in the main site in the morning. I haven't changed the header yet.

diploid-allele-table-1

kimrutherford commented 3 years ago

The change will be in the main site in the morning.

It's here for now: http://dev.pombase.kmr.nz/genotype/nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown_nmt81-pcp1-nmt81-pcp1%2B-other-expression-knockdown__dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed_dot2-439-Q33-%3Estop-nonsense_mutation-expression-not_assayed

I haven't updated the main site because I want to hear your opinion about: https://github.com/pombase/pombase-chado/issues/824#issuecomment-810759099

ValWood commented 3 years ago

This looks good to me.

I notice on the gene page that single diploids are included in multi allele table. Do we plan to move these into and rename as 'single locus table'? Also, dot1 might be a good example to use for the announcement when it happens, but we should review the existing genotypes to check whether some should be diploids first. I just swapped over https://www.pombase.org/reference/PMID:11901107 to heterozygous diploids.

kimrutherford commented 3 years ago

I notice on the gene page that single diploids are included in multi allele table. Do we plan to move these into and rename as 'single locus table'?

We can do that. I think we're all agreed: https://github.com/pombase/website/issues/1328#issuecomment-808125933

Should I go ahead?

ValWood commented 3 years ago

We can do that. I think we're all agreed: #1328 (comment) Should I go ahead?

I thought we had agreed. I just couldn't remember which parts applied to the website and which to the curation tool ;)

I don't see why not...

mah11 commented 3 years ago

another take on the table, done in a word processor & saved as .png:

I've done that.

That looks nice!

I thought we had agreed. I just couldn't remember which parts applied to the website and which to the curation tool ;)

Website agreed in this very ticket, e.g. this comment; Canto here.

kimrutherford commented 3 years ago

I don't see why not...

It will appear as "Single locus phenotype" / "Multi-locus phenotype" everywhere, including the left hand menu and in the query builder. We'll need to update the documentation to match.

Is that all OK?

mah11 commented 3 years ago

Yeah, I'll cope. I'll put the documentation on my post-break to-do list.

kimrutherford commented 3 years ago

I'll put the documentation on my post-break to-do list.

Great. Thanks.

I've made all the necessary changes locally. I'll hold off deploying things until you have a chance to change the documentation. There's no rush at all because next week I need to get back to working on Canto.

kimrutherford commented 3 years ago

After this is done, does it make sense to add a query builder option to constrain phenotype searches by diploid/haploid/either? We can already constrain by single/mutli allele (although that's about to change to single/multi locus).

ValWood commented 3 years ago

I guess it could be useful. At that point we could announce the ability to curate diploids properly. That we display then, and that they can be queried. Also ask people to let us know if they spot any diploid alleles curated as haploid..

kimrutherford commented 3 years ago

I guess it could be useful.

OK, I'll add that at the same time as changing all the sections to single/multi locus.

mah11 commented 3 years ago

Let me know when it's done so I can get screenshots for the documentation :)

kimrutherford commented 3 years ago

This should be ready for testing on Monday. I'll release the changes in the dev site and delay updating of the main site until we're happy with the changes and the documentation is OK.

If the "Diploid" box isn't ticked should we label the check boxes as "Single allele"/"Multi-allele"? Or is it more consistent to stick to "Single locus"/"Multi-locus"?

Also should we change the "Expression level:" label to "Expression level for single allele haploid:" to be more explicit?

query-builder-ploid-2

kimrutherford commented 3 years ago

The dev site is updated to have the single/multi locus spilt. Let me know if you spot any problems. I won't update the main site with these changes until you're happy with things and until the documentation is ready.

mah11 commented 3 years ago

Search stuff ...

Big picture: looks good, and seems to be working.

I don't love that the hyphen is missing from "multi-locus" in the query description (in the history; it's been fixed elsewhere), but I guess that's me being finicky and pedantic.

If the "Diploid" box isn't ticked should we label the check boxes as "Single allele"/"Multi-allele"? Or is it more consistent to stick to "Single locus"/"Multi-locus"?

I don't have strong feelings, but I'm a bit inclined towards leaving it as is for consistency.

Also should we change the "Expression level:" label to "Expression level for single allele haploid:" to be more explicit?

Can't hurt.

mah11 commented 3 years ago

Page display stuff ...

The main thing I've spotted is that haploid and diploid genotypes are interspersed in the phenotype annotation tables. Is that intentional? I thought we envisioned having all of the haploids and then all of the diploids in each table, and it'd be good to do that before release if it's not a huge amount of work.

The diploid genotype pages look good. :)

kimrutherford commented 3 years ago

I don't love that the hyphen is missing from "multi-locus" in the query description

That's fixed now.

I thought we envisioned having all of the haploids and then all of the diploids in each table,

I don't remember talking about that. I'll have a go tomorrow.

mah11 commented 3 years ago

I thought we envisioned having all of the haploids and then all of the diploids in each table,

I don't remember talking about that. I'll have a go tomorrow.

It's entirely possible that we only discussed in it the context of Canto, or even that I've been thinking it without saying or typing it until now. If the latter, oops, sorry. It'd still be good to be able to spot diploids easily, so if grouping them within the tables turns out to be a pain I'll open a ticket for a filter (hmm, might do that anyway).