legumeinfo / ZZBrowse

Display QTL data #18

Closed: adf-ncgr closed this issue 3 years ago

adf-ncgr commented 4 years ago

QTLs are usually represented as fairly wide intervals (often multiple megabases), but otherwise they are conceptually similar to GWAS hits: both represent a statistical association of a genomic region with some phenotypic trait. The methodologies for GWAS and QTL experiments/analyses differ, which makes them somewhat complementary and suggests we could benefit from integrating QTL data into displays like the ones we currently provide only for GWAS.

There may be some complications in the tabular displays (the column sets for GWAS and QTL are likely a bit different), but I am mostly thinking of the graphical displays at the moment. I have in mind something like an extra track that displays QTLs colored by trait, as the GWAS SNPs are, but with range positions like genes have. There is no need for a fwd/rev strand distinction for QTLs, but we'd want them to "tile" according to some simple algorithm so that overlapping QTLs don't hide each other.
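One "simple algorithm" for tiling is greedy lane assignment. Sort the intervals by start position and place each one in the first lane whose last interval ends before it starts. If no lane qualifies, open a new lane. The sketch below assumes QTLs arrive as closed `(start, end)` pairs on one chromosome. The function name `tile_intervals` is hypothetical and not part of ZBrowse.

```python
from typing import List, Tuple


def tile_intervals(intervals: List[Tuple[int, int]]) -> List[int]:
    """Assign each closed (start, end) interval to a lane (row) so that no two
    intervals in the same lane overlap. Returns one lane index per interval,
    in the same order as the input."""
    # Visit intervals by start position but remember their original indices.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    lane_ends: List[int] = []  # end coordinate of the last interval placed in each lane
    lanes = [0] * len(intervals)
    for i in order:
        start, end = intervals[i]
        for lane, last_end in enumerate(lane_ends):
            if start > last_end:  # strictly after: touching intervals get separate lanes
                lane_ends[lane] = end
                lanes[i] = lane
                break
        else:  # no existing lane has room, so open a new one
            lane_ends.append(end)
            lanes[i] = len(lane_ends) - 1
    return lanes
```

Here is an example. `tile_intervals([(100, 500), (200, 300), (400, 900), (600, 700)])` returns `[0, 1, 1, 0]`. The lane index could then be mapped to a y offset for drawing.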

placeholder issue for further discussion...

svengato commented 4 years ago

The cowpea QTL markers from https://legumeinfo.org/data/public/Vigna_unguiculata/mixed.qtl.KF1G/ seem to match up with this file. Is that correct?

https://v1.legumefederation.org/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.mrk.52FC/vigun.IT97K-499-35.gnm1.mrk.52FC.Cowpea1MSelectedSNPs.gff3.gz

Note that all the start and end positions are equal, while they may differ in the GFF file we use for GWAS data (which seems backward):

https://v1.legumefederation.org/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.ann1.zb5D/vigun.IT97K-499-35.gnm1.ann1.zb5D.gene_models_main.gff3.gz

adf-ncgr commented 4 years ago

> The cowpea QTL markers from https://legumeinfo.org/data/public/Vigna_unguiculata/mixed.qtl.KF1G/ seem to match up with this file. Is that correct?
>
> https://v1.legumefederation.org/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.mrk.52FC/vigun.IT97K-499-35.gnm1.mrk.52FC.Cowpea1MSelectedSNPs.gff3.gz

yes I believe so

> Note that all the start and end positions are equal, while they may be different in the GFF file we use for GWAS data (which seems backward),
>
> https://v1.legumefederation.org/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.ann1.zb5D/vigun.IT97K-499-35.gnm1.ann1.zb5D.gene_models_main.gff3.gz

that's the gene annotation file. The mrk file is the one that has the SNPs (which have start=end because they are, in fact, "single nucleotide polymorphisms"). You must already be using the same mrk file for the GWAS SNPs (in addition to the annotation file for the gene display).

let me know if I misunderstood.
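Here is a minimal sketch of reading marker positions out of a gzipped mrk GFF3 file such as the one above. In these files SNPs have start == end. The sketch assumes the marker name lives in the `Name` attribute of column 9, falling back to `ID`. That assumption should be checked against the actual file. The function name `load_marker_positions` is hypothetical.

```python
import gzip
from typing import Dict, Tuple


def load_marker_positions(path: str) -> Dict[str, Tuple[str, int, int]]:
    """Map marker name -> (seqid, start, end) from a gzipped GFF3 file.

    Assumption: the marker name is stored in the Name attribute, with ID as a
    fallback; verify this against the real mrk file.
    """
    positions: Dict[str, Tuple[str, int, int]] = {}
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and ## directives
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # not a 9-column feature line (e.g. a FASTA section)
            seqid, start, end, attrs = cols[0], int(cols[3]), int(cols[4]), cols[8]
            # Column 9 holds tag=value pairs separated by semicolons.
            tags = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
            name = tags.get("Name") or tags.get("ID")
            if name is not None:
                positions[name] = (seqid, start, end)
    return positions
```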

svengato commented 4 years ago

That is correct. I was confusing the mrk file with the annotations file; now it makes sense.

svengato commented 4 years ago

Some QTL marker files have a Distinction column with values like Flanking, Peak, or (blank); others do not.

svengato commented 4 years ago

From the QTL file specification, it looks like we create a QTL as follows:

  1. Group the markers by QTL identifier (the column is labeled Identifier in the files, not QTL as described in the specification).
  2. Look up the position of each marker in the GFF file.
  3. Take the QTL range to be the two extreme values.
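The three steps above can be sketched as follows. The sketch assumes the marker file has already been parsed into `(identifier, marker)` pairs and that marker positions are available as a `marker -> (seqid, start, end)` dict. It also assumes all markers of one QTL lie on a single chromosome, and raises an error otherwise. Markers missing from the GFF are returned separately rather than silently dropped. The name `build_qtl_ranges` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Position = Tuple[str, int, int]  # (seqid, start, end)


def build_qtl_ranges(
    rows: Iterable[Tuple[str, str]],
    positions: Dict[str, Position],
) -> Tuple[Dict[str, Position], List[str]]:
    """rows: (QTL identifier, marker name) pairs from a QTL marker file.
    positions: marker name -> (seqid, start, end) from the marker GFF3.
    Returns (QTL -> (seqid, min start, max end), markers missing from the GFF)."""
    by_qtl: Dict[str, List[Position]] = defaultdict(list)
    missing: List[str] = []
    for qtl, marker in rows:  # step 1: group markers by QTL identifier
        if marker in positions:  # step 2: look up each marker's position
            by_qtl[qtl].append(positions[marker])
        else:
            missing.append(marker)
    ranges: Dict[str, Position] = {}
    for qtl, locs in by_qtl.items():  # step 3: take the two extreme values
        seqids = {seqid for seqid, _, _ in locs}
        if len(seqids) != 1:
            # Assumption: all markers for one QTL lie on a single chromosome.
            raise ValueError(f"QTL {qtl} has markers on {sorted(seqids)}")
        ranges[qtl] = (locs[0][0], min(s for _, s, _ in locs), max(e for _, _, e in locs))
    return ranges, missing
```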
adf-ncgr commented 4 years ago

that sounds correct to me, and you can ignore "Distinction", although we would expect that in most cases, if Flanking markers are given, they will be the extreme values you end up selecting.
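If a file does include Distinction, that expectation could be spot-checked. The sketch below compares the span of a QTL's Flanking markers with the range computed from all of its markers. It is only a sanity check, not part of the range computation. The function name `flanking_matches_range` and the `(seqid, start, end)` tuple layout are assumptions.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int, int]  # (seqid, start, end)


def flanking_matches_range(
    flanking: List[str],
    positions: Dict[str, Position],
    qtl_range: Position,
) -> bool:
    """Return True if the QTL's Flanking markers span exactly qtl_range."""
    locs = [positions[m] for m in flanking if m in positions]
    if not locs:
        return False  # no positioned Flanking markers to compare against
    seqid, start, end = qtl_range
    return (
        all(loc[0] == seqid for loc in locs)
        and min(loc[1] for loc in locs) == start
        and max(loc[2] for loc in locs) == end
    )
```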

svengato commented 4 years ago

How long (wide?) can a QTL be? Is 1-2 Mbp typical?

adf-ncgr commented 4 years ago

yes, they can be wide; several Mbp is not at all atypical

svengato commented 4 years ago

Typo in the QTL marker file:

https://legumeinfo.org/data/public/Vigna_unguiculata/mixed.qtl.KF1G/vigun.mixed.qtl.KF1G.30143525.marker.tsv.gz

In the line

SC.Sanzi_x_Vita7 2_19309 Peak

the second separator is a space; it should be changed to a tab and the file re-gzipped.
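A typo like this could be caught before loading. Count the tab-separated fields on each line and compare with the header. The sketch assumes the first line is a header and that a blank Distinction still keeps its trailing tab. A file that drops that tab would also be flagged. The name `find_malformed_lines` is hypothetical.

```python
import gzip
from typing import List, Tuple


def find_malformed_lines(path: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every data line whose tab-separated
    field count differs from the header's."""
    bad: List[Tuple[int, str]] = []
    with gzip.open(path, "rt") as fh:
        header = fh.readline().rstrip("\n").split("\t")
        for lineno, line in enumerate(fh, start=2):  # line 1 is the header
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(header):
                bad.append((lineno, line.rstrip("\n")))
    return bad
```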

adf-ncgr commented 4 years ago

Give me a second; I'll take care of it right away.

svengato commented 4 years ago

No rush, I am working off local copies.

adf-ncgr commented 4 years ago

the rush is that I'll immediately forget!

adf-ncgr commented 4 years ago

Hmm, looks like @sammyjava has selfish permissions on this folder/files, so I will have to punt to him anyway.

svengato commented 4 years ago

Also, the markers in

https://legumeinfo.org/data/public/Vigna_unguiculata/mixed.qtl.KF1G/vigun.mixed.qtl.KF1G.27864597.marker.tsv.gz

do not match those in the GFF file. Ignoring these for now.

svengato commented 4 years ago

A simple-minded approach to displaying the QTLs in the Whole Genome chart is to map each QTL range to the Support Intervals feature, which we do not normally use. This leads to some mislabeling of the trait names on the chart and the column names in the Data Table, but it serves as a proof of concept. The Chromosome chart works the same way. The height of each QTL bar is arbitrary (1.0, 1.1, 1.2, ... as needed so that none are hidden).

Next, I will try to get an idea of how easy it will be to adapt the Support Intervals code, compared to adding new code (or even rewriting ZBrowse from scratch). Also, should GWAS points and QTL bars go on the same chart? Currently, each has its own dataset and therefore goes on a separate chart.

[Screenshot: qtl-chart]

adf-ncgr commented 4 years ago

That's great. As far as I can tell from the ZBrowse publication, this may in fact have been the intended use of the intervals: "Ability to plot both SNPs and genetic intervals. We wanted users to be able to combine the results of quantitative trait locus mapping techniques with GWAS results."

Is this at a point where you could either put it on dev-legfedorg or just push it to a branch on GitHub, so I could get a sense of how it behaves in "hands-on" mode?

svengato commented 4 years ago

It took a while to clean up, but I checked in the changes to master and merged to dev-legfedorg.

svengato commented 4 years ago

It was not able to create the cowpea QTL file from the remote QTL and GFF files, so I copied my local one over. (To do: figure out why.)

adf-ncgr commented 4 years ago

> It took a while to clean up, but I checked in the changes to master and merged to dev-legfedorg.

For what I guess was less than a day of effort, I'd say the proof of concept is definitely on track. It looks like linking in via the URL isn't quite working, though. I was going to point Steven to an example of a cowpea QTL on Vu05 that matched some soybean GWAS on Gm18, but this link: http://dev.lis.ncgr.org:50003/shiny/ZBrowse/?tab=Chrom&datasets=Cowpea%20QTL&chr=Vu05&selected=907871&window=250000&datasets2=Soybean%20GWAS&chr2=Gm18&selected2=57035000&window2=250000&traits=Days%20to%20flowering;Flower%20color;Flowering%20time%20under%20long%20daylength%20at%20UCR-CES;Flowering%20time%20under%20short%20daylength%20at%20CVARS&genomicLinkage=true&neighbors=40&matched=20&intermediate=5&selectedGene=vigun.Vigun05g010900&relatedRegion=Gm18%2056.75-57.32%20Mbp

seems to take me to Vu01 instead. Let me know if you want me to file it as a separate issue...

svengato commented 4 years ago

Confirmed, though the problem is not specific to QTL data. This should be a separate issue. I suspect it has to do with the need to reset the chromosome view when the user changes organisms, except that in this case we change the organism programmatically through the URL.

adf-ncgr commented 4 years ago

> Also, the markers in
>
> https://legumeinfo.org/data/public/Vigna_unguiculata/mixed.qtl.KF1G/vigun.mixed.qtl.KF1G.27864597.marker.tsv.gz
>
> do not match those in the GFF file. Ignoring these for now.

ignoring for now seems like the right call; for some reason the publication from which @sammyjava must have taken these seems to be using non-standard identifiers for the markers. I think I have a bead on what's going on and can follow-up with the cowpea group to try to get some further clarity on it.

svengato commented 4 years ago

> It was not able to create the cowpea QTL file from the remote QTL and GFF files, so I copied my local one over. (To do: figure out why.)

This is due to running R 3.5.2 on dev-legfedorg (and production), but R 4.0 locally.

  1. stringsAsFactors = TRUE by default in R 3.5.2 but FALSE in 4.0, so enforce stringsAsFactors = FALSE when creating data frames.
  2. is.null(list[[tag]]) does not work in R 3.5.2, so use tag %in% names(list) instead.

Fixed and checked in. (commit d1a502c...)

adf-ncgr commented 4 years ago

Thanks for figuring out the issue with the R versions. I'll note that I have no problem (conceptually) with upgrading the site to R 4.0 if that will help prevent future snags and has no other obvious drawbacks.

svengato commented 4 years ago

There is some evidence that upgrading to R 4.0 would break clicking on a SNP (see issue #7). So I would hold off for now.

svengato commented 4 years ago

By the way, R is now up to 4.0.3 "Bunny-Wunnies Freak Out".

adf-ncgr commented 4 years ago

Did you know that the R release nicknaming strategy derives from a legume-focused publication?

svengato commented 4 years ago

Is Charlie Brown any relation to Thomas Browne?

adf-ncgr commented 4 years ago

There's clearly some interest in the quincunx: https://pbs.twimg.com/media/DKw_1uLUMAAf1-X.jpg

sammyjava commented 4 years ago

> Hmm looks like @sammyjava has selfish permissions on this folder/files, so I will have to punt to him anyway.

I fixed this, right?

adf-ncgr commented 4 years ago

no, I think you fixed some gwas stuff elsewhere. I believe chgrp -R staff /usr/local/www/data/public/Vigna_unguiculata/mixed* /usr/local/www/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.mrk.52FC will do the job needed for this one.

sammyjava commented 4 years ago

> no, I think you fixed some gwas stuff elsewhere. I believe chgrp -R staff /usr/local/www/data/public/Vigna_unguiculata/mixed* /usr/local/www/data/public/Vigna_unguiculata/IT97K-499-35.gnm1.mrk.52FC will do the job needed for this one.

Done.