jamiewaese / ePlant

ePlant is a data visualization tool for integrating and exploring multiple levels of biological data.

Design a JSON file structure for eFPs #68

Closed yuzhenmi closed 9 years ago

yuzhenmi commented 10 years ago

I think it would be helpful if we could build upon the file structure and make room for information that is useful for defining an eFP. The goal is to be able to generate a complete interactive eFP diagram, and access the appropriate web services for data, using only the information provided in the JSON file.

So far, the file structure that I am using is very simple and definitely needs additions:

{
    // Size of the eFP diagram
    "width": Number,
    "height": Number,

    // Link to the web service that provides a level value for a given gene and sample
    // The web service address should work with "primaryGene=$primaryGene&sample=$sample"
    //   appended to the end
    "webService": String,

    // Array of labels
    "labels": [
        {
            // Position of label
            "x": Number,
            "y": Number,

            // String content of label
            "content": String,

            // Font of label
            // This should not include anything other than the font (e.g. "Helvetica")
            "font": String,

            // Size of label
            "size": Number,

            // Whether label is bolded (optional)
            "bold": Boolean,

            // Whether label is italic (optional)
            "italic": Boolean,

            // Whether label is underlined (optional)
            "underline": Boolean,

            // Link to new page when user clicks on label (optional)
            // Should allow inclusion of variables such as gene identifier with the $ flag
            "link": String
        },
        ...
    ],

    // Control group: identifier, sample names, and link to source of data
    "control": {
        "id": String,
        "samples": [
            String,
            ...
        ],
        "source": String
    },

    // Array of SVG paths that define all the outlines of the eFP
    // The outlines are purely for rendering purposes; they will not have any role in
    //   user interactions
    // Outlines are stroked and not filled, so none of the paths are required
    //   to be closed
    "outline": {
        "paths": [
            String,
            ...
        ],
        "color": String
    },

    // Array of objects that define each plant group
    "groups": [
        {
            // Array of SVG paths that define a group (e.g. plant tissue)
            // Paths can be closed or open, but only closed paths will be used
            //   for user interactions, such as mouse hover effects
            // These paths are filled but not stroked when the mouse is not hovering
            //   over the group, and filled and stroked when the mouse is hovering over the group
            // Note that open paths will not be visible unless the user hovers the mouse
            //   over the group
            "paths": [
                String,
                ...
            ],

            // Unique identifier of the group
            "id": String,

            // Array of sample names for this group
            "samples": [
                String,
                ...
            ],

            // Link to source of data
            "source": String
        },
        ...
    ]
}
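
As a rough illustration of how a client might consume this structure, here is a minimal Python sketch. It assumes the commented template above has been saved as strict JSON (no comments, no `...`). The example values, the address `http://example.org/expression.cgi?`, and the helper names `missing_fields`, `sample_url` and `expand_link` are all hypothetical and not part of ePlant.

```python
from string import Template
from urllib.parse import urlencode

# Top-level and per-group fields named in the template above.
REQUIRED_TOP = ("width", "height", "webService", "labels", "control", "outline", "groups")
REQUIRED_GROUP = ("paths", "id", "samples", "source")


def missing_fields(efp):
    """Return dotted names of required fields absent from an eFP definition."""
    missing = [name for name in REQUIRED_TOP if name not in efp]
    for index, group in enumerate(efp.get("groups", [])):
        missing += [f"groups[{index}].{name}" for name in REQUIRED_GROUP if name not in group]
    return missing


def sample_url(efp, primary_gene, sample):
    """Append "primaryGene=...&sample=..." to the webService address."""
    return efp["webService"] + urlencode({"primaryGene": primary_gene, "sample": sample})


def expand_link(link, **variables):
    """Fill $-flagged variables in a label link; unknown ones are left untouched."""
    return Template(link).safe_substitute(variables)


# Hypothetical definition; every value here is invented for illustration.
efp = {
    "width": 800,
    "height": 600,
    "webService": "http://example.org/expression.cgi?",
    "labels": [{"x": 10, "y": 20, "content": "Root", "font": "Helvetica", "size": 12,
                "link": "http://example.org/gene?id=$primaryGene"}],
    "control": {"id": "ctrl", "samples": ["CTRL_1"], "source": "http://example.org/ctrl"},
    "outline": {"paths": ["M0 0 L10 10"], "color": "#000000"},
    # "source" is left out on purpose so that missing_fields reports it.
    "groups": [{"paths": ["M0 0 L10 0 L10 10 Z"], "id": "root", "samples": ["ROOT_1", "ROOT_2"]}],
}

print(missing_fields(efp))
print(sample_url(efp, "At2g41460", "ROOT_1"))
print(expand_link(efp["labels"][0]["link"], primaryGene="At2g41460"))
```

`string.Template` happens to use the same `$name` convention proposed for label links, which is why it is used here; any other substitution scheme would work equally well.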

jamiewaese commented 10 years ago

Include sample names, control names, and the URL of the web service.

On Dec 5, 2013, at 7:24 AM, Hans Yu notifications@github.com wrote:

{
    // Array of SVG paths that define all the outlines of the eFP
    // The outlines are purely for rendering purpose, they will not have any role in
    //   user interactions
    // Outlines are stroked and not filled, so none of the paths are required
    //   to be closed
    "outline": [
        String,
        ...
    ],

    // Array of objects that define each plant tissue
    "groups": [
        {
            // Array of SVG paths that define the plant tissue
            // Paths can be closed or opened, but only closed paths will be used
            //   for user interactions, such as mouse hover effects
            // These paths are filled but not stroked when mouse is not hovering
            //   over the tissue, and filled and stroked if the mouse is hovering the tissue
            // Note open paths will not be visible unless the user hovers mouse
            //   over the tissue
            "paths": [
                String,
                ...
            ],

            // Name of the plant tissue, this should be unique
            // Used as both the label and the unique identifier for the tissue
            "name": String
        },
        ...
    ]
}

nprovart commented 10 years ago

Hi Hans, here's the current XML file for the default eFP view: http://bar.utoronto.ca/efp/cgi-bin/data/Developmental_Map.xml. Can we co-opt this? N.

......................................... Nicholas Provart, PhD Associate Professor, Plant Cyberinfrastructure & Systems Biology Chair, Bioinformatics SC, Multinational Arabidopsis Steering Committee Member, North American Arabidopsis Steering Committee and IAIC Member, Centre for the Analysis of Genome Evolution and Function

Currently on sabbatical in the Brady Lab at UC Davis

Phone. +1-530-754-9652 Skype. nicholas.provart, Fax. +1-425-675-7036 URL. http://www.csb.utoronto.ca/faculty/provart-nicholas The Bio-Analytic Resource. http://www.BAR.utoronto.ca email. nicholas.provart@utoronto.ca

yuzhenmi commented 10 years ago

I updated the file structure to include the control name, the sample names and a link to the data source for each tissue, and the web service URL. I also included icons and text labels to support poster crafting. #68

jamiewaese commented 10 years ago

Cool. Once we get the Arabidopsis developmental map working, send us the structure so we can start the other views.

yuzhenmi commented 10 years ago

I've been thinking... right now, sample levels are retrieved one by one by the JavaScript-based client. Maybe it is a better idea for our web service to output all sample levels relevant to the eFP diagram at once. One particular example is cell eFP, where there are no real samples and the levels for each compartment are computed together (thus outputting levels by individual compartments is inefficient). If we do that, then the "samples" field for "group" can be dropped, because the web service output should include sample names. I think it would also speed up the data retrieval process. It would require a bit more work on the back-end for plant tissue eFPs though, since the existing web service outputs levels by sample.
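
To make the batching idea concrete, here is a small Python sketch of how a client could gather every sample name referenced by an eFP definition, so that they can be requested in one call instead of one by one. The field layout follows the structure at the top of this issue. The function name `collect_samples` and the example data are hypothetical, and how the list would actually be passed to the web service is left open.

```python
def collect_samples(efp):
    """Return every sample name used by the control and the groups, in first-seen order."""
    seen = {}
    for name in efp.get("control", {}).get("samples", []):
        seen.setdefault(name, None)
    for group in efp.get("groups", []):
        for name in group.get("samples", []):
            # dict keys keep insertion order and silently drop repeats
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical definition: two groups that share one sample.
efp = {
    "control": {"id": "ctrl", "samples": ["CTRL_1", "CTRL_2"]},
    "groups": [
        {"id": "root", "samples": ["ROOT_1", "ROOT_2"]},
        {"id": "leaf", "samples": ["LEAF_1", "ROOT_2"]},
    ],
}

samples = collect_samples(efp)
print(samples)       # the names that one batched request would need to cover
print(len(samples))  # number of distinct samples
```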

jamiewaese commented 10 years ago

The current system is that we tell the webservice a bunch of sample names we're interested in, then it spits back a set of numbers for us to manipulate. Can you describe your idea further?

I like the idea of moving whatever data processing has to happen, such as calculating means and standard deviations, to the back end. Those numbers should be calculated once and stored in the DB. No need to recalculate them every time a client opens the page. In a perfect world, the viewer should just be a viewer.
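
For reference, the per-group summary being discussed amounts to something like the following Python sketch. The sample levels are made-up numbers and `summarize_groups` is a hypothetical helper. Whether the real back end would use the sample or the population standard deviation is not settled in this thread; the sketch uses the sample form.

```python
from statistics import mean, stdev


def summarize_groups(groups, levels):
    """Compute mean, standard deviation and count of sample levels for each group."""
    summary = {}
    for group in groups:
        values = [levels[name] for name in group["samples"] if name in levels]
        if not values:
            continue  # no data was returned for this group
        # stdev needs at least two values; report 0.0 for a single sample
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[group["id"]] = {"mean": mean(values), "sd": spread, "n": len(values)}
    return summary


# Hypothetical groups and expression levels.
groups = [
    {"id": "root", "samples": ["ROOT_1", "ROOT_2", "ROOT_3"]},
    {"id": "leaf", "samples": ["LEAF_1"]},
]
levels = {"ROOT_1": 10.0, "ROOT_2": 12.0, "ROOT_3": 14.0, "LEAF_1": 5.0}

for group_id, stats in summarize_groups(groups, levels).items():
    print(group_id, stats)
```

If these numbers were stored in the DB as suggested, the viewer would only need to read `mean` and `sd` for each group id.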

nprovart commented 10 years ago

A couple of aspects to this discussion: if we change the view in any way then we need to change the backend, both for returning sample names and means etc. Storing the sample information in the JSON file makes the backend view-agnostic. If it's not happening right now, however, we can implement the backend such that an array/hash of values is returned for a given array/hash of sample names, but I thought this was the case. N.

yuzhenmi commented 10 years ago

OK, so we want to have the samples defined in the JSON. I was just thinking of letting the back-end decide which samples should be returned, then we can simply supply the back-end with a gene identifier and receive a hash-table of sample:values, but the back-end would not be view-agnostic then.

jamiewaese commented 10 years ago

Can I suggest a 3-way Skype call for Tuesday? Let's make sure we're making the right choice before we go much further. It would also be good to catch up all together.

nprovart commented 10 years ago

We will have many views down the road, and perhaps even user-customizable ones, so I think it is best to keep the sample info in the JSON, Hans.

yuzhenmi commented 10 years ago

Let's keep the file structure as it is then; it is currently view-agnostic. A Skype call would be great! I'm good any time after 11 am Toronto time.

yuzhenmi commented 10 years ago

Maybe we should treat the cell viewer as a special eFP view. It can directly use data from the new suba3 web service (used by interaction viewer): http://bar.utoronto.ca/~eplant/cgi-bin/suba3.cgi?id=at2g41460

Wouldn't make any sense to retrieve confidence values by compartment.

jamiewaese commented 10 years ago

Tell me how it's different from the other eFPs? We have SVG shapes that get coloured according to gene expression levels for that sample, no?

nprovart commented 10 years ago

Hi Hans and Asher, yes I think it would be good to keep them (Cell eFP and regular expression level eFPs) separate, as the connectivity/kinds of data for Cell eFP will be different than for the normal eFP views, regardless of species (or the backend needs to be separate to deal with this...).

B.t.w., keep in mind when designing the JSON structure that you'll need to account for the fact that there can be a one-to-one or one-to-many control group to sample grouping mapping. Some examples of this: http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi?dataSource=Abiotic_Stress (here there's one control for each timepoint, and several different treatments per timepoint: cold, heat, drought etc.) XML for this: http://bar.utoronto.ca/efp/cgi-bin/data/Abiotic_Stress.xml.

In the case of the Biotic Stress view, there are many more control groups, which are sometimes paired with just one sample (not multiple samples). http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi?dataSource=Biotic_Stress ( http://bar.utoronto.ca/efp/cgi-bin/data/Biotic_Stress.xml)
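
One way the one-to-one and one-to-many cases could be represented is to let each group name the control it is compared against, so that several treatment groups can share a control or each can have its own. The Python sketch below illustrates that idea only. The `controls` list, the per-group `control` field, the log2 ratio as the comparison, and all sample names and levels are hypothetical and are not taken from the Abiotic Stress or Biotic Stress XML files.

```python
from math import log2
from statistics import mean


def log2_ratios(efp, levels):
    """Return log2(group mean / control mean) for each group, keyed by group id."""
    control_means = {
        control["id"]: mean(levels[name] for name in control["samples"])
        for control in efp["controls"]
    }
    ratios = {}
    for group in efp["groups"]:
        group_mean = mean(levels[name] for name in group["samples"])
        # Each group points at exactly one control; a control may serve many groups.
        ratios[group["id"]] = log2(group_mean / control_means[group["control"]])
    return ratios


# Hypothetical layout: two treatments share the 1 h control (one-to-many),
# while the 6 h control is paired with a single group (one-to-one).
efp = {
    "controls": [
        {"id": "ctrl_1h", "samples": ["C1H_1", "C1H_2"]},
        {"id": "ctrl_6h", "samples": ["C6H_1"]},
    ],
    "groups": [
        {"id": "cold_1h", "control": "ctrl_1h", "samples": ["COLD1H_1", "COLD1H_2"]},
        {"id": "heat_1h", "control": "ctrl_1h", "samples": ["HEAT1H_1"]},
        {"id": "cold_6h", "control": "ctrl_6h", "samples": ["COLD6H_1"]},
    ],
}
levels = {
    "C1H_1": 10.0, "C1H_2": 10.0, "C6H_1": 8.0,
    "COLD1H_1": 40.0, "COLD1H_2": 40.0, "HEAT1H_1": 5.0, "COLD6H_1": 8.0,
}

print(log2_ratios(efp, levels))
```

Keeping the control reference inside each group means the sample information stays in the JSON file, in line with keeping the back end view-agnostic.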

If you want to redesign the Expression webservice to accept an array of names, Asher can provide you with access to the current expression web service to modify.

Oh, and can the AGI IDs in the chromosome view be rendered in black when you Get the Gene? They don't stand out otherwise.

Best,

Nick
