vloux / ProteoRE

GNU General Public License v3.0

Release 2.1: test workflow on proteore.org #272

Closed: yvandenb closed this 4 years ago

yvandenb commented 4 years ago

History: Workflow_Biomarkers_Cancer_Test_NewRelease (shared with David). I updated the HPA source files using the data manager, then tested the "pancreas cancer biomarkers" workflow (updated accordingly, with the new tool versions) and got the error shown in the screenshot.

davidchristiany commented 4 years ago

I did not test with RNA_seq, my bad. The reference file (pathology) has changed, so I just need to update the tool to work with the new file.

I don't know how the previous file was built, but the new one has more "values":

- Before: columns `Gene`, `Gene name`, `Sample`, `Value`, `Unit`; example row: `ENSG00000000003  TSPAN6  breast  53.4  TPM`
- After: columns `Gene`, `Gene name`, `Cancer`, `High`, `Medium`, `Low`, `Not detected`, `prognostic - favourable`, `unprognostic - favourable`, `prognostic - unfavourable`, `unprognostic - unfavourable`; example row: `ENSG00000000003  TSPAN6  breast cancer  1  7  2  2  7.712e-2`
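Since the reference file's header changed between releases, one way for the tool to cope is to check the header before parsing. A minimal Python sketch, assuming the file is tab-separated and that the two headers quoted above are exact; the function name `detect_pathology_layout` is hypothetical, not ProteoRE's actual code:

```python
import csv
import io

# Headers as quoted in this thread; the detection logic is a hypothetical sketch.
OLD_PATHOLOGY_COLUMNS = ["Gene", "Gene name", "Sample", "Value", "Unit"]
NEW_PATHOLOGY_COLUMNS = [
    "Gene", "Gene name", "Cancer", "High", "Medium", "Low", "Not detected",
    "prognostic - favourable", "unprognostic - favourable",
    "prognostic - unfavourable", "unprognostic - unfavourable",
]


def detect_pathology_layout(handle):
    """Return "old" or "new" from the header row of a tab-separated file."""
    header = next(csv.reader(handle, delimiter="\t"))
    if header == OLD_PATHOLOGY_COLUMNS:
        return "old"
    if header == NEW_PATHOLOGY_COLUMNS:
        return "new"
    raise ValueError(f"unrecognised pathology header: {header}")


if __name__ == "__main__":
    old_file = io.StringIO("Gene\tGene name\tSample\tValue\tUnit\n"
                           "ENSG00000000003\tTSPAN6\tbreast\t53.4\tTPM\n")
    print(detect_pathology_layout(old_file))  # old
```

Failing loudly on an unknown header means a future HPA format change surfaces as a clear error instead of a confusing downstream failure.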

We need to decide whether we use the same user interface (IHM) for RNA-seq.

@yvandenb Can I call you this afternoon to settle this?

yvandenb commented 4 years ago

Sure, I'll be available around 3:30 pm. I'll call you.

davidchristiany commented 4 years ago

https://www.proteinatlas.org/download/proteinatlas.tsv.zip

davidchristiany commented 4 years ago

https://toolshed.g2.bx.psu.edu/view/proteore/proteore_tissue_specific_expression_data/3e65e0249976

yvandenb commented 4 years ago

The tool "Build tissue-specific expression dataset [Human Protein Atlas]":

- in "RNAseq" mode, works on the file:
  - Version 23/10/2020: https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip
  - Version 23/10/2018: rna_tissue.tsv
- in "immunohistochemistry" mode, works on the file:
  - Version 23/01/2020: https://www.proteinatlas.org/download/normal_tissue.tsv.zip
  - Version 23/10/2018: normal_tissue.tsv (6 columns)

@yves: update the source-file information in the UserDoc section of the tool's .xml. YV has updated the HTML file static/data_source.html.

davidchristiany commented 4 years ago

I updated "Build tissue-specific expression dataset" on proteore-migale. I also updated the data manager so that it creates the new reference file for that tool. There are two options for IHC and two for RNAseq.

You can test it, but I am currently sorting the tool data panel, so you should wait until this afternoon.

yvandenb commented 4 years ago

Test is OK. Two things:

  1. HPA source files version: one source file is duplicated.
  2. I noticed that the tissue names are hard-coded in the .xml. However, there are 58 (IHC) and 37 (RNAseq) tissue names in the 23/10/2018 version, while there are now 63 (IHC) and 62 (RNAseq) in the 22/01/2020 version... this is becoming a headache!! Either we remove the former version and add the name of each tissue to the .xml (i.e. still hard-coded), or there is a way for the tool to read the "tissue" column from each source file and list these tissue names dynamically in the "Select tissue" box. I know the first option would be the easiest, but would the second one be easily feasible? Either way, we need to make a decision... Have a call?
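The second proposal boils down to collecting the distinct values of the tissue column from each source file. A rough Python sketch, assuming tab-separated files with a header row and a tissue column named `Tissue` (an assumption to check against each file's actual header); the helper names are hypothetical, and wiring the result into the tool's "Select tissue" box is not shown:

```python
import csv
import io


def distinct_tissues(handle, column="Tissue"):
    """Return the sorted, de-duplicated non-empty values of one TSV column."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Accessing fieldnames reads the header row; fail early if the column is absent.
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return sorted({row[column] for row in reader if row[column]})


def to_select_options(tissues):
    """Pair each tissue with a display label, as (value, label) tuples."""
    return [(tissue, tissue.capitalize()) for tissue in tissues]


if __name__ == "__main__":
    sample = io.StringIO("Gene\tGene name\tTissue\tNX\n"
                         "A\tX\tliver\t1.0\n"
                         "B\tY\tbrain\t2.0\n"
                         "C\tZ\tliver\t3.0\n")
    print(to_select_options(distinct_tissues(sample)))
```

Because the list is derived from the file itself, 37 vs 62 (or 58 vs 63) tissues stops mattering: each release yields its own list.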
davidchristiany commented 4 years ago

I corrected the mix-up between the normal-tissue and RNA files. With the latest RNA release, we get the following (screenshot from 2020-02-03).

I checked (with the correct file), and there are indeed 43 different tissues in RNA and 63 in IHC. I think I can handle the different numbers of tissues by making a special case of this release (the one included with the tool).

yvandenb commented 4 years ago

Not sure we're talking about the same file? The one I downloaded from https://www.proteinatlas.org/download/rna_tissue_consensus.tsv.zip has only 4 columns, whereas yours shows 6. Could you please check? Note: "The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue") and normalized expression ("NX")."
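A quick header check would catch this kind of file mix-up early. A Python sketch, assuming the 4-column consensus layout and the three column names from the note above; `check_consensus_header` is a hypothetical helper, not part of the tool:

```python
import csv
import io

# From this thread: the consensus file has 4 columns, including these three.
EXPECTED_COLUMN_COUNT = 4
REQUIRED_COLUMNS = ("Gene", "Tissue", "NX")


def check_consensus_header(handle):
    """Raise ValueError unless the header looks like rna_tissue_consensus.tsv."""
    header = next(csv.reader(handle, delimiter="\t"))
    if len(header) != EXPECTED_COLUMN_COUNT:
        raise ValueError(
            f"expected {EXPECTED_COLUMN_COUNT} columns, got {len(header)}: {header}")
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return header


if __name__ == "__main__":
    print(check_consensus_header(io.StringIO("Gene\tGene name\tTissue\tNX\n")))
```

Fed a 6-column file, this raises before any parsing happens, which is cheaper than discovering the mismatch through wrong tissue counts.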

> I think I can manage a solution with the different number of Tissue, I can make a special case of this release (the one included with the tool).

That would be ideal if you can do it, so go for it! ;)

davidchristiany commented 4 years ago

You're right, I used the rna_hpa file instead of rna_consensus, I'll change that.

davidchristiany commented 4 years ago

Done for the rna_consensus file.

On second thought, it will be really difficult to make a tissue list for each release, since we don't know the names of future releases. I could indicate in the Tissue list that some tissues are only available for some releases, but that's not ideal.

davidchristiany commented 4 years ago

After many attempts, it is not possible to make a release-specific option list in "Build tissue-specific dataset". Such options are defined with tags, and for these we would need to know the value of the "release" variable, except that we don't know the value of future releases yet, since they will be created by the data manager. I made a small change so that the Tissue drop-down menu is built from a .loc file, and I updated the doc.
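The .loc idea can be sketched as follows: the data manager appends one row per tissue and release, and the menu is built from the rows matching the selected release, so a new release needs no XML edit. A Python sketch; the three-column layout (value, name, release), the helper names, and the release labels `HPA_2020`/`HPA_2018` are hypothetical. Only the general .loc convention (tab-separated lines, `#` comment lines) comes from Galaxy:

```python
import io

# Hypothetical .loc layout, one row per tissue and release:
#   <value>\t<name>\t<release>


def loc_rows_for_release(release, tissues):
    """Build .loc lines for every distinct tissue of a newly fetched release."""
    return [f"{t.replace(' ', '_')}\t{t}\t{release}" for t in sorted(set(tissues))]


def menu_for_release(loc_handle, release):
    """Return (value, name) pairs for one release, skipping blanks and comments."""
    options = []
    for line in loc_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        value, name, row_release = line.split("\t")
        if row_release == release:
            options.append((value, name))
    return options


if __name__ == "__main__":
    rows = loc_rows_for_release("HPA_2020", ["liver", "bone marrow", "liver"])
    loc = io.StringIO("#value\tname\trelease\n" + "\n".join(rows)
                      + "\nliver\tliver\tHPA_2018\n")
    print(menu_for_release(loc, "HPA_2020"))
    # [('bone_marrow', 'bone marrow'), ('liver', 'liver')]
```

Keeping the release on every row lets one .loc file serve all releases, with the data manager as the only writer.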