statisticalbiotechnology / metabric-pathway-survival

Plots and
Apache License 2.0

Generating sunburst #1

Closed lireo closed 2 years ago

lireo commented 2 years ago

Hello

I'm a bioinformatician and I like your interactive sunburst for exploring Reactome enrichment analyses. I have a lot of ReactomeGSA results to analyse (10 comparisons), and I tried to use your script (generate_sunburst.py) to draw interactive sunbursts, but it fails on my data with the error "G is not a tree". My data is slightly different from yours, and I think that if I had your input files I could understand how the script transforms the data and debug my issue.

Could you send me your input files (metabric_path_survival.p and metabric_path_activities.p), or some example files?

Thank you,
Aurelie

percolator commented 2 years ago

Gustavo, would you be able to share the specified files?


gjeuken commented 2 years ago

Hi Aurelie,

The function you want to use in generate_sunburst.py is:

sunburst(in_df, outname = 'sun_tree.json')

It takes as input a pandas DataFrame, in_df, indexed by Reactome pathway ID, with the columns 'value' (the values to be plotted as colors), 'ngenes' (the width, i.e. the number of genes in the pathway), and 'descr' (the description of the pathway).

Be sure to have the ReactomePathwaysRelation.txt file (you can download it from the Reactome website) on the .data/ path.

It writes a JSON file to the path given by the outname parameter. This JSON file then has to be read by the d3.js implementation in the doc/ directory. The easiest way to do this is to replace the result.json file in that directory. Since the d3.js code loads this file from JavaScript, simply opening doc/result.html in a browser probably won't work; you have to set up a local web server. The easiest way to do this is by running

python3 -m http.server

You can find more info about this here
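For anyone who prefers to start the server from a Python script rather than the command line, here is a minimal sketch using the standard library. The helper name make_server, the port 8000 and the choice to bind to 127.0.0.1 are assumptions made for illustration; only the doc/ directory and result.html come from the comment above.

```python
import functools
import http.server


def make_server(directory="doc", port=8000):
    """Build (but do not start) an HTTP server that serves files from directory."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so the server does not depend on the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the files are not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#     server = make_server("doc", 8000)
#     server.serve_forever()
# then open http://127.0.0.1:8000/result.html in a browser.
```

Running python3 -m http.server from inside the doc/ directory should have the same effect.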

Let me know if this works for you.

lireo commented 2 years ago

Hi,

When you say the 'value' column holds the colors to be plotted, do you mean a color such as a hex code, or a value that the script converts to a color? I have the ReactomePathwaysRelation.txt file and the script finds it. I think my issue lies between secondDict and thirdDict inside the sunburst function, which is why I'd like to see your input file. How do you handle parents in the hierarchy that are not explicitly present in the input DataFrame? I think that's the point.

Thank you for your help

gjeuken commented 2 years ago

Ah, I've just found a possible bug: the code had been changed to accept a "q" column instead of a "value" column. This has now been reverted.

The hierarchy is provided by the ReactomePathwaysRelation.txt file. The "value" column should be numeric and is converted to colors by the d3js library. One thing to keep in mind is that your analysis and the ReactomePathwaysRelation.txt should be using the same version of Reactome.

An example of an input DataFrame would be:

(index)     value  ngenes  descr
R-HSA-1122    1.4      12  Signaling Pathway A
R-HSA-2233    2.1      23  This is also a pathway

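The example frame above can be built and sanity-checked with pandas roughly as follows. This is only a sketch: check_input is a hypothetical helper, not part of generate_sunburst.py, and the rows are the illustrative ones from the table, not real results.

```python
import pandas as pd

# Illustrative rows copied from the example table; not real results.
in_df = pd.DataFrame(
    {
        "value": [1.4, 2.1],
        "ngenes": [12, 23],
        "descr": ["Signaling Pathway A", "This is also a pathway"],
    },
    index=["R-HSA-1122", "R-HSA-2233"],  # Reactome pathway IDs as the index
)


def check_input(df):
    """Return a list of problems found in a sunburst input frame (hypothetical helper)."""
    problems = []
    for col in ("value", "ngenes", "descr"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    # 'value' has to be numeric, since it is mapped to a color downstream.
    if "value" in df.columns and not pd.api.types.is_numeric_dtype(df["value"]):
        problems.append("'value' must be numeric")
    if not df.index.is_unique:
        problems.append("duplicate pathway IDs in the index")
    return problems


print(check_input(in_df))  # an empty list means the frame has the expected shape
```

A frame that passes this check could then be handed to sunburst(in_df, outname='sun_tree.json') as described earlier in the thread.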
lireo commented 2 years ago

I reloaded the new script, which uses value instead of q, but nothing changed. My ReactomeGSA results were quite old (from last year), so I reran the analysis so that it and ReactomePathwaysRelation.txt (which I downloaded today) use the same version of Reactome.

I get the same KeyError again, but later (at the 1985th key instead of the 19th):

Traceback (most recent call last):
  File "/Users/aurelie/Documents/projets/host-RNAs_evs/analyse_aurelie/enrichment/reactomegsa/generate_sunburst.py", line 109, in <module>
    sunburst(in_df, outname='sunburst/results.json')
  File "/Users/aurelie/Documents/projets/host-RNAs_evs/analyse_aurelie/enrichment/reactomegsa/generate_sunburst.py", line 81, in sunburst
    thirdDict['value'].update({key : topDict['value'][value]})
KeyError: 'R-HSA-162909'

This key is present in ReactomePathwaysRelation.txt but not in my input file. Previously I tried to skip such keys by adding if value in topDict['value'] before the update, but then I got the error 'G is not a tree'.
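A quick way to list every pathway that would raise such a KeyError is to compare the IDs in the relation file against the IDs in the input. The sketch below assumes the relation file has two tab-separated columns (parent, child) covering all species, and that only human (R-HSA-) pathways matter here; missing_pathways is a hypothetical helper and the demo IDs are made up.

```python
import csv
import io


def missing_pathways(relation_lines, input_ids, species_prefix="R-HSA-"):
    """Return relation-file pathway IDs of one species that are absent from input_ids."""
    known = set(input_ids)
    in_hierarchy = set()
    for row in csv.reader(relation_lines, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        for pathway_id in row:  # each row is (parent, child)
            if pathway_id.startswith(species_prefix):
                in_hierarchy.add(pathway_id)
    return sorted(in_hierarchy - known)


# Tiny made-up stand-in for ReactomePathwaysRelation.txt (parent<TAB>child).
demo_relations = io.StringIO(
    "R-HSA-1\tR-HSA-2\n"
    "R-HSA-1\tR-HSA-3\n"
    "R-MMU-1\tR-MMU-2\n"
)
print(missing_pathways(demo_relations, ["R-HSA-1", "R-HSA-2"]))
# prints ['R-HSA-3']
```

With real data, the first argument would be the opened ReactomePathwaysRelation.txt file and the second the index of the input DataFrame.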

I think I'm making progress, but I'm missing something about how the dictionaries are built. Thanks

gjeuken commented 2 years ago

Hi again Aurelie,

The "G is not a tree" error might arise because excluding an arbitrary node from the graph G can leave something that is no longer a tree. Excluding a node is only safe if it is one of the leaves. Since the hierarchy itself is a tree, we need the full tree in order to display the results correctly.
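To illustrate the point, here is a small standard-library sketch. It assumes the check amounts to the usual definition of a tree (connected, with exactly one fewer edge than nodes); the script's own check may differ in detail. The function is_tree and the toy hierarchy are made up for the example, and the edge list is assumed to contain no duplicates.

```python
from collections import defaultdict, deque


def is_tree(nodes, edges):
    """True if the undirected graph (nodes, edges) is connected and acyclic."""
    nodes = set(nodes)
    # Keep only edges whose two endpoints both survived the filtering.
    edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search: a tree must reach every node from any start node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == nodes


# Made-up hierarchy: root -> A -> (A1, A2), root -> B
edges = [("root", "A"), ("A", "A1"), ("A", "A2"), ("root", "B")]
nodes = {"root", "A", "A1", "A2", "B"}

print(is_tree(nodes, edges))            # True: the full hierarchy
print(is_tree(nodes - {"A2"}, edges))   # True: dropping a leaf keeps a tree
print(is_tree(nodes - {"A"}, edges))    # False: A1 and A2 are cut off from root
```

Dropping the internal node A orphans its children, which is the situation that skipping missing keys can create.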

Would using a placeholder value for the missing results work for you?
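One possible shape for such a placeholder step, sketched with plain dictionaries: every pathway ID required by the hierarchy but absent from the results gets a neutral row. The helper add_placeholders, the placeholder value 0.0, the ngenes of 1 and the "(no result)" description are all assumptions chosen for illustration, not values the script prescribes.

```python
def add_placeholders(rows, required_ids, placeholder=0.0):
    """Return a copy of rows with a neutral entry for each required ID that is missing."""
    filled = dict(rows)  # shallow copy, so the caller's dict is left untouched
    for pathway_id in required_ids:
        if pathway_id not in filled:
            filled[pathway_id] = {
                "value": placeholder,    # assumed neutral value for the color scale
                "ngenes": 1,             # assumed minimal width
                "descr": "(no result)",  # assumed label
            }
    return filled


rows = {"R-HSA-1122": {"value": 1.4, "ngenes": 12, "descr": "Signaling Pathway A"}}
filled = add_placeholders(rows, ["R-HSA-1122", "R-HSA-162909"])
print(sorted(filled))  # prints ['R-HSA-1122', 'R-HSA-162909']
```

The result can be turned into the expected frame with pd.DataFrame.from_dict(filled, orient="index"), which uses the pathway IDs as the index.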

lireo commented 2 years ago

Hi, I think I found the problem. In my previous run, a lot of pathways were missing (all the non-significant ones). And when I reran ReactomeGSA, I used include_disease_pathways = FALSE but, I don't know why, one disease pathway was still present (R-HSA-164952). Consequently the disease part of the tree is full of holes. Right now I am rerunning ReactomeGSA with include_disease_pathways = TRUE and hope I will finally get my sunburst afterwards.

Thanks for your help. Have a great weekend

lireo commented 2 years ago

Hi,

Finally it works! As I thought, it was a problem with the disease pathways. If diseases are included, some are missing, even with a freshly updated ReactomeGSA analysis. And if diseases are excluded, one pathway stays in the analysis. So I ran it without diseases and manually removed the stuck pathway (R-HSA-164952).

With your explanations about a local web server and the html file, I managed to generate my sunbursts.

Thank you