kychen37 / rasilab_spelman_2023

FHCC HBCU Summer Internship 2023

Visualize GO analysis results #1

Open kychen37 opened 1 year ago

kychen37 commented 1 year ago

@rasi, @gquarter has generated some results from her GO analysis using the splicing CRISPR screen data, see here. Do you have any suggestions on how to make this into a publication-quality figure?

rasi commented 1 year ago

@gquarter Nice work. Is there a tabulated form of these results? Ideally, we would make plots like the GO analysis plots in Fig 1 and 2 here: https://www.science.org/doi/10.1126/science.abb9662.

kychen37 commented 1 year ago

@rasi I didn't see an easy way to get the data from the webpage, so I just converted it to csv here: https://github.com/kychen37/rasilab_spelman_2023/blob/main/data/gq_go_output.csv

@gquarter, you will want to do a git pull so you have this csv file on your local computer, and then play around with it to see if you can generate something similar to figure 1F in the paper Rasi linked above

Christinebynum commented 1 year ago

@rasi @kychen37 I've also finished generating my data. Should I follow the same steps? https://cbl-gorilla.cs.technion.ac.il/GOrilla/s3f4yrrs/GOResults.html

kychen37 commented 1 year ago

@Christinebynum great, see here for your results in tabular format. You'll also want to git pull as I mentioned above so you can have this dataframe on your local computer to play around with (e.g. with pandas and matplotlib python packages for now)
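For anyone following along, here is a minimal sketch of loading a GO results table with pandas after the git pull. The column names (`GO Term`, `Description`, `P-value`, `FDR q-value`, `Enrichment`) are assumptions modeled on GOrilla-style headers and may not match the csv in this repo. The inline rows are toy values for illustration only, not real screen results.

```python
import io

import pandas as pd

# Toy stand-in for the csv so the snippet runs on its own.
# In practice, something like pd.read_csv("data/gq_go_output.csv") would replace this.
# Column names and values below are assumptions, not the real results.
toy_csv = io.StringIO(
    "GO Term,Description,P-value,FDR q-value,Enrichment\n"
    "GO:toy0001,example process A,1e-8,1e-5,6.2\n"
    "GO:toy0002,example process B,2e-4,3e-2,2.1\n"
    "GO:toy0003,example process C,5e-2,4e-1,1.3\n"
)
go_df = pd.read_csv(toy_csv)

# Quick sanity checks before any plotting
print(go_df.shape)   # number of rows (GO terms) and columns
print(go_df.dtypes)  # confirm the numeric columns were parsed as numbers
print(go_df.head())
```

Checking `dtypes` first is worthwhile because q-values copied from a webpage sometimes come through as text, which would break sorting and filtering later.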

kychen37 commented 1 year ago

@rasi, @gquarter has been playing around with the outputs of the GO analysis to decide which processes to focus on for plotting (we have >80 processes with FDR < 0.05, which is probably too many to plot).

Below is a histogram of GO terms based on enrichment score: https://github.com/kychen37/rasilab_spelman_2023/blob/main/data/gq_filtered_df_3_histogram.png

And a table of GO terms sorted by enrichment score: https://github.com/kychen37/rasilab_spelman_2023/blob/main/data/gq_filtered_df_2_descending.csv

Do you have any input on which processes to focus on for plotting?
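A rough sketch of the filtering and histogram steps described above, assuming the table has `FDR q-value` and `Enrichment` columns (hypothetical names). The values are made up for illustration and are not the screen results.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy values for illustration only; column names are assumed.
go_df = pd.DataFrame({
    "Description": ["process A", "process B", "process C", "process D"],
    "FDR q-value": [1e-6, 2e-3, 4e-2, 3e-1],
    "Enrichment": [7.5, 3.2, 1.8, 1.1],
})

# Keep terms passing the FDR cutoff, then rank by enrichment score
significant = go_df[go_df["FDR q-value"] < 0.05]
ranked = significant.sort_values("Enrichment", ascending=False)

# Histogram of enrichment scores among the significant terms
fig, ax = plt.subplots(figsize=(4, 3))
ax.hist(ranked["Enrichment"], bins=10)
ax.set_xlabel("Enrichment")
ax.set_ylabel("Number of GO terms")
fig.tight_layout()
```

Calling `fig.savefig("some_name.png", dpi=300)` afterwards would write the figure to disk. Ranking by q-value instead of enrichment is an equally reasonable choice and only changes the `sort_values` column.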

rasi commented 1 year ago

Nice work. Seems like the lowest q-values are for splicing and RNA processing. So, highlighting those even as a table will be good enough. Show GO term, description, q-value, enrichment, number of genes in that GO term as 5 columns.

Which screen is this table for? I assume it is for the splicing screen. It would be useful to add the sample name in the title or in a README file.
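One way the 5-column table could be put together with pandas is sketched below. The column names, including `Genes in term`, are hypothetical and the rows are toy values. The screen label is passed in explicitly so it shows up in the title.

```python
import pandas as pd

SCREEN_NAME = "splicing screen"  # sample label to show in the table title

# Toy values for illustration only; all column names are assumptions.
go_df = pd.DataFrame({
    "GO Term": ["GO:toy0001", "GO:toy0002", "GO:toy0003"],
    "Description": ["process A", "process B", "process C"],
    "P-value": [1e-5, 1e-9, 3e-3],
    "FDR q-value": [2e-3, 1e-6, 4e-2],
    "Enrichment": [3.2, 7.5, 1.8],
    "Genes in term": [120, 45, 300],
})

# The five requested columns, in display order
columns = ["GO Term", "Description", "FDR q-value", "Enrichment", "Genes in term"]

# Lowest q-value first, so the strongest hits sit at the top
table = go_df[columns].sort_values("FDR q-value").reset_index(drop=True)

print(f"GO enrichment, {SCREEN_NAME}")
print(table.to_string(index=False))
```

`table.to_csv(...)` would save the same table for sharing; keeping the screen name in the file name as well makes it harder to mix up results from different screens.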

kychen37 commented 1 year ago

@rasi, @gquarter redid the dataframe as you described, see here. And yes, it's quite clear that splicing-related processes are the top hits.

Yes, this was the splicing screen. We currently don't have much documentation, so we can work on that next week.

rasi commented 1 year ago

Looks great. We can likely use this result in our paper. Will be useful to do this for the NMD screen results as well (I forgot if @Christinebynum is doing this already).

kychen37 commented 1 year ago

@rasi here are @Christinebynum 's NMD screen results as a sorted table: https://github.com/kychen37/rasilab_spelman_2023/blob/main/data/cb_splicing_go_table.csv

rasi commented 1 year ago

Great. Can @Christinebynum and @gquarter write a summary of how this GO analysis was carried out? Write it in sufficient detail so that it can be reproduced, in the style of a Methods section of a paper.