Genome-wide Analysis of Protein Disorder in Camelus dromedarius - Githubissues

kacst-bioinfo-lab / labwork


Genome-wide Analysis of Protein Disorder in Camelus dromedarius #26

Closed bioinfo2016 closed 4 years ago

bioinfo2016 commented 5 years ago

Main reference paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567104/pdf/pone.0055524.pdf

Perform an extensive analysis of disordered proteins in Camelus dromedarius. To do:
download the Human and Arabian camel proteins from UniProt.

Manal-Alshehri commented 5 years ago

Dataset:

From UniProtKB, we downloaded the Homo sapiens proteins (73,947).

A search for Camelus dromedarius returned only a very small set of proteins. To enlarge this dataset, we included closely related organisms and adopted the whole Camelus genus: Camelus dromedarius, Camelus bactrianus, and Camelus ferus (20,745 proteins).

Because both datasets are very large, we removed redundant sequences with CD-HIT at a 60% similarity threshold. This resulted in 22,168 proteins for Homo sapiens and 18,338 for Camelus.
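The redundancy-removal step could be scripted roughly as follows. This is a minimal sketch, not the command actually used in this issue: the file names are hypothetical, and the `-n` word sizes follow the ranges recommended in the CD-HIT user guide (word size 4 for thresholds between 0.6 and 0.7).

```python
import shlex

def cd_hit_command(fasta_in, fasta_out, identity=0.6):
    """Build a cd-hit command line for a given sequence identity threshold."""
    # Word size (-n) ranges as recommended in the CD-HIT user guide.
    if identity >= 0.7:
        word_size = 5
    elif identity >= 0.6:
        word_size = 4
    elif identity >= 0.5:
        word_size = 3
    elif identity >= 0.4:
        word_size = 2
    else:
        raise ValueError("identity thresholds below 0.4 are not supported here")
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out,
            "-c", str(identity), "-n", str(word_size)]

# Hypothetical file names, for illustration only.
cmd = cd_hit_command("camelus.fasta", "camelus_nr60.fasta")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `cd-hit` is installed.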

Manal-Alshehri commented 5 years ago

table1-s

Table 1S. Summary of intrinsic disorder metrics for Camel and Human datasets. Results shown for IUPred prediction methods (short and long).

Manal-Alshehri commented 5 years ago

figure-s1

Manal-Alshehri commented 5 years ago

figure-s2

Manal-Alshehri commented 5 years ago

iupred-camel-long-REVIGO-results

Representation of the GO “Biological Processes” significantly enriched in disordered proteins in the Camel dataset. Disordered proteins here correspond to those with one or more “long disordered windows” (LDW) based on IUPred predictions. Figure adapted from REVIGO, a system for summarizing and visualizing lists of GO terms. Each rectangle represents a cluster of related terms labeled according to a representative term. Rectangles are grouped in “superclusters” (identified with the same color) based on the SimRel semantic similarity measure.

Manal-Alshehri commented 4 years ago

table-1S-in-suplementary-with-ESpritz

Table 1S with ESpritz results included. Summary of intrinsic disorder metrics for Camel and Human datasets.

Manal-Alshehri commented 4 years ago

Figure-3S-my-suplementary

Manal-Alshehri commented 4 years ago

Figure 1. Overall predicted global disorder and disordered binding regions in Camel and H. sapiens proteins. Left: percentages of disordered proteins (disordered proteins criterion: those proteins containing at least 50% disordered residues based on Disopred predictions). Right: average percentages of disordered residues involved in binding (DBRs), as predicted by Disopred.

Manal-Alshehri commented 4 years ago

Figure2-A Figure 2-A: Fraction of proteins with different degrees of predicted disorder in Camel and H. sapiens. Protein disorder (as the percentage of disordered residues with respect to the sequence length) is binned into different ranges. Data based on Disopred predictions.
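The two quantities behind Figures 1 and 2-A (the "at least 50% disordered residues" criterion and the binned disorder percentage) can be sketched as below. This is only an illustration: the per-residue score cutoff of 0.5, the ten 10%-wide bins, and the toy scores are assumptions, not values taken from this analysis.

```python
from collections import Counter

def count_disordered(scores, cutoff=0.5):
    """Number of residues whose disorder score is at or above cutoff (assumed 0.5)."""
    return sum(1 for s in scores if s >= cutoff)

def is_disordered_protein(scores, cutoff=0.5):
    """Figure 1 criterion: at least 50% of the residues are disordered."""
    return len(scores) > 0 and 2 * count_disordered(scores, cutoff) >= len(scores)

def disorder_bin(scores, cutoff=0.5, n_bins=10):
    """Label of the percentage bin that holds this protein, e.g. '50-60%'."""
    if not scores:
        raise ValueError("empty sequence")
    # Integer arithmetic avoids floating-point errors at bin boundaries.
    idx = min(count_disordered(scores, cutoff) * n_bins // len(scores), n_bins - 1)
    width = 100 // n_bins
    return f"{idx * width}-{(idx + 1) * width}%"

def binned_fractions(proteins, cutoff=0.5, n_bins=10):
    """Fraction of proteins that falls into each disorder bin."""
    counts = Counter(disorder_bin(s, cutoff, n_bins) for s in proteins.values())
    return {label: n / len(proteins) for label, n in counts.items()}

# Toy per-residue scores, for illustration only.
proteins = {
    "P1": [0.9, 0.8, 0.2, 0.1],
    "P2": [0.1, 0.2, 0.3, 0.4],
    "P3": [0.9, 0.9, 0.9, 0.9],
}
print({name: is_disordered_protein(s) for name, s in proteins.items()})
print(binned_fractions(proteins))
```

A fully disordered protein is placed in the last bin (90-100%) rather than in a bin of its own.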

Manal-Alshehri commented 4 years ago

Figure2-B Figure 2-B: Fraction of proteins with different degrees of predicted disordered binding regions in Camelus and H. sapiens (using Disopred)

bioinfo2016 commented 4 years ago

ANCHOR will give better insight into the disordered binding regions.

Manal-Alshehri commented 4 years ago

Table1-Disopred-ANCHOR

Table 1: Summary of intrinsic disorder metrics for Human and Camels. Results shown for Disopred (disorder prediction) and ANCHOR (disordered binding regions, DBRs).

Manal-Alshehri commented 4 years ago

Figure2-B-anchor Figure 2-B: Fraction of proteins with different degrees of predicted disordered binding regions in Camelus and H. sapiens (using ANCHOR)

Manal-Alshehri commented 4 years ago

Figure1-with-ANCHOR Figure 1. Overall predicted global disorder and disordered binding regions in Camel and H. sapiens proteins. Left: percentages of disordered proteins (disordered proteins criterion: those proteins containing at least 50% disordered residues based on Disopred predictions). Right: average percentages of disordered residues involved in binding (DBRs), as predicted by ANCHOR.

Manal-Alshehri commented 4 years ago

report

Manal-Alshehri commented 4 years ago

GO:Biological Processes terms associated with disordered proteins in the Camel dataset were summarized and visualized using Revigo. Disordered proteins here correspond to those with one or more “long disordered regions” (LDR) based on DISOPRED predictions. Revigo-for-disorder-Camel
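Selecting proteins with at least one long disordered region can be sketched as below. The minimum length of 30 consecutive residues is an assumption made for illustration (the thread does not state the length used), and the input is assumed to be a per-residue list of booleans derived from the predictor output.

```python
from itertools import groupby

def long_disordered_regions(flags, min_length=30):
    """Return 0-based, half-open (start, end) intervals of disordered runs.

    flags: per-residue booleans, True where the residue is predicted disordered.
    min_length: minimum run length to count as "long" (assumed value).
    """
    regions = []
    pos = 0
    for is_disordered, run in groupby(flags):
        length = sum(1 for _ in run)
        if is_disordered and length >= min_length:
            regions.append((pos, pos + length))
        pos += length
    return regions

def has_ldr(flags, min_length=30):
    """True if the protein contains at least one long disordered region."""
    return bool(long_disordered_regions(flags, min_length))

# Toy example: one run of 30 disordered residues and one run of 10.
flags = [False] * 5 + [True] * 30 + [False] * 3 + [True] * 10
print(long_disordered_regions(flags))
```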

Manal-Alshehri commented 4 years ago

To perform a comparative analysis between Human (H.) and Camel (C.):
• I ran PANNZER on four sets: all H. proteins, all C. proteins, disordered H. proteins, and disordered C. proteins. I considered only GO terms with a PPV of 0.7 or above.
• I detected 1993 common (shared) GO terms that occur in both H. and C.
• I extracted the common GO terms and quantified them in the all-H. dataset, the all-C. dataset, Disorder-H., and Disorder-C.
• I computed a contingency table for each common GO term (observed/expected values).
• I then computed the chi-square P-value for every common GO term.
• I kept only the GO terms where the observed disorder in C. is greater than expected, which left 495 GO terms.
• I computed the average of GO terms from the previous step to see the enrichment percentage of each term in both disorder datasets. I filtered the results by considering only the terms where the percentage in C. is greater than that in H. This gave 495 GO terms.
• This list of GO terms is exactly the same as the one obtained with the Observed C. > Expected C. criterion. This check was done to verify that the eventual differences in disorder are maintained when considering only the “comparable” proteins, and to rule out that these differences are due to biases in the GO annotations of the two genomes.
• To conclude, these 495 GO terms are more enriched in Camel disordered proteins than in Human disordered proteins.
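The per-term test in the steps above could look roughly like the sketch below. It is written under several assumptions that the thread does not confirm: counts are numbers of proteins, the disordered set is a subset of the full dataset, the table is a 2x2 layout (disordered or not, annotated with the term or not), and the test is a Pearson chi-square without continuity correction. All function names and the example counts are hypothetical.

```python
import math

def filter_by_ppv(predictions, min_ppv=0.7):
    """Keep GO terms whose PPV is at least min_ppv (the thread gives 0.7 as threshold)."""
    return {go for go, ppv in predictions if ppv >= min_ppv}

def contingency(term_in_disorder, n_disorder, term_in_all, n_all):
    """2x2 table: rows = disordered / not disordered, columns = has term / lacks term."""
    a = term_in_disorder                 # disordered, annotated with the term
    b = n_disorder - a                   # disordered, not annotated
    c = term_in_all - a                  # not disordered, annotated
    d = (n_all - n_disorder) - c         # not disordered, not annotated
    return a, b, c, d

def chi_square_2x2(a, b, c, d):
    """Return (statistic, p_value, expected_a) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero")
    observed = ((a, b), (c, d))
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected[0][0]

# Made-up counts for one GO term, for illustration only.
a, b, c, d = contingency(term_in_disorder=30, n_disorder=100, term_in_all=50, n_all=300)
stat, p_value, expected_a = chi_square_2x2(a, b, c, d)
print(a > expected_a, stat, p_value)
```

A term would be kept in the "observed greater than expected" step when `a > expected_a`; whether a P-value cutoff was also applied is not stated in the thread.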

Manal-Alshehri commented 4 years ago

Updated summary for PANNZER results after applying > 0.7 threshold on PPV.

new-report-to-github-with0 7-threshold

Manal-Alshehri commented 4 years ago

Revigo representation of disordered proteins in Camel using GO:BP terms with PPV>0.7. The Treemap appearance was improved using DrasticData.

Treemap-Disorder-C

The full list of GO terms with their descriptions (in colored boxes) and their general representatives (in white boxes) is provided by Revigo as a table, together with other columns such as frequency and uniqueness.

Manal-Alshehri commented 4 years ago

Revigo representation of GO:BP terms (with PPV>0.7) that are more enriched in Camel disordered proteins than in Human disordered proteins. The Treemap appearance was improved using DrasticData.

Revigo-comparative-Disorder-TreeMap-result

The full list of GO terms with their descriptions (in colored boxes) and their general representatives (in white boxes) is provided by Revigo as a table, together with other columns such as frequency and uniqueness.

Manal-Alshehri commented 4 years ago

Revigo representation of GO:BP terms (with PPV>0.7) that are more enriched in DBRs in Camel than in Human. The Treemap appearance was improved using DrasticData.

Revigo-comparative-DBR-TreeMap-result

The full list of GO terms with their descriptions (in colored boxes) and their general representatives (in white boxes) is provided by Revigo as a table, together with other columns such as frequency and uniqueness.