yilinZhang-bio / Watermelon-pangenome

Apache License 2.0

Scripts for figure 2 #2

Open annerilotter opened 1 month ago

annerilotter commented 1 month ago

Hi, I cannot find the scripts for creating the plots in Figure 2, or the scripts that generate the underlying data. The same is true for parts d and e of Figure 3. Could they please be added or shared?

yilinZhang-bio commented 1 month ago

I apologize for the delayed response. I've been traveling and just got the chance to address your question.

For Figure 2: The data for this figure was generated using OrthoFinder with the following command:

orthofinder -f Data -t 128 -S diamond -M msa -T fasttree

After the run finishes, the statistics in the Comparative_Genomics_Statistics output and the orthogroup assignments in the Orthogroups output are what you need to create Figure 2. We made the plots directly from these results, without any additional scripts.
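As a rough illustration of how those outputs can be summarised, here is a minimal Python sketch that reads a table shaped like OrthoFinder's Orthogroups.GeneCount.tsv (an `Orthogroup` column, one count column per input genome, and a `Total` column) and tallies how many orthogroups are present in 1, 2, ... N genomes. The function name, the toy genome names, and the idea that this particular tally feeds Figure 2 are assumptions for illustration; the thread does not say which summaries were plotted.

```python
import csv
import io
from collections import Counter


def orthogroup_presence_histogram(gene_count_tsv: str) -> Counter:
    """Map 'number of genomes containing an orthogroup' -> number of orthogroups.

    Hypothetical helper: expects text laid out like Orthogroups.GeneCount.tsv.
    """
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    # Every column except the orthogroup ID and the row total is one genome.
    genomes = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    histogram = Counter()
    for row in reader:
        present = sum(1 for g in genomes if int(row[g]) > 0)
        histogram[present] += 1
    return histogram


# Toy input with made-up genome names and counts (not real data).
example = (
    "Orthogroup\tgenomeA\tgenomeB\tgenomeC\tTotal\n"
    "OG0000000\t2\t1\t1\t4\n"
    "OG0000001\t1\t0\t1\t2\n"
    "OG0000002\t0\t0\t3\t3\n"
    "OG0000003\t1\t1\t1\t3\n"
)
hist = orthogroup_presence_histogram(example)
print(dict(hist))
```

With a real run, the file contents would be read from disk and passed in as a string; the resulting counts could then be exported for plotting in whatever tool you prefer.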

For Figure 3 (parts d and e): These parts were also created based on statistical analysis of the results. We compiled the relevant data into an Excel sheet, which was then used to create the plots using ggplot2 in R.

Please let me know if you need any further clarification or if you'd like more specific details on any part of the process.

annerilotter commented 1 month ago

Thank you for the reply. Did you use bedtools intersect for figure 3e?

yilinZhang-bio commented 1 month ago

Yes, we used bedtools intersect to generate the results shown in Figure 3e, with parameters like bedtools intersect -a -b -wao. Note that a single region can overlap several feature types. In those cases we assigned the overlap using the following priority: exons > introns > upstream of genes > downstream of genes > intergenic regions. I hope this helps!
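The priority rule above can be sketched in Python as follows. This is not the authors' script. It assumes the -a file has four columns (chrom, start, end, name), the -b file has four columns with the feature category in its name column, and the last column is the overlap in bp that -wao appends. The category labels and the function name are hypothetical. Records with no overlap (reported by -wao with an overlap of 0) fall through to "intergenic".

```python
# Priority order from the thread; lower rank wins.
PRIORITY = ["exon", "intron", "upstream", "downstream", "intergenic"]
RANK = {name: i for i, name in enumerate(PRIORITY)}


def assign_categories(wao_lines):
    """Return {a_feature_name: highest-priority category it overlaps}.

    Assumed layout per line (tab-separated):
    A chrom, A start, A end, A name, B chrom, B start, B end, B category, overlap bp
    """
    best = {}
    for line in wao_lines:
        fields = line.rstrip("\n").split("\t")
        a_name = fields[3]
        b_category = fields[7]
        overlap = int(fields[-1])
        # No overlap, or an unrecognised label, is treated as intergenic.
        if overlap > 0 and b_category in RANK:
            category = b_category
        else:
            category = "intergenic"
        if a_name not in best or RANK[category] < RANK[best[a_name]]:
            best[a_name] = category
    return best


# Toy records with made-up coordinates.
records = [
    "chr1\t100\t200\tfeat1\tchr1\t150\t400\tintron\t50",
    "chr1\t100\t200\tfeat1\tchr1\t90\t120\texon\t20",
    "chr1\t5000\t5100\tfeat2\t.\t-1\t-1\t.\t0",
    "chr1\t900\t950\tfeat3\tchr1\t800\t1000\tdownstream\t50",
    "chr1\t900\t950\tfeat3\tchr1\t700\t950\tupstream\t50",
]
assigned = assign_categories(records)
print(assigned)
```

Here the category is chosen by priority alone, regardless of how many bp overlap; whether overlap length also played a role in the original analysis is not stated in the thread.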

annerilotter commented 1 month ago

Great. How exactly are the upstream and downstream regions of genes defined, given that they are not features in the annotation file?

yilinZhang-bio commented 1 month ago

We use 2000 bp as our standard for the upstream and downstream regions of genes.
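For readers who want to derive such regions themselves, one possible way to compute 2000 bp flanks from gene coordinates is sketched below. It assumes 0-based, half-open (BED-style) coordinates and that "upstream" follows the gene's strand. Neither detail is confirmed in this thread, and the function name is hypothetical.

```python
FLANK = 2000  # flank size stated in the thread


def flanking_regions(start, end, strand, chrom_length, flank=FLANK):
    """Return upstream/downstream intervals for one gene.

    Coordinates are 0-based half-open. Strand-aware assignment is an
    assumption, not something the thread confirms.
    """
    # Clip the flanks so they never run off either end of the chromosome.
    left = (max(0, start - flank), start)
    right = (end, min(chrom_length, end + flank))
    if strand == "-":
        # On the minus strand, the region before the gene lies to the right.
        return {"upstream": right, "downstream": left}
    return {"upstream": left, "downstream": right}


# Toy genes with made-up coordinates.
plus_gene = flanking_regions(5000, 8000, "+", 100000)
minus_gene = flanking_regions(1000, 3000, "-", 4000)
print(plus_gene)
print(minus_gene)
```

The resulting intervals could be written out as BED records and combined with exon and intron intervals as the -b input to bedtools intersect.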