lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Better visualization of Nonpareil.curve.batch for 200 samples #39

Closed by Jigyasa3 4 years ago

Jigyasa3 commented 4 years ago

Hey @lmrodriguezr

I was wondering if it's possible to use ggplot or another R visualization package to get a cleaner, publication-quality image from Nonpareil.curve for 200 samples?

My code:

$ ls 230*.npo > sample_list_batch.txt  # add .npo filenames to a text file
$ R

sample_names <- read.table("sample_list_batch.txt", header = FALSE)
colnames(sample_names) <- c("File")  # the filenames are in the first column, called "File"
sample_names$Name <- gsub("_R1.fastq.gz.fa-nonpareli-output.npo", "", sample_names$File)  # create a new column called "Name"
attach(sample_names)
pdf("batch_curve_plot.pdf")
np <- Nonpareil.curve.batch(sample_names$File, label = sample_names$Name, modelOnly = TRUE);
Nonpareil.legend(np)
detach(sample_names)
dev.off()

My plot: [image]

Thanks for your help! Jigyasa

lmrodriguezr commented 4 years ago

Hello @Jigyasa3

I recommend considering these three things that could improve the visualization for you:

1. Plot model only

You're currently using an old flag (modelOnly = TRUE) that is being ignored. Use plot.observed = FALSE instead:

np <- Nonpareil.curve.batch(sample_names$File, label = sample_names$Name, plot.observed = FALSE);

2. Do not include the legend

There is no way to effectively match 200 lines of different colors to their legend entries, so I'd suggest removing the legend altogether (i.e., do not call Nonpareil.legend). If you have groups of samples that are meaningful for your manuscript, I'd suggest coloring the curves by group instead: you can pass the colors you want to Nonpareil.curve.batch (by default the colors are generated at random).
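As a rough sketch of the grouping idea, the snippet below builds one color per group and expands it to one color per sample. The Group column, the file names, and the palette are hypothetical, and passing the colors through an argument named col is an assumption about the function's interface, so check ?Nonpareil.curve.batch before relying on it.

```r
# Hypothetical sample table: File and Name as in the original post, plus an
# assumed Group column (e.g. site or treatment) that is not in the original.
sample_names <- data.frame(
  File  = c("230_A.npo", "230_B.npo", "230_C.npo", "230_D.npo"),
  Name  = c("230_A", "230_B", "230_C", "230_D"),
  Group = c("gut", "gut", "soil", "soil"),
  stringsAsFactors = FALSE
)

# One color per group, then one color per sample in file order.
groups       <- unique(sample_names$Group)
group_colors <- setNames(hcl.colors(length(groups), palette = "Dark 3"), groups)
sample_cols  <- unname(group_colors[sample_names$Group])

# Only plot if the package and the (hypothetical) .npo files are available.
if (requireNamespace("Nonpareil", quietly = TRUE) &&
    all(file.exists(sample_names$File))) {
  library(Nonpareil)
  pdf("batch_curve_plot_groups.pdf")
  # `col` is assumed to be the argument that receives the per-sample colors.
  np <- Nonpareil.curve.batch(sample_names$File, label = sample_names$Name,
                              col = sample_cols, plot.observed = FALSE)
  dev.off()
}
```

With 200 samples, a handful of group colors is usually easier to read than 200 random ones, which is the point of the suggestion above.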

3. Consider passing other graphical parameters

The Nonpareil.curve.batch function accepts almost any additional graphical parameter (I personally use las = 1, for example). Also, there is only one active plot, so any plotting function you call afterwards (e.g., legend) draws on it.
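A minimal sketch of both points, assuming extra graphical parameters such as las are forwarded to the underlying plot as described above. The file names, groups, and colors are made up for illustration, and col is again an assumed argument name.

```r
# Hypothetical inputs: three .npo files, their labels, and two made-up groups.
files        <- c("230_A.npo", "230_B.npo", "230_C.npo")
labels       <- sub("\\.npo$", "", files)
group_colors <- c(gut = "steelblue", soil = "darkorange")
sample_cols  <- unname(group_colors[c("gut", "gut", "soil")])

# Only plot if the package and the (hypothetical) files are available.
if (requireNamespace("Nonpareil", quietly = TRUE) && all(file.exists(files))) {
  library(Nonpareil)
  pdf("batch_curve_plot_params.pdf", width = 7, height = 5)
  np <- Nonpareil.curve.batch(files, label = labels, col = sample_cols,
                              plot.observed = FALSE,
                              las = 1)  # horizontal axis tick labels
  # The curve plot is still the active plot, so base graphics calls add to it.
  legend("bottomright", legend = names(group_colors), col = group_colors,
         lty = 1, bty = "n", title = "Group")
  dev.off()
}
```

Here the base R legend() call replaces Nonpareil.legend, giving one entry per group rather than one per sample.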

I hope these tips help improve the visualization.

M

Jigyasa3 commented 4 years ago

Thank you for replying @lmrodriguezr! Removing the legend does improve the plot, but it's still a bit crowded with 200 samples. Could you take a look at the new issue I've opened about this?

Thank you for your help!

lmrodriguezr commented 4 years ago

Did you correct the flag to plot.observed = FALSE? That will also reduce the noise in the plot.