kbseah / genome-bin-tools

Interactive tools for metagenome visualization and binning in R
GNU General Public License v2.0
28 stars 7 forks source link

Coverage-GC plot in R error: 'legend' is of length 0 #35

Open jade-davies opened 4 years ago

jade-davies commented 4 years ago

Hello,

I am back again (sorry). I have made coverage tables, taxonomy tables and ssu tables for multiple samples and have imported them individually into R using the code specified in the workflow. Plotting of 2 samples worked successfully, but for the third one I am having an error.

The data was imported successfully using following code:

sample_59 <- gbt(covstats="U:\\linux_shared_data\\infant_genome_binning\\assembly_coverages\\S59.coverage", mark="U:\\linux_shared_data\\infant_genome_binning\\assembly_taxonomy_tables\\S59_contig_taxonomy.tab", marksource="blobology", ssu="U:\\linux_shared_data\\infant_genome_binning\\assembly_ssu_tables\\S59.ssu.tab")

Then, check with summary(sample_59) and gbt_checkinput also shows no errors present.

When plotting with code: plot(sample_59,taxon="Family",ssu=TRUE,textlabel=TRUE,legend=TRUE,main="Coverage-GC plot of isolate")

The error is

Error in legend("topright", legend = colorframe$taxon, cex = 0.6, fill = as.character(colorframe$colors)) : 'legend' is of length 0

The data is plotted correctly but all datapoints are in grey, with no legend. I am using RStudio with R 3.6.1.

kbseah commented 4 years ago

Hello @jade-davies , could you please post the results of the R command str(sample_59), as well as the first few lines of the file S59_contig_taxonomy.tab here?

jade-davies commented 4 years ago

Hi, @kbseah,

Here it is:

str(sample_59)

List of 10 $ scaff :'data.frame': 6018 obs. of 11 variables: ..$ ID : Factor w/ 6018 levels "NODE_1_length_498164_cov_63.706962",..: 1 1112 2223 3334 4445 5556 5686 5797 5908 2 ... ..$ Avg_fold : num [1:6018] 134 107 116 149 125 ... ..$ Length : int [1:6018] 498164 342863 307886 233546 217330 195210 184913 173196 172201 171659 ... ..$ Ref_GC : num [1:6018] 0.507 0.5 0.505 0.465 0.518 ... ..$ Covered_percent: num [1:6018] 100 100 100 100 100 100 100 100 100 100 ... ..$ Covered_bases : int [1:6018] 498164 342863 307886 233546 217330 195210 184913 173196 172201 171659 ... ..$ Plus_reads : int [1:6018] 240969 132112 129027 125118 98178 109871 71153 69809 72215 67135 ... ..$ Minus_reads : int [1:6018] 241025 132349 128926 125281 98167 109860 71268 69863 72172 67155 ... ..$ Read_GC : num [1:6018] 0.49 0.49 0.49 0.443 0.503 ... ..$ Median_fold : int [1:6018] 122 99 106 144 115 138 101 101 102 101 ... ..$ Std_Dev : num [1:6018] 54 39 45.1 57.2 50.4 ... $ covs :'data.frame': 6018 obs. of 2 variables: ..$ ID : Factor w/ 6018 levels "NODE_1_length_498164_cov_63.706962",..: 1 1112 2223 3334 4445 5556 5686 5797 5908 2 ... ..$ scaff.Avg_fold: num [1:6018] 134 107 116 149 125 ... $ markTab :'data.frame': 1514 obs. of 11 variables: ..$ scaffold : Factor w/ 1514 levels "NODE_1_length_498164_cov_63.706962",..: 4 5 6 7 8 3 10 11 12 13 ... ..$ markerid : logi [1:1514] NA NA NA NA NA NA ... ..$ gene : logi [1:1514] NA NA NA NA NA NA ... ..$ Superkingdom: Factor w/ 5 levels "(unassigned)",..: 3 3 1 3 3 3 3 3 3 1 ... ..$ Phylum : Factor w/ 16 levels "(unassigned)",..: 14 14 1 14 14 1 14 1 14 1 ... ..$ Class : Factor w/ 27 levels "(unassigned)",..: 4 13 1 15 13 1 13 1 4 1 ... ..$ Order : Factor w/ 59 levels "(unassigned)",..: 48 19 1 43 19 1 19 1 48 1 ... ..$ Family : Factor w/ 101 levels "(unassigned)",..: 2 28 1 75 28 1 28 1 81 1 ... ..$ Genus : Factor w/ 196 levels "(unassigned)",..: 82 57 1 155 57 1 57 1 109 1 ... ..$ Species : Factor w/ 339 levels "Achromobacter denitrificans",..: 135 92 330 253 94 327 91 327 171 330 ... ..$ source : Factor w/ 1 level "blobology": 1 1 1 1 1 1 1 1 1 1 ... $ markSource: chr "blobology" $ ssuTab :'data.frame': 2 obs. of 13 variables: ..$ scaffold : Factor w/ 2 levels "NODE_43_length_4993_cov_506.111989",..: 2 1 ..$ SSUid : Factor w/ 2 levels "NODE_43_length_4993_cov_506.111989:149-1687(+)",..: 2 1 ..$ SSUgenbankid: Factor w/ 2 levels "CEAB01039785.95.1626",..: 1 2 ..$ Superkingdom: Factor w/ 1 level "Bacteria": 1 1 ..$ Phylum : Factor w/ 1 level "Proteobacteria": 1 1 ..$ Class : Factor w/ 2 levels "Deltaproteobacteria",..: 1 2 ..$ Order : Factor w/ 2 levels "Desulfovibrionales",..: 1 2 ..$ Family : Factor w/ 2 levels "Desulfovibrionaceae",..: 1 2 ..$ Genus : Factor w/ 2 levels "Bilophila","Escherichia-Shigella": 1 2 ..$ Species : Factor w/ 2 levels "Escherichia coli",..: 2 1 ..$ pid : num [1:2] 99.9 100 ..$ alnlen : int [1:2] 776 1523 ..$ eval : int [1:2] -1 -1 $ trnaTab : logi NA $ userTab : list() $ userSource: chr "" $ summary :List of 9 ..$ Total_length : int 7058538 ..$ Num_scaffolds : int 6018 ..$ Scaff_length_max : int 498164 ..$ Scaff_length_min : int 56 ..$ Scaff_length_median: num 298 ..$ Scaff_length_N50 : int 101097 ..$ Num_markers : 'table' int [1(1d)] 1514 .. ..- attr(*, "dimnames")=List of 1 .. .. ..$ : chr "blobology" ..$ Num_SSU : int 2 ..$ Num_tRNAs : logi NA $ call :List of 1 ..$ : language gbt.default(covstats = "U:\linux_shared_data\infant_genome_binning\assembly_coverages\S59.coverage", mark = "| truncated

And the first few lines of the file S59_contig_taxonomy.tab

scaffold markerid gene Superkingdom Phylum Class Order Family Genus Species NODE_1000_length_487_cov_1.861111 NA NA Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Acetobacteraceae Granulibacter Granulibacter bethesdensis NODE_1002_length_487_cov_1.388889 NA NA Bacteria Proteobacteria Deltaproteobacteria Desulfovibrionales Desulfovibrionaceae Desulfovibrio Desulfovibrio fairfieldensis NODE_1003_length_487_cov_1.314815 NA NA (unassigned) (unassigned) (unassigned) (unassigned) (unassigned) (unassigned) uncultured organism NODE_1005_length_487_cov_1.134259 NA NA Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales Pseudomonadaceae Pseudomonas Pseudomonas sp. FDAARGOS_380

Apologies if this isn't clear - I can send screenshots if it's more useful.

kbseah commented 4 years ago

Hello @jade-davies,

I have a hunch about what might be going on here: perhaps there are simply too many taxa to be plotted and none are dominant enough to meet the cutoff described here (see section 4c)

Could you give a go at plotting a higher taxonomic level, such as Phylum or Superkingdom, and setting the markCutoff parameter to 1 if that doesn't work, either?

Cheers, Brandon

jade-davies commented 4 years ago

Hello @kbseah,

Perfect, the markCutoff was the issue! I set it to 1 and it is now displaying the colour overlay. It has also happened with a couple of other samples, but it seems to be happening where there is only one cluster. For example, here is the GC coverage plot for sample_59 now at the species level, with markCutoff=1. sample_59_species_no_cutoff.pdf

What do you think? But regardless of the results interpretation, this issue is solved.

Thank you again for your help!

All the best, Jade

kbseah commented 4 years ago

Good to hear that it's been resolved, I guess the way the cutoff was designed didn't really a situation like this, and would benefit from a more informative error message at that point. Thanks for helping to figure this one out!