Open jade-davies opened 4 years ago
Hello @jade-davies , could you please post the results of the R command str(sample_59)
, as well as the first few lines of the file S59_contig_taxonomy.tab
here?
Hi, @kbseah,
Here it is:
str(sample_59)
List of 10 $ scaff :'data.frame': 6018 obs. of 11 variables: ..$ ID : Factor w/ 6018 levels "NODE_1_length_498164_cov_63.706962",..: 1 1112 2223 3334 4445 5556 5686 5797 5908 2 ... ..$ Avg_fold : num [1:6018] 134 107 116 149 125 ... ..$ Length : int [1:6018] 498164 342863 307886 233546 217330 195210 184913 173196 172201 171659 ... ..$ Ref_GC : num [1:6018] 0.507 0.5 0.505 0.465 0.518 ... ..$ Covered_percent: num [1:6018] 100 100 100 100 100 100 100 100 100 100 ... ..$ Covered_bases : int [1:6018] 498164 342863 307886 233546 217330 195210 184913 173196 172201 171659 ... ..$ Plus_reads : int [1:6018] 240969 132112 129027 125118 98178 109871 71153 69809 72215 67135 ... ..$ Minus_reads : int [1:6018] 241025 132349 128926 125281 98167 109860 71268 69863 72172 67155 ... ..$ Read_GC : num [1:6018] 0.49 0.49 0.49 0.443 0.503 ... ..$ Median_fold : int [1:6018] 122 99 106 144 115 138 101 101 102 101 ... ..$ Std_Dev : num [1:6018] 54 39 45.1 57.2 50.4 ... $ covs :'data.frame': 6018 obs. of 2 variables: ..$ ID : Factor w/ 6018 levels "NODE_1_length_498164_cov_63.706962",..: 1 1112 2223 3334 4445 5556 5686 5797 5908 2 ... ..$ scaff.Avg_fold: num [1:6018] 134 107 116 149 125 ... $ markTab :'data.frame': 1514 obs. of 11 variables: ..$ scaffold : Factor w/ 1514 levels "NODE_1_length_498164_cov_63.706962",..: 4 5 6 7 8 3 10 11 12 13 ... ..$ markerid : logi [1:1514] NA NA NA NA NA NA ... ..$ gene : logi [1:1514] NA NA NA NA NA NA ... ..$ Superkingdom: Factor w/ 5 levels "(unassigned)",..: 3 3 1 3 3 3 3 3 3 1 ... ..$ Phylum : Factor w/ 16 levels "(unassigned)",..: 14 14 1 14 14 1 14 1 14 1 ... ..$ Class : Factor w/ 27 levels "(unassigned)",..: 4 13 1 15 13 1 13 1 4 1 ... ..$ Order : Factor w/ 59 levels "(unassigned)",..: 48 19 1 43 19 1 19 1 48 1 ... ..$ Family : Factor w/ 101 levels "(unassigned)",..: 2 28 1 75 28 1 28 1 81 1 ... ..$ Genus : Factor w/ 196 levels "(unassigned)",..: 82 57 1 155 57 1 57 1 109 1 ... ..$ Species : Factor w/ 339 levels "Achromobacter denitrificans",..: 135 92 330 253 94 327 91 327 171 330 ... ..$ source : Factor w/ 1 level "blobology": 1 1 1 1 1 1 1 1 1 1 ... $ markSource: chr "blobology" $ ssuTab :'data.frame': 2 obs. of 13 variables: ..$ scaffold : Factor w/ 2 levels "NODE_43_length_4993_cov_506.111989",..: 2 1 ..$ SSUid : Factor w/ 2 levels "NODE_43_length_4993_cov_506.111989:149-1687(+)",..: 2 1 ..$ SSUgenbankid: Factor w/ 2 levels "CEAB01039785.95.1626",..: 1 2 ..$ Superkingdom: Factor w/ 1 level "Bacteria": 1 1 ..$ Phylum : Factor w/ 1 level "Proteobacteria": 1 1 ..$ Class : Factor w/ 2 levels "Deltaproteobacteria",..: 1 2 ..$ Order : Factor w/ 2 levels "Desulfovibrionales",..: 1 2 ..$ Family : Factor w/ 2 levels "Desulfovibrionaceae",..: 1 2 ..$ Genus : Factor w/ 2 levels "Bilophila","Escherichia-Shigella": 1 2 ..$ Species : Factor w/ 2 levels "Escherichia coli",..: 2 1 ..$ pid : num [1:2] 99.9 100 ..$ alnlen : int [1:2] 776 1523 ..$ eval : int [1:2] -1 -1 $ trnaTab : logi NA $ userTab : list() $ userSource: chr "" $ summary :List of 9 ..$ Total_length : int 7058538 ..$ Num_scaffolds : int 6018 ..$ Scaff_length_max : int 498164 ..$ Scaff_length_min : int 56 ..$ Scaff_length_median: num 298 ..$ Scaff_length_N50 : int 101097 ..$ Num_markers : 'table' int [1(1d)] 1514 .. ..- attr(*, "dimnames")=List of 1 .. .. ..$ : chr "blobology" ..$ Num_SSU : int 2 ..$ Num_tRNAs : logi NA $ call :List of 1 ..$ : language gbt.default(covstats = "U:\linux_shared_data\infant_genome_binning\assembly_coverages\S59.coverage", mark = "| truncated
And the first few lines of the file S59_contig_taxonomy.tab
scaffold markerid gene Superkingdom Phylum Class Order Family Genus Species NODE_1000_length_487_cov_1.861111 NA NA Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Acetobacteraceae Granulibacter Granulibacter bethesdensis NODE_1002_length_487_cov_1.388889 NA NA Bacteria Proteobacteria Deltaproteobacteria Desulfovibrionales Desulfovibrionaceae Desulfovibrio Desulfovibrio fairfieldensis NODE_1003_length_487_cov_1.314815 NA NA (unassigned) (unassigned) (unassigned) (unassigned) (unassigned) (unassigned) uncultured organism NODE_1005_length_487_cov_1.134259 NA NA Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales Pseudomonadaceae Pseudomonas Pseudomonas sp. FDAARGOS_380
Apologies if this isn't clear - I can send screenshots if it's more useful.
Hello @jade-davies,
I have a hunch about what might be going on here: perhaps there are simply too many taxa to be plotted and none are dominant enough to meet the cutoff described here (see section 4c)
Could you give a go at plotting a higher taxonomic level, such as Phylum or Superkingdom, and setting the markCutoff
parameter to 1
if that doesn't work, either?
Cheers, Brandon
Hello @kbseah,
Perfect, the markCutoff was the issue! I set it to 1 and it is now displaying the colour overlay. It has also happened with a couple of other samples, but it seems to be happening where there is only one cluster. For example, here is the GC coverage plot for sample_59 now at the species level, with markCutoff=1. sample_59_species_no_cutoff.pdf
What do you think? But regardless of the results interpretation, this issue is solved.
Thank you again for your help!
All the best, Jade
Good to hear that it's been resolved, I guess the way the cutoff was designed didn't really a situation like this, and would benefit from a more informative error message at that point. Thanks for helping to figure this one out!
Hello,
I am back again (sorry). I have made coverage tables, taxonomy tables and ssu tables for multiple samples and have imported them individually into R using the code specified in the workflow. Plotting of 2 samples worked successfully, but for the third one I am having an error.
The data was imported successfully using following code:
sample_59 <- gbt(covstats="U:\\linux_shared_data\\infant_genome_binning\\assembly_coverages\\S59.coverage", mark="U:\\linux_shared_data\\infant_genome_binning\\assembly_taxonomy_tables\\S59_contig_taxonomy.tab", marksource="blobology", ssu="U:\\linux_shared_data\\infant_genome_binning\\assembly_ssu_tables\\S59.ssu.tab")
Then, check with summary(sample_59) and gbt_checkinput also shows no errors present.
When plotting with code:
plot(sample_59,taxon="Family",ssu=TRUE,textlabel=TRUE,legend=TRUE,main="Coverage-GC plot of isolate")
The error is
Error in legend("topright", legend = colorframe$taxon, cex = 0.6, fill = as.character(colorframe$colors)) : 'legend' is of length 0
The data is plotted correctly but all datapoints are in grey, with no legend. I am using RStudio with R 3.6.1.