ohlab / GRiD

Growth Rate Index (GRiD) measures bacterial growth rate from reference genomes (including draft quality genomes) and metagenomic bins at ultra-low sequencing coverage (> 0.2x).

GRiD values only generated for some MAGs #15

Closed: franciscozorrilla closed this issue 4 years ago

franciscozorrilla commented 5 years ago

Hi, why does GRiD multiplex not give me a GRiD value for each of my MAGs/bins?

I am building a toy example: I subsample 3,000,000 reads (~10%) from each of 3 different sets of paired-end reads, then assemble them, extract MAGs and refine them. For one particular sample I get:

bin               completeness (%)  contamination (%)  GC     lineage             N50 (bp)  size (bp)
bin.1.orig        63.73             4.181              0.383  Clostridiales       1788      1773475
bin.2.orig        97.98             2.237              0.460  Clostridiales       63743     2669734
bin.3.permissive  64.74             1.762              0.597  Clostridiales       3314      1662742
bin.5.strict      97.70             1.677              0.415  Clostridiales       14495     2056116
bin.6.strict      93.72             2.013              0.495  Clostridiales       26899     1984078
bin.7.orig        56.83             1.711              0.599  Actinobacteria      1876      1196365
bin.8.strict      99.16             1.136              0.596  Bifidobacteriaceae  32312     2194250
bin.9.permissive  82.58             2.717              0.318  Euryarchaeota       2867      1230957

Next I generate a sample-specific database from the 8 MAGs and run GRiD in multiplex mode:

grid multiplex -r . -e fastq.gz -d MAGdb -p -c 0.2 -o out -n 48

I tested with and without the Pathoscope option and got very similar results:

bin.2.orig  1.58    1.65    0.0424242424242424  6.968
bin.6.strict    1.14    1.18    0.0338983050847458  5.767
bin.8.strict    1.76    1.84    0.0434782608695653  10.044

I was expecting to get a GRiD value for each of my bins, but this is not the case. Should I be using the grid single module instead?

Thanks, FZ

aemiol commented 5 years ago

You will only get values for genomes above the coverage threshold (set with the -c flag). Also, if those MAGs are highly fragmented (i.e. > 90 fragments/Mbp), GRiD requires a minimum coverage of 1X.
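The rule above can be expressed as a small predicate. This is a minimal sketch, assuming coverage must meet or exceed the threshold and that the 1X floor for highly fragmented MAGs overrides a lower -c value; `grid_reportable` and its constants are hypothetical names, not part of GRiD, whose internal filtering may differ in detail:

```python
# Hypothetical helper encoding the maintainer's rule of thumb; not GRiD's own code.
FRAGMENTATION_CUTOFF = 90.0     # fragments per Mbp, as quoted in the reply
FRAGMENTED_MIN_COVERAGE = 1.0   # coverage floor for highly fragmented MAGs


def grid_reportable(n_contigs: int, genome_size_bp: int, coverage: float,
                    coverage_cutoff: float = 0.2) -> bool:
    """Return True if a MAG would be expected to receive a GRiD value."""
    fragments_per_mbp = n_contigs / (genome_size_bp / 1_000_000)
    if fragments_per_mbp > FRAGMENTATION_CUTOFF:
        # Assumption: the 1X floor applies whenever it is stricter than -c.
        required = max(coverage_cutoff, FRAGMENTED_MIN_COVERAGE)
    else:
        required = coverage_cutoff
    return coverage >= required


if __name__ == "__main__":
    # bin.2.orig from this thread: 159 contigs, 2,669,734 bp, coverage 6.968
    print(grid_reportable(159, 2_669_734, 6.968))
    # Hypothetical: bin.1.orig's size and contig count at an assumed 0.5X
    print(grid_reportable(1000, 1_773_475, 0.5))
```

The first call is reported (about 60 fragments/Mbp, well above 0.2X); the second is filtered out because at about 564 fragments/Mbp the assumed 0.5X falls short of 1X.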

Cheers, Tunde

franciscozorrilla commented 5 years ago

I see, thank you for your response. Indeed, when I look at the number of contigs in each of my bins, I can see why those are the only ones with GRiD values:

bin.1.orig.fa 1000
bin.2.orig.fa 159
bin.3.permissive.fa 629
bin.5.strict.fa 243
bin.6.strict.fa 127
bin.7.orig.fa 655
bin.8.strict.fa 131
bin.9.permissive.fa 522
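Dividing each bin's contig count by its size (from the stats table earlier in the thread) gives fragments per Mbp and makes the pattern explicit. A minimal sketch; the 90 fragments/Mbp cutoff is the one quoted in the maintainer's reply, and the `fragments_per_mbp` helper is a hypothetical name:

```python
# Contig counts and bin sizes (bp) for the eight bins listed in this thread
bins = {
    "bin.1.orig":       (1000, 1_773_475),
    "bin.2.orig":       (159,  2_669_734),
    "bin.3.permissive": (629,  1_662_742),
    "bin.5.strict":     (243,  2_056_116),
    "bin.6.strict":     (127,  1_984_078),
    "bin.7.orig":       (655,  1_196_365),
    "bin.8.strict":     (131,  2_194_250),
    "bin.9.permissive": (522,  1_230_957),
}

CUTOFF = 90.0  # fragments per Mbp above which GRiD needs at least 1X coverage


def fragments_per_mbp(n_contigs: int, size_bp: int) -> float:
    """Number of contigs per million base pairs of assembly."""
    return n_contigs / (size_bp / 1e6)


for name, (n_contigs, size_bp) in bins.items():
    fpm = fragments_per_mbp(n_contigs, size_bp)
    label = "highly fragmented" if fpm > CUTOFF else "eligible at low coverage"
    print(f"{name:18s} {fpm:7.1f} fragments/Mbp  {label}")
```

Only bin.2.orig, bin.6.strict and bin.8.strict come out below the cutoff (roughly 60, 64 and 60 fragments/Mbp), the same three bins that received GRiD values; bin.5.strict is the nearest miss at about 118.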

Do you have any insight into how one could increase the contiguity/reduce fragmentation of MAGs? I have tried using the metaWRAP bin reassemble module, but the results show only marginal improvements.

Thanks, FZ

aemiol commented 5 years ago

The most important factor here is coverage. It appears your MAGs have low coverage in your samples. Why not run a test using the complete dataset without subsampling?
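Why subsampling hurts so much: to a first approximation, mean coverage follows the Lander-Waterman relation C = N·L/G (N reads of length L landing on a genome of size G), so keeping ~10% of the reads cuts every MAG's coverage roughly tenfold. A minimal sketch, assuming uniform coverage and that each MAG's read count scales with the subsampling fraction; the function names and example numbers are hypothetical:

```python
def mean_coverage(n_reads: int, read_length: int, genome_size_bp: int) -> float:
    """Expected mean coverage C = N * L / G (Lander-Waterman), assuming uniform coverage."""
    return n_reads * read_length / genome_size_bp


def full_data_coverage(subsample_coverage: float, fraction_kept: float) -> float:
    """Scale a coverage measured on a subsample back up to the full read set."""
    if not 0 < fraction_kept <= 1:
        raise ValueError("fraction_kept must be in (0, 1]")
    return subsample_coverage / fraction_kept


if __name__ == "__main__":
    # Hypothetical: 100,000 reads of 150 bp mapping to a 2 Mbp MAG
    print(mean_coverage(100_000, 150, 2_000_000))
    # Hypothetical MAG at 0.5X in a ~10% subsample
    print(full_data_coverage(0.5, 0.10))
```

For instance, a MAG sitting at 0.5X in the ~10% subsample would be expected near 5X on the full dataset, comfortably above the 1X floor for fragmented MAGs.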

Cheers, Tunde

franciscozorrilla commented 4 years ago

Thanks for the input! I previously ran GRiD on MAGs generated from the entire samples (i.e. no subsampling) and was able to calculate GRiD values for approximately 1300 of 4000 MAGs. I have since refined the binning process, which may improve coverage for some bins, although I suspect not by much.

I assume that the ~1300 MAGs for which I was able to generate GRiD values correspond to highly abundant gut microbes, and that I am failing to obtain GRiD values for less abundant ones? In that case, would the only solution be to sequence samples at greater depth?
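The coverage relation can be inverted to estimate how deep to sequence. If a fraction a of all reads comes from a given MAG, reaching coverage C on a genome of size G with reads of length L takes roughly N = C·G/(L·a) total reads. A rough sketch: the relative-abundance input, the 150 bp read length and the example abundance are hypothetical, and real mapping rates and uneven coverage will push the figure up:

```python
import math


def reads_needed(target_coverage: float, genome_size_bp: int,
                 read_length: int, rel_abundance: float) -> int:
    """Total reads to sequence so that a MAG contributing `rel_abundance` of all
    reads reaches `target_coverage` (assumes uniform coverage, all reads map)."""
    if not 0 < rel_abundance <= 1:
        raise ValueError("rel_abundance must be a fraction in (0, 1]")
    return math.ceil(target_coverage * genome_size_bp / (read_length * rel_abundance))


if __name__ == "__main__":
    # Hypothetical 2 Mbp MAG making up 0.1% of the reads, 150 bp reads
    print(reads_needed(0.2, 2_000_000, 150, 0.001))  # contiguous MAG, -c 0.2
    print(reads_needed(1.0, 2_000_000, 150, 0.001))  # fragmented MAG, 1X floor
```

For that hypothetical MAG, about 2.7 million reads suffice at 0.2X, but roughly 13.3 million are needed for the 1X required of a highly fragmented MAG, and the requirement grows in inverse proportion to abundance.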

Thanks, FZ

aemiol commented 4 years ago

Hi, yes, you are right. Coverage is the main issue in your case.