snayfach / MAGpurify

Improvement of metagenome-assembled genomes
GNU General Public License v3.0

Add coverage module #10

Closed apcamargo closed 4 years ago

apcamargo commented 4 years ago

This PR adds the coverage module to MAGpurify.

Changes

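# Flag contigs whose coverage differs from the mean by a factor of max_deviation or more
relative_coverage = mag_coverage_df / mag_coverage_df.mean()
outliers = (relative_coverage >= args["max_deviation"]) | (
    relative_coverage <= 1 / args["max_deviation"]
)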
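
To make the rule above concrete, here is a minimal sketch run on made-up data. Only `mag_coverage_df` and `args["max_deviation"]` come from the snippet; the contig names, the coverage values, the single `coverage` column, and the threshold of 5.0 are assumptions chosen for illustration, not values from the PR.

```python
import pandas as pd

# Hypothetical per-contig coverage for one MAG in one sample (made-up values).
mag_coverage_df = pd.DataFrame(
    {"coverage": [10.0, 12.0, 8.0, 11.0, 9.0, 10.0, 10.0, 10.0, 100.0, 1.0]},
    index=[f"contig_{i:02d}" for i in range(1, 11)],
)
args = {"max_deviation": 5.0}  # assumed threshold, chosen only for this example

# Coverage of each contig relative to the mean coverage of the MAG (18.1 here).
relative_coverage = mag_coverage_df / mag_coverage_df.mean()

# A contig is an outlier if it is at least max_deviation times above
# or at least max_deviation times below the mean.
outliers = (relative_coverage >= args["max_deviation"]) | (
    relative_coverage <= 1 / args["max_deviation"]
)

flagged = outliers[outliers["coverage"]].index.tolist()
print(flagged)  # ['contig_09', 'contig_10']
```

With these values, `contig_09` sits about 5.5 times above the mean and `contig_10` about 18 times below it, so both are flagged; the remaining contigs stay within the fivefold window.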
apcamargo commented 4 years ago

@snayfach I listed three points that I'd like to discuss before merging the PR (marked [Up for discussion] in the post above).

  1. I'm not sure that using just a single sample for outlier detection is the best option, since useful information may be ignored. I chose to do this because it seemed to me that you only used a single sample in the paper. Is that correct?
  2. I set the minimum average coverage of a sample to 1.0. If the mean coverage is lower than that, outlier detection is skipped. What do you think?
  3. Is the outlier detection implemented in the same way you did for the paper?
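
Point 2 can be sketched as a guard placed in front of the outlier test. The function name `detect_outliers`, the parameter `min_mean_coverage`, and the choice to return an all-False mask when the sample is too shallow are assumptions made for illustration; the PR's actual code may handle this case differently.

```python
import pandas as pd


def detect_outliers(
    coverage: pd.Series,
    max_deviation: float,
    min_mean_coverage: float = 1.0,
) -> pd.Series:
    """Return a boolean mask of outlier contigs for one MAG in one sample.

    Hypothetical helper: if the mean coverage is below min_mean_coverage,
    no contig is flagged.
    """
    mean_coverage = coverage.mean()
    if mean_coverage < min_mean_coverage:
        # Sample is too shallow: skip outlier detection entirely.
        return pd.Series(False, index=coverage.index)
    relative = coverage / mean_coverage
    return (relative >= max_deviation) | (relative <= 1 / max_deviation)


# Made-up shallow sample: mean coverage is below 1.0, so nothing is flagged,
# even though the first contig is far below the mean.
shallow = pd.Series([0.05, 0.5, 0.5, 0.5])
print(detect_outliers(shallow, max_deviation=5.0).any())  # False

# Made-up deeper sample: the last two contigs are flagged.
deep = pd.Series([10.0] * 8 + [100.0, 1.0])
mask = detect_outliers(deep, max_deviation=5.0)
print(mask[mask].index.tolist())  # [8, 9]
```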
snayfach commented 4 years ago

Re #1 - Yes I only used a single MAG from a single sample for outlier detection. Using multiple samples and looking at covariation of coverage seems considerably more difficult. Would this require having a co-assembly?

Re #2 - A minimum of 1x coverage seems reasonable. This cutoff might be explored a bit more in a paper. Also, the expected deviation from the mean/median might itself depend on the mean/median: higher deviation might be expected for bins with lower average coverage.

Re #3 - Yes, this is the same way I defined an outlier.

apcamargo commented 4 years ago

> Re #1 - Yes I only used a single MAG from a single sample for outlier detection. Using multiple samples and looking at covariation of coverage seems considerably more difficult. Would this require having a co-assembly?

Not necessarily. You can map reads from related samples (e.g., replicates) to your MAG.

I finished writing the README and did some testing, and everything seems to be OK. If you are fine with the changes, the PR can be merged.