np-core / nanopath

Python package and command line interface - entry point for the repository :snake:

Microbial contamination, misassignments in Kraken report #11

Open · esteinig opened this issue 4 years ago

esteinig commented 4 years ago

Added a decontamination step for bacterial and viral data in KrakenProcess. Archaeal assignments are treated as contamination, as are single-read assignments. The contamination plot in the app (#10) shows species-level contaminants [commit 1b0212e].
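A minimal sketch of the two rules above, assuming the standard six-column Kraken 2 report (percentage, clade reads, direct reads, rank code, taxid, indented name). The names `Taxon`, `parse_report` and `flag_contaminants` are hypothetical and are not the KrakenProcess API. Applying the single-read rule to species-level clade reads is also an assumption. The example report is illustrative, not real data.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str
    taxid: int
    rank: str
    clade_reads: int  # reads assigned to this taxon plus its descendants
    domain: str       # enclosing domain row, e.g. "Bacteria", "Viruses", "Archaea"


def parse_report(text: str) -> list[Taxon]:
    """Parse a six-column Kraken 2 report while tracking the current domain.

    Rows are written depth-first, so every row after a rank "D" row belongs to
    that domain until the next "D", root ("R", "R1", ...) or unclassified row.
    """
    taxa = []
    domain = "none"
    for line in text.strip("\n").splitlines():
        _pct, clade, _direct, rank, taxid, name = line.split("\t")
        name = name.strip()  # names are indented by depth in the report
        if rank == "D":
            domain = name
        elif rank.startswith(("U", "R")):
            domain = "none"
        taxa.append(Taxon(name, int(taxid), rank, int(clade), domain))
    return taxa


def flag_contaminants(taxa: list[Taxon]) -> dict[str, str]:
    """Map species-level names to the reason they are treated as contamination."""
    flagged = {}
    for taxon in taxa:
        if taxon.rank != "S":
            continue
        if taxon.domain == "Archaea":
            flagged[taxon.name] = "archaeal assignment"
        elif taxon.clade_reads == 1:
            flagged[taxon.name] = "single-read assignment"
    return flagged


# Illustrative report (genus and intermediate ranks omitted for brevity).
rows = [
    ("10.00", "10", "10", "U", "0", "unclassified"),
    ("90.00", "90", "0", "R", "1", "root"),
    ("80.00", "80", "0", "R1", "131567", "  cellular organisms"),
    ("79.00", "79", "0", "D", "2", "    Bacteria"),
    ("78.00", "78", "78", "S", "1280", "      Staphylococcus aureus"),
    ("1.00", "1", "1", "S", "562", "      Escherichia coli"),
    ("1.00", "1", "0", "D", "2157", "    Archaea"),
    ("1.00", "1", "1", "S", "2173", "      Methanobrevibacter smithii"),
    ("10.00", "10", "0", "D", "10239", "  Viruses"),
    ("10.00", "10", "10", "S", "10376", "    Human gammaherpesvirus 4"),
]
report = "\n".join("\t".join(row) for row in rows)
print(flag_contaminants(parse_report(report)))
# {'Escherichia coli': 'single-read assignment', 'Methanobrevibacter smithii': 'archaeal assignment'}
```

Tracking the domain while reading the report in order avoids a separate taxonomy lookup. It only works because Kraken writes the tree depth-first.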

esteinig commented 4 years ago

Added page-wise contamination donut plots, each limited to ten visible species for clarity.
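A minimal sketch of how the ten-species cap could be applied to each page's plot. It assumes the remaining species are pooled into an "Other" slice, which the thread does not specify. `donut_slices` is a hypothetical helper that only prepares the data and does no plotting.

```python
from collections import Counter


def donut_slices(species_reads: dict[str, int], limit: int = 10) -> list[tuple[str, int]]:
    """Keep the `limit` most abundant species and pool the rest into 'Other'."""
    ranked = Counter(species_reads).most_common()  # sorted by read count, descending
    slices = ranked[:limit]
    rest = sum(count for _, count in ranked[limit:])
    if rest:
        slices.append(("Other", rest))
    return slices


counts = {f"species_{i}": i for i in range(1, 13)}  # 12 illustrative species
print(donut_slices(counts)[-1])  # ('Other', 3)
```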

esteinig commented 4 years ago

We need better ways to identify contaminants. At the moment an assignment needs a fixed minimum of 5 reads to count as valid evidence, but that threshold was chosen on test data and will not reflect complex real samples.
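The current rule amounts to a fixed cut-off. A sketch that exposes it as a parameter makes it easier to tune on real samples. `supported_species` and `MIN_READS` are hypothetical names, and the read counts are illustrative.

```python
MIN_READS = 5  # fixed threshold described above, chosen on test data


def supported_species(species_reads: dict[str, int], min_reads: int = MIN_READS) -> set[str]:
    """Return species whose read count meets the minimum-evidence threshold."""
    return {name for name, reads in species_reads.items() if reads >= min_reads}


counts = {"Staphylococcus aureus": 18, "Escherichia coli": 4}
print(supported_species(counts))               # {'Staphylococcus aureus'}
print(supported_species(counts, min_reads=20))  # set()
```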

In addition, the septic shock paper showed that 18 reads may be sufficient evidence for the causative agent of infection, so we need a better way to determine these thresholds.

@dn-ra any thoughts on this?

dn-ra commented 4 years ago

If it's a real contaminant rather than a misassignment, the whole genome should be represented. You could look for a full set of marker genes for each identified organism, à la CheckM?

That would add compute time, though.
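The marker-gene idea reduces to a completeness fraction: how many of an organism's expected marker genes have read support. This is a deliberately simplified, flat version. CheckM's own estimate uses lineage-specific, collocated marker sets. The gene names below are placeholders, not CheckM's marker set, and `marker_completeness` is hypothetical.

```python
def marker_completeness(expected_markers: set[str], covered_markers: set[str]) -> float:
    """Fraction of an organism's expected marker genes that have read support."""
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")
    return len(expected_markers & covered_markers) / len(expected_markers)


expected = {"rpoB", "gyrB", "recA", "dnaK", "tufA"}  # placeholder marker set
covered = {"rpoB", "gyrB"}                           # markers hit by at least one read
print(marker_completeness(expected, covered))        # 0.4
```

Under this idea, a misassignment would tend to hit few markers, while a genuine organism sequenced deeply enough would approach 1.0.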

esteinig commented 4 years ago

That's a good idea! But for now, most of the "metagenome" pipeline is aimed at sepsis. How is your blood data looking: do you get enough reads for this to be feasible? I wonder whether we will be able to recover a significant representation of the genome.
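One way to estimate feasibility before running anything is the Lander-Waterman expectation. With N reads of mean length L on a genome of size G, the expected fraction covered by at least one read is 1 - exp(-N·L/G). This assumes reads land uniformly at random. The read length and genome size below are illustrative assumptions, not numbers from the thread.

```python
import math


def expected_breadth(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Expected fraction of the genome covered by at least one read
    (Lander-Waterman, assuming uniformly random read placement)."""
    depth = n_reads * mean_read_length / genome_size
    return 1 - math.exp(-depth)


# 18 reads of ~5 kb on a ~2.8 Mb bacterial genome covers roughly 3% of it.
print(round(expected_breadth(18, 5_000, 2_800_000), 3))
```

At that breadth, most marker genes would be expected to have no reads even for a genuine pathogen.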

We will also barcode positive and negative controls onto runs, once enough sample data has been collected, to detect lab contamination.
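A sketch of how a negative control could be used: flag any taxon seen in the control unless the sample has at least `fold` times its relative abundance. The 10-fold rule, the normalisation by each barcode's total reads, and the function names are hypothetical choices, not part of the pipeline. Taxa and counts are illustrative.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts into fractions of the barcode's total reads."""
    total = sum(counts.values())
    return {taxon: reads / total for taxon, reads in counts.items()} if total else {}


def flag_control_taxa(sample: dict[str, int], negative_control: dict[str, int],
                      fold: float = 10.0) -> set[str]:
    """Flag sample taxa also present in the negative control without strong enrichment."""
    sample_rel = relative_abundance(sample)
    control_rel = relative_abundance(negative_control)
    return {
        taxon
        for taxon, frac in sample_rel.items()
        if control_rel.get(taxon, 0.0) > 0 and frac < fold * control_rel[taxon]
    }


sample = {"Staphylococcus aureus": 900, "Ralstonia pickettii": 100}
control = {"Ralstonia pickettii": 50}
print(flag_control_taxa(sample, control))  # {'Ralstonia pickettii'}
```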

esteinig commented 4 years ago

I think CheckM would be a great fit for the assembly portion of the pipeline when it is used on sputum or similar samples.

dn-ra commented 4 years ago

The word from Lachlan is no: there's not enough coverage from blood samples. :( There may still be enough variation to tell whether a hit is contamination, but we would probably have to build our own statistical tests. Apparently some contamination that showed up in blood consistently came from the same section of one chromosome.
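The last observation suggests one possible starting point for such a test: check whether the reads for a taxon pile up in one region or spread across the genome. This is a hypothetical descriptive score, not a calibrated statistical test. It assumes reads have already been mapped to a reference so their start positions are known. The function name and the bin count are illustrative.

```python
def positional_concentration(read_starts: list[int], genome_length: int,
                             n_bins: int = 100) -> tuple[float, int]:
    """Return (fraction of reads in the busiest bin, number of bins hit).

    Reads spread across the genome give a small fraction and many bins hit;
    reads piling up on one section give a fraction near 1 and few bins.
    """
    if not read_starts:
        return 0.0, 0
    counts = [0] * n_bins
    for pos in read_starts:
        if not 0 <= pos < genome_length:
            raise ValueError(f"position {pos} outside genome of length {genome_length}")
        counts[pos * n_bins // genome_length] += 1
    return max(counts) / len(read_starts), sum(1 for c in counts if c)


spread = list(range(0, 1_000_000, 10_000))       # one read per bin
clustered = [500_000 + i for i in range(100)]   # all reads in one region
print(positional_concentration(spread, 1_000_000))     # (0.01, 100)
print(positional_concentration(clustered, 1_000_000))  # (1.0, 1)
```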

esteinig commented 4 years ago

Right on, thanks for checking! I'll keep this open for now and see how the controls go. If we find anything unusual or results that don't make sense, we can revisit this.