mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

Are all repeat samples clustered together? #69

Closed mbhall88 closed 3 years ago

mbhall88 commented 3 years ago

In the Malagasy dataset, there are a number of cases where the patient was sampled again a few months after the initial sample. In theory, these samples should have almost no SNP differences. In reality though, what do we find?

mbhall88 commented 3 years ago

Not all repeat pairs had both samples make it through the QC step, so we can only check those that did.

Sample1      Sample2      COMPASS dist.   bcftools dist.
mada_1-46    mada_2-46    0               0
mada_1-53    mada_2-53    1               1
mada_1-1     mada_2-1     1766            1217
mada_1-25    mada_2-25    0               0
mada_1-50    mada_2-50    195             178

mbhall88 commented 3 years ago

Here is the lineage information for the two outlier pairs

sample    lineage
mada_1-1    1.1.2
mada_2-1    4.10
mada_1-50   4.10
mada_2-50   4.10

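It seems plausible that the patient behind mada_1-1 and mada_2-1 was reinfected with a completely different strain, given the two samples are from different lineages. I'm not sure how to explain mada_1-50 and mada_2-50, which share a lineage but are still far apart. I'll send this through to Simon and get his thoughts.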
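A minimal sketch of the check described above: it flags repeat pairs whose SNP distance exceeds a cutoff under either caller and notes whether the two lineages agree. The distances and lineages are copied from the tables in this thread. The 12-SNP cutoff and the names `MAX_REPEAT_DIST` and `flag_pairs` are assumptions made for illustration, not values or functions taken from the pipeline.

```python
# Distances reported in this thread: (sample1, sample2) -> (COMPASS, bcftools)
DISTANCES = {
    ("mada_1-46", "mada_2-46"): (0, 0),
    ("mada_1-53", "mada_2-53"): (1, 1),
    ("mada_1-1", "mada_2-1"): (1766, 1217),
    ("mada_1-25", "mada_2-25"): (0, 0),
    ("mada_1-50", "mada_2-50"): (195, 178),
}

# Lineages reported in this thread (only the outlier pairs were listed)
LINEAGES = {
    "mada_1-1": "1.1.2",
    "mada_2-1": "4.10",
    "mada_1-50": "4.10",
    "mada_2-50": "4.10",
}

MAX_REPEAT_DIST = 12  # assumed cutoff, not a value from the pipeline


def flag_pairs(distances, lineages, cutoff=MAX_REPEAT_DIST):
    """Return (sample1, sample2, same_lineage) for pairs above the cutoff."""
    flagged = []
    for (s1, s2), dists in distances.items():
        # A pair is flagged if either caller reports a distance above the cutoff
        if max(dists) <= cutoff:
            continue
        lin1, lin2 = lineages.get(s1), lineages.get(s2)
        same_lineage = lin1 is not None and lin1 == lin2
        flagged.append((s1, s2, same_lineage))
    return flagged


for s1, s2, same in flag_pairs(DISTANCES, LINEAGES):
    note = "same lineage" if same else "different lineage"
    print(f"{s1} vs {s2}: exceeds cutoff, {note}")
```

With the numbers above, only the 1-1/2-1 and 1-50/2-50 pairs are flagged, and only the first of those has discordant lineages.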

simongrandjeanlapierre commented 3 years ago

RE: 1-1 & 2-1, I double checked, and the ID is different. Those were believed to be repeats but are not! Please don't treat them as such anymore for the analysis. Let's not relabel them though because this will lead to significant confusion.

RE: 1-50 & 2-50, I double checked. It's the same patient, same hospital, 5 months apart. She was initially smear 3+ and became smear negative. Both isolates have the same MDR profile. In a low MDR prevalence setting, re-infection with a different strain is unlikely. This points towards something fishy on the sequencing side.

mbhall88 commented 3 years ago

> RE: 1-1 & 2-1, I double checked, and the ID is different. Those were believed to be repeats but are not! Please don't treat them as such anymore for the analysis. Let's not relabel them though because this will lead to significant confusion.

Ok, I will leave the labels as they are.

> RE: 1-50 & 2-50, I double checked. It's the same patient, same hospital, 5 months apart. She was initially smear 3+ and became smear negative. Both isolates have the same MDR profile. In a low MDR prevalence setting, re-infection with a different strain is unlikely. This points towards something fishy on the sequencing side.

Alright, when we get into the DST part of the project we can check whether the resistance profiles are the same or not and that should shed a little more light.

simongrandjeanlapierre commented 3 years ago

Based on the data we have, from the local Mykrobe version installed at IPM, the DST profiles of 1-50 and 2-50 are a perfect match.

Simon

mbhall88 commented 3 years ago

> RE: 1-50 & 2-50, I double checked. It's the same patient, same hospital, 5 months apart. She was initially smear 3+ and became smear negative. Both isolates have the same MDR profile. In a low MDR prevalence setting, re-infection with a different strain is unlikely. This points towards something fishy on the sequencing side.

@simongrandjeanlapierre Alright, looking at the DST results, it seems like there is an issue with 1-50. The culture-based DST results for 1-50 and 2-50 are the same: resistant to isoniazid and rifampicin. The Mykrobe Illumina and Nanopore results for 2-50 are resistant to isoniazid and rifampicin, but also to ethambutol, and the COMPASS and bcftools VCFs back up those variants. 1-50, however, is called susceptible to everything by Mykrobe on both Illumina and Nanopore, and this is backed up by the COMPASS and bcftools VCFs.

Given both the Illumina and Nanopore results for 1-50 are the same, it seems likely the swap might have happened before the sequencing (if there is a swap...)
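The comparison above can be sketched as a set difference between the culture-based and genotype-based resistance calls. The calls are the ones described in this comment; the names `PHENOTYPE`, `GENOTYPE`, and `discordance` are made up for illustration. One assumption to note: the culture-based result for ethambutol is not stated explicitly, so it is treated here as not resistant.

```python
# Culture-based DST calls as described in this thread
# (assumption: any drug not listed as resistant is treated as susceptible)
PHENOTYPE = {
    "mada_1-50": {"isoniazid", "rifampicin"},
    "mada_2-50": {"isoniazid", "rifampicin"},
}

# Mykrobe calls; Illumina and Nanopore agreed for each sample
GENOTYPE = {
    "mada_1-50": set(),
    "mada_2-50": {"isoniazid", "rifampicin", "ethambutol"},
}


def discordance(phenotype, genotype):
    """Return the drugs called resistant by only one of the two methods."""
    return {
        "phenotype_only": sorted(phenotype - genotype),
        "genotype_only": sorted(genotype - phenotype),
    }


for sample in PHENOTYPE:
    print(sample, discordance(PHENOTYPE[sample], GENOTYPE[sample]))
```

For 1-50 both phenotypic resistances lack genotypic support, whereas for 2-50 the only discordance is the additional ethambutol call.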

simongrandjeanlapierre commented 3 years ago

I have the same feeling. Two hypotheses then (correct me if I'm wrong)

  1. 1-50 was mixed up after phenotyping but before genotyping (most likely)
  2. 1-50 is falsely resistant to INH/RIF on phenotype and the strain became resistant to INH/RIF/EMB during 5 months of sub-optimally observed therapy (highly unlikely)

I looked back at the database and unfortunately we don't have good locally generated (Madagascar) Nanopore data for 1-50. This would have allowed us to compare genomes and confirm that a swap happened between Madagascar and the UK.

Can you compare those genomes (1-50 and 2-50) outside the drug-resistance genes? If there are numerous SNP differences, we can confidently say that a swap has happened.

Similarly, can you look for a highly similar genome to 2-50 in the rest of the dataset, so we can check whether two samples could have been swapped?
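The search for a close match could look roughly like the sketch below, which ranks every other sample by SNP distance to a query sample. The function name `closest_samples`, the `frozenset`-keyed distance mapping, and the 12-SNP cutoff are assumptions for illustration. In the toy data, only the 1-50/2-50 distance (the COMPASS value) comes from this thread; `sample_X`, `sample_Y`, and their distances are invented placeholders.

```python
def closest_samples(query, distances, max_dist=None):
    """Rank all other samples by SNP distance to `query`, closest first.

    `distances` maps frozenset({a, b}) -> SNP distance between a and b.
    """
    hits = []
    for pair, dist in distances.items():
        if query not in pair:
            continue
        # The pair has exactly two members, so one remains after removing query
        (other,) = pair - {query}
        if max_dist is None or dist <= max_dist:
            hits.append((other, dist))
    # Sort by distance, then by name for a stable order
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Toy matrix: only the 1-50/2-50 value is from this thread;
# sample_X, sample_Y and their distances are invented placeholders.
toy = {
    frozenset({"mada_2-50", "mada_1-50"}): 195,
    frozenset({"mada_2-50", "sample_X"}): 3,
    frozenset({"mada_2-50", "sample_Y"}): 840,
    frozenset({"sample_X", "sample_Y"}): 838,
}

print(closest_samples("mada_2-50", toy, max_dist=12))
```

Any sample returned within the cutoff would be a candidate for having been swapped with 1-50.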

mbhall88 commented 3 years ago

I suspect option 1 is far more likely. It is bound to happen when dealing with so many samples.

The SNP differences between 1-50 and 2-50 are listed in https://github.com/mbhall88/head_to_head_pipeline/issues/69#issuecomment-800103126

simongrandjeanlapierre commented 3 years ago

Super. Agreed with your hypotheses.

 * 1-1 / 2-1: likely re-infection with a different strain.
 * 1-50 / 2-50: likely lab mishandling of the isolate.

Simon
