grubaughlab / DENV_pipeline

GNU General Public License v3.0

How to interpret sylvatic call alongside standard serotypes? #20

ammaraziz closed this 4 months ago

ammaraziz commented 5 months ago

Hi there,

Thanks again for the amazing amplicon scheme and pipeline.

I am getting results that look like this:

| sample_id | consensus_sequence_file | depth | serotype_called | reference_sequence_name | reference_sequence_length | number_aligned_bases | coverage_untrimmed | coverage_trimmed |
|---|---|---|---|---|---|---|---|---|
| SAMPLEID_S8 | SAMPLEID_S8.DENV2.10.cons.fa | 10 | DENV2 | DENV2 | 10723 | 10512 | 98.03 | 100 |
| SAMPLEID_S8 | SAMPLEID_S8.DENV2_sylvatic.10.cons.fa | 10 | DENV2_sylvatic | DENV2_sylvatic | 10722 | 5918 | 55.19 | 55.14 |
| SAMPLEID_S8 | SAMPLEID_S8.DENV3.10.cons.fa | 10 | NA | DENV3 | 10707 | 697 | 6.51 | 6.34 |
| SAMPLEID_S8 | SAMPLEID_S8.DENV4.10.cons.fa | 10 | NA | DENV4 | 10649 | 72 | 0.68 | 0.71 |

I'm unsure how to handle the sylvatic result of ~55% genome coverage alongside a 100% DENV2 result. Can I assume the genome of a DENV2-sylvatic has a high degree of sequence identity to a DENV2 genome and therefore it's to be expected that I see the above result?

I have a few other samples also analysed alongside the above sample which show sylvatic genome coverage between ~20-55%. Some have no hits against the sylvatic genome.

Our lab rarely sequences dengue viruses, so I don't think it's environmental contamination (our negative controls are also clean). We perform PCR to identify the dengue serotype using both panDengue and DENV2 assays (I am unsure whether the DENV2 assay is specific to DENV2 alone or also detects DENV2-sylvatic).

Any advice would be appreciated.

Thanks,

Ammar

ammaraziz commented 5 months ago

I forgot to mention, the samples are from an island nation in the Pacific Ocean. I don't think the sylvatic strains have ever been observed in those countries, and I don't even know if the natural hosts (monkeys) live on the island.

As you can tell my dengue knowledge is very basic!

ViralVerity commented 5 months ago

Hi Ammar,

You've actually got it right: it's because the sylvatics are pretty similar to the endemic viruses. You can see a low percentage of coverage against the other serotypes for the same reason. It's just that the sylvatics are close enough that they can get above the 50% threshold for reporting.

Just take the serotype with the higher coverage, in this case DENV2! It's not a contamination issue or anything you're doing wrong; the pipeline is just not currently very clever in the way it compares to different serotypes.
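If it helps, here is a minimal sketch of that "take the highest coverage" rule applied to a summary table like the one above. It is not part of the pipeline: the function name `pick_best_serotype` is hypothetical, and it assumes the summary is tab-separated with the column names shown in this issue, that `coverage_untrimmed` is the column to compare, and that 50% is the cut-off to apply.

```python
import csv
import io


def pick_best_serotype(tsv_text, coverage_col="coverage_untrimmed", min_coverage=50.0):
    """Return {sample_id: (reference_sequence_name, coverage)} for the
    best-covered reference per sample.

    Hypothetical helper: the column names and the 50% cut-off are
    assumptions taken from this thread, not from the pipeline's code.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        coverage = float(row[coverage_col])
        if coverage < min_coverage:
            continue  # below the assumed reporting threshold
        sample = row["sample_id"]
        # Keep only the reference with the highest coverage for each sample.
        if sample not in best or coverage > best[sample][1]:
            best[sample] = (row["reference_sequence_name"], coverage)
    return best


# Rows copied from the table in this issue (a subset of the columns).
header = ["sample_id", "serotype_called", "reference_sequence_name",
          "coverage_untrimmed", "coverage_trimmed"]
rows = [
    ["SAMPLEID_S8", "DENV2", "DENV2", "98.03", "100"],
    ["SAMPLEID_S8", "DENV2_sylvatic", "DENV2_sylvatic", "55.19", "55.14"],
    ["SAMPLEID_S8", "NA", "DENV3", "6.51", "6.34"],
    ["SAMPLEID_S8", "NA", "DENV4", "0.68", "0.71"],
]
example = "\n".join("\t".join(fields) for fields in [header] + rows)

print(pick_best_serotype(example))
# {'SAMPLEID_S8': ('DENV2', 98.03)}
```

Both DENV2 and DENV2_sylvatic clear the 50% cut-off here, but only DENV2 is kept because it has the higher coverage.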

Let me know if any of that doesn't make sense

ammaraziz commented 4 months ago

Apologies for the super late response.

Thanks for sharing your knowledge and expertise. All of that makes sense.

Closing with a thanks!