nextstrain / measles

Nextstrain build for measles virus
https://nextstrain.org/measles

Clade assignment using whole genome - reference tree inconsistency with reference fasta file #40

Closed mariaelf97 closed 1 week ago

mariaelf97 commented 1 week ago

Current Behavior

Hello! Thank you for the great tool. I am trying to assign clades to a set of measles whole genome assemblies, but Nextclade reports that the reference sequence is inconsistent with the input tree.

Expected behavior

I was expecting to get an output CSV file, similar to the one the GUI produces, that gives me the clade for each sequence.

How to reproduce

Steps to reproduce the current behavior:

  1. Reference file used: https://www.ncbi.nlm.nih.gov/nuccore/NC_001498.1
  2. Tree downloaded in JSON format from: https://nextstrain.org/measles/genome
  3. Command: nextclade run sequences.fasta --input-ref reference.fasta --input-tree measles_genome.json --output-csv trial.csv
  4. See error
    Error: 
    0: When preprocessing Nextclade graph
    1: When retrieving nuc mutations from reference tree node NODE_0000397
    2: Encountered a mutation (T518C) in reference tree branch attributes, for which the origin state of the mutation is inconsistent with the state at the parental branch. Mutations origin state is 'T', but tree (inferred from the reference sequence as no prior mutations were observed at this position) has state 'C'. This is likely an inconsistency between reference tree and reference sequence in the Nextclade dataset. Reference sequence should either correspond to the root of the reference tree or the root of the reference tree needs to account for difference between the tree and reference sequence. Please check that your reference tree is consistent with your reference sequence.

It looks like you used a different reference to build the tree. I am not sure which reference I should use. I pulled the reference ID from your GitHub.
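
For anyone hitting the same error, here is a minimal Python sketch of the kind of check the message describes: walk the tree from the root and confirm that each mutation's origin state matches the state inherited from the parent (or from the reference when no earlier mutation touched that position). It assumes the downloaded file follows the Auspice v2 JSON layout, with the tree under the top-level `tree` key and nucleotide mutations under `branch_attrs.mutations.nuc`. The function names are hypothetical, and this is not Nextclade's own implementation.

```python
import json
import re

# Mutation strings look like "T518C": origin state, 1-based position, derived state.
MUTATION = re.compile(r"^([A-Za-z-])(\d+)([A-Za-z-])$")


def read_single_fasta(path):
    """Return the first record of a FASTA file as one upper-cased string."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record
                seen_header = True
            elif line:
                chunks.append(line.upper())
    return "".join(chunks)


def find_inconsistencies(tree, reference):
    """List (node name, mutation, state actually inherited) for every
    mutation whose origin state disagrees with the inherited state."""
    problems = []
    stack = [(tree, {})]  # (node, states changed so far on the path from the root)
    while stack:
        node, inherited = stack.pop()
        state = dict(inherited)  # copy so sibling branches do not interfere
        muts = node.get("branch_attrs", {}).get("mutations", {}).get("nuc", [])
        for mut in muts:
            match = MUTATION.match(mut)
            if not match:
                continue
            origin = match.group(1).upper()
            pos = int(match.group(2))
            derived = match.group(3).upper()
            in_range = 1 <= pos <= len(reference)
            current = state.get(pos, reference[pos - 1] if in_range else "?")
            if current != origin:
                problems.append((node.get("name"), mut, current))
            state[pos] = derived
        for child in node.get("children", []):
            stack.append((child, state))
    return problems
```

To try it on the files from the steps above, load the tree with `json.load(open("measles_genome.json"))["tree"]`, read the reference with `read_single_fasta("reference.fasta")`, and pass both to `find_inconsistencies`. A non-empty result means the reference is not the sequence the tree's mutations were called against.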

joverlee521 commented 1 week ago

Hi @mariaelf97,

Have you tried using the Nextclade dataset for measles? Even though the dataset is focused on the N450 region, it can still be used to assign clades to full genomes.

  1. Fetch the existing dataset with:

    nextclade dataset get \
    --name "nextstrain/measles/N450/WHO-2012" \
    --output-zip measles.zip

  2. Run Nextclade with the dataset and your sequences:

    nextclade run \
    sequences.fasta \
    --input-dataset measles.zip \
    --output-csv trial.csv
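
Once `trial.csv` exists, a short Python sketch like the following could tally how many sequences landed in each clade. It assumes the Nextclade CSV output is semicolon-delimited and has a `clade` column; the function name `clade_counts` is hypothetical.

```python
import csv
from collections import Counter


def clade_counts(lines):
    """Count sequences per clade from the lines of a Nextclade CSV.

    `lines` can be an open text file or any iterable of strings.
    Assumes a ';' delimiter and a 'clade' column in the header.
    """
    reader = csv.DictReader(lines, delimiter=";")
    return Counter(row["clade"] for row in reader)
```

For example, `clade_counts(open("trial.csv", newline=""))` returns a `Counter` keyed by clade name.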

mariaelf97 commented 1 week ago

@joverlee521 Thank you for your response. Does that mean that clade assignment cannot be done using whole genomes?

joverlee521 commented 1 week ago

@mariaelf97 No, it is not possible with the current whole genome tree because it does not have internal clade labels, which are required for Nextclade to assign clades.

You can use the existing Nextclade dataset for clade labeling on your whole genomes. See the README for more explanation about the dataset.
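
To see whether a given tree JSON carries clade annotations at all, a rough Python sketch like this could count them. It assumes the Auspice v2 layout and that clade assignments live under `node_attrs.clade_membership` on each node; both the field name and the function name `summarize_clade_annotations` are assumptions for illustration, so check them against the dataset README before relying on the result.

```python
def summarize_clade_annotations(tree):
    """Count internal nodes and tips, and how many of each carry a
    'clade_membership' entry in node_attrs (assumed field name)."""
    summary = {
        "internal_nodes": 0,
        "internal_with_clade": 0,
        "tips": 0,
        "tips_with_clade": 0,
    }
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        has_clade = "clade_membership" in node.get("node_attrs", {})
        if children:
            summary["internal_nodes"] += 1
            summary["internal_with_clade"] += int(has_clade)
        else:
            summary["tips"] += 1
            summary["tips_with_clade"] += int(has_clade)
        stack.extend(children)
    return summary
```

Pass it the object under the top-level `tree` key of the JSON. If `internal_with_clade` comes back as zero, the tree has no clade annotations on its internal nodes under this assumption.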

mariaelf97 commented 1 week ago

okay thank you!