nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

web-based nextclade issue when using another reference #1364

Open yl315504 opened 9 months ago

yl315504 commented 9 months ago

Hello,

I was trying to use a mouse covid reference called MA10: https://www.ncbi.nlm.nih.gov/nuccore/1898953378

I uploaded the FASTA file as the new reference. Should I use a FASTA file as the reference?

[screenshot]

I got the following error. Could you please help?

Thanks,

Error message: Error: When initializing Nextclade runner: When parsing reference tree Auspice JSON v2: When parsing Auspice Tree JSON contents: When parsing JSON: expected value at line 1 column 1

Nextclade version 2.14.1 (commit: 85e00e8, branch: release)

Memory available: 3586 MBytes

User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

Browser details: {"browser":{"name":"Chrome","version":"119.0.0.0"},"os":{"name":"Windows","version":"NT 10.0","versionName":"10"},"platform":{"type":"desktop"},"engine":{"name":"Blink"}}

Call stack:

Error: When initializing Nextclade runner: When parsing reference tree Auspice JSON v2: When parsing Auspice Tree JSON contents: When parsing JSON: expected value at line 1 column 1
    at M (https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:15774)
    at https://clades.nextstrain.org/_next/static/wasm/cf835b4f9fbcf56c.wasm:wasm-function[1547]:0x1e3d5c
    at new a (https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:9424)
    at https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:24056
    at u (https://clades.nextstrain.org/_next/static/chunks/444.170c49d0571b2ba1.js:1:28086)
    at Generator._invoke (https://clades.nextstrain.org/_next/static/chunks/444.170c49d0571b2ba1.js:1:29376)
    at a. [as next] (https://clades.nextstrain.org/_next/static/chunks/444.170c49d0571b2ba1.js:1:28489)
    at p (https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:21841)
    at d (https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:22038)
    at https://clades.nextstrain.org/_next/static/chunks/68.daad1778960b5734.js:1:22097

corneliusroemer commented 9 months ago

Hi @yl315504! Unfortunately, you can't just swap out the reference when the dataset contains a tree that is based on a particular reference. A different genemap might also cause issues, so this isn't as simple as plugging in your own reference.

What's the reason you'd like to use this particular reference? There might be another way to achieve your goal.

If you just want to align against that reference, you can use the nextalign CLI.

ivan-aksamentov commented 9 months ago

Dear @yl315504,

In the screenshot, I see you have a red circle with "1" inside of it, near "Customize dataset files". This means you've provided some custom dataset files under that section (the section can be opened and closed).

From the error, saying that Nextclade failed to parse Auspice JSON, I hypothesize that you've provided a reference tree file there which is not a correctly formatted Auspice JSON tree file. This means that you either need to provide a correctly formatted reference tree file or to remove it (in which case the default one will be used, from the SC2 dataset).
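To test that hypothesis before uploading, the file can be checked locally. "expected value at line 1 column 1" is what a JSON parser typically reports when the very first character cannot start a JSON value, for example the `>` of a FASTA header. Below is a minimal Python sketch, not part of Nextclade; the function name `check_auspice_tree` is hypothetical, and the check for top-level `meta` and `tree` keys is an assumption about the Auspice JSON v2 layout rather than a full schema validation.

```python
import json


def check_auspice_tree(text: str) -> str:
    """Return 'ok', or a short reason why `text` is not a plausible Auspice v2 tree.

    Assumption: an Auspice JSON v2 file is a JSON object with top-level
    "meta" and "tree" keys. This is a rough sanity check, not a validator.
    """
    # A FASTA file starts with a '>' header line, which is never valid JSON.
    if text.lstrip().startswith(">"):
        return "looks like FASTA, not JSON"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return f"not valid JSON: {err.msg} at line {err.lineno} column {err.colno}"
    if not isinstance(data, dict):
        return "JSON top level is not an object"
    missing = [key for key in ("meta", "tree") if key not in data]
    if missing:
        return "missing top-level keys: " + ", ".join(missing)
    return "ok"
```

To use it, read the file you intended to supply as the reference tree and pass its contents to `check_auspice_tree`. Anything other than "ok" suggests the file should be removed from the "Customize dataset files" section or replaced.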

The part you've encircled with a red line is the list of files which Nextclade will be analyzing (we call them "query sequences").

Note that each dataset is tailored towards a particular pathogen strain. In general, you cannot just swap one component and expect everything to work as usual. For example, as a general rule, the reference sequence must be the root of the reference tree (there are workarounds, but they are quite advanced). So only slight customizations are possible. I don't believe analyzing viruses from different host organisms is possible with our human SC2 dataset, unless the strains are very close to the Wuhan strain or other pandemic strains. You can always create your own whole dataset though.

So all this looks very confusing to me. I invite you to read the Nextclade documentation (the "Docs" link on the top panel). It's important to understand how Nextclade works and how to configure it before using it, and especially before using its advanced features. It's not the best documentation in the world, but it might be worth giving it a shot. If you still have questions after reading the documentation, we'll try to answer.

Please explain better what you are trying to achieve and we'll try to help.

yl315504 commented 9 months ago

Thanks for your reply. Our goal is to explore if our sequences have new mutations against the mouse covid MA-10. Is there a way to do this using the web-based nextclade?

Thanks, Yan Liu UT Southwestern Medical Center

corneliusroemer commented 9 months ago

Thanks for explaining your goal! It's possible right now, but not easy. We might be able to implement something to make this easier; I'll break it out into a feature request.

Right now, I don't think we have the option to start with a "blank" dataset and add your own files onto it.

Currently, it's a bit tricky to do this. One way to achieve this would be to create your own dataset: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html

You might be best off running Nextclade v3 CLI like this:

nextclade3 run --input-ref mouse-seq.fasta --output-tsv results.tsv input.fasta

This is much easier than creating a dataset. You could look at the output tsv in Excel and see whether there are any mutations.
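If Excel is not convenient, the same check can be scripted. The Python sketch below lists the mutations reported for each query sequence. It assumes the TSV has a `seqName` column and comma-separated `substitutions`, `deletions` and `insertions` columns; please check these names against the header of your own `results.tsv`, since they are not confirmed in this thread. The function name `mutations_by_sequence` is hypothetical.

```python
import csv
import io


def mutations_by_sequence(tsv_text, columns=("substitutions", "deletions", "insertions")):
    """Map each sequence name to its mutations, grouped by mutation type.

    Assumptions (verify against your file's header): the table is
    tab-separated, has a "seqName" column, and stores each mutation
    type as a comma-separated string in the columns named above.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        per_sequence = {}
        for column in columns:
            # Missing or empty cells mean "no mutations of this type".
            value = (row.get(column) or "").strip()
            per_sequence[column] = value.split(",") if value else []
        result[row["seqName"]] = per_sequence
    return result
```

To use it, read `results.tsv` as text, pass the contents to `mutations_by_sequence`, and print the entries whose lists are not empty. Those are the differences from the reference given with `--input-ref`.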

I'll think about how to make things easier for this use case. Thanks for reaching out in any case, it's always useful to know what end users struggle with and what their use cases are!

yl315504 commented 9 months ago

Thanks!

I will use Nextclade v3 CLI.

nextclade3 run --input-ref mouse-seq.fasta --output-tsv results.tsv input.fasta

Yan Liu

corneliusroemer commented 9 months ago

@yl315504 Nextclade v3 is not yet conda-installable, so you need to download the binary from here: https://github.com/nextstrain/nextclade/releases/tag/3.0.0-alpha.1

However, as an early Christmas present for you (I hope), I just implemented a basic dataset that you can drop your own reference into (the thing I suggested earlier we could possibly do).

This is very experimental and might not work for very long, but you can give it a try here: https://nextclade-git-cds-error-nextstrain.vercel.app/?dataset-server=gh:@scratch@&dataset-name=nextstrain/scratch/reference-only

You need to "customize" the dataset as shown in the video, but otherwise it should work now (in contrast to your earlier attempts!): 2023-12-21.00.03.16.gif https://github.com/nextstrain/nextclade/assets/25161793/2a4da92e-b9bc-4386-af27-b0bb4e0463c3

Would love to know if this works and if it does what you need it to do.

yl315504 commented 9 months ago

It works! This is the best Christmas gift!

Yan Liu

corneliusroemer commented 9 months ago

Excellent! Right now the URL you need to use is quite ugly; we might make this a normal dataset so it's easier to select in the future.

If you have other ideas/use cases/requests, let us know and we can see whether it's possible.