replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0
39 stars 16 forks source link

Test Freyja Update function #270

Open hoelzer opened 1 month ago

hoelzer commented 1 month ago

Freyja was recently added and also has a update parameter. To test this new functionality, we can run poreCov with a sample that has a very recent lineage assignment which is not part of the Freyja version in the container (from March I think):

Recent designations:

https://github.com/cov-lineages/pango-designation/releases/tag/v1.27

We have a Run24-123 (17.05.2024) with samples

IMSSC2-174-2024-00065 - assigned JN 1.48.1 (designated 15.04.2024) IMSSC2-99-2024-00026 - assigned JN 1.16.2 (designated 15.04.2024) IMSSC2-63-2024-00057 - assigned KP 2.3 (designated 15.04.2024)

The data is on the HPC here:

<usual-path>/2024-05-17/*Run24-123/

hoelzer commented 1 month ago

In this context, the help mssg

--screen_reads           Determines the Pangolineage of each individual read (takes time, needs --freyja and/or --lcs)

Is wrong. I am not 100% sure how LCS was doing this, but Freyja is not giving a lineage assignment for each read. Freyja is doing a deconvolution of the variant calls to estimate the proportion of lineages.

In this context, I am also wondering @DataSpott @replikation if we could not remove the screen_reads parameter?

Then, the user can just activate freyja or LCS with the parameter that is anyway available?

hoelzer commented 1 month ago

Alright, I tested this on patient samples.

here is the plot w/o freyja_update which then uses a Freyja database that is part of the container (from March 2024 or so)

grafik

here is the same data and command but with freyja_update and we can see that the update function does something (attention, the sample columns are not exactly the same)

grafik

Let's check the "interesting" samples (if I can see the colors correctly)

IMSSC2-174-2024-00065, barcode13 - assigned JN 1.48.1 (designated 15.04.2024)

IMSSC2-99-2024-00026, barcode 12 - assigned JN 1.16.2 (designated 15.04.2024)

IMSSC2-63-2024-00057, barcode 18 - assigned KP 2.3 (designated 15.04.2024)

Summary

This works very well! And the update function also does it's job. Still, I think we should implement a more robust way where the used freyja reference db is also versionized and stored. But this can be done in another issue/PR.

replikation commented 1 month ago

In this context, the help mssg

--screen_reads           Determines the Pangolineage of each individual read (takes time, needs --freyja and/or --lcs)

Is wrong. I am not 100% sure how LCS was doing this, but Freyja is not giving a lineage assignment for each read. Freyja is doing a deconvolution of the variant calls to estimate the proportion of lineages.

In this context, I am also wondering @DataSpott @replikation if we could not remove the screen_reads parameter?

Then, the user can just activate freyja or LCS with the parameter that is anyway available?

agree on this screen_reads part to simplify. but i prefer descriptions rather than tool names. (e.g. --screen_reads Uses freya to.....) maybe deprecate LCS?

Krannich479 commented 1 month ago

Amazing results @hoelzer !

hoelzer commented 1 month ago

In this context, the help mssg

--screen_reads           Determines the Pangolineage of each individual read (takes time, needs --freyja and/or --lcs)

Is wrong. I am not 100% sure how LCS was doing this, but Freyja is not giving a lineage assignment for each read. Freyja is doing a deconvolution of the variant calls to estimate the proportion of lineages. In this context, I am also wondering @DataSpott @replikation if we could not remove the screen_reads parameter? Then, the user can just activate freyja or LCS with the parameter that is anyway available?

agree on this screen_reads part to simplify. but i prefer descriptions rather than tool names. (e.g. --screen_reads Uses freya to.....) maybe deprecate LCA?

From my pov we could discontinue LCS. And then screen_reads would simply activate freyja.

@Krannich479 @MarieLataretu are you using LCS for anything? From my pov it's not well-updated and freyja would be anyway preferable

MarieLataretu commented 1 month ago

@Krannich479 @MarieLataretu are you using LCS for anything? From my pov it's not well-updated and freyja would be anyway preferable

We occasionally investigate the LCS results in special cases. But I agree that LCS is not up-to-date (even if we could update it)!

Maybe deprecation is a nice compromise? @replikation , it would be usable/available for some time, right?

replikation commented 1 month ago

yep

MarieLataretu commented 1 month ago

I'd vote for deprecating LCS, then. If we need LCS results in the near future for whatever reason, we'd still have the option to run it easily

hoelzer commented 1 month ago

Agree!

Krannich479 commented 1 month ago

Regarding your freja results, do you mind re-running freja if I error-correct you the read pools with herro? This insight would bolster my (dis-)trust in EC for variant calling.

hoelzer commented 1 month ago

@Krannich479 pleasure.

We could also then do a re-basecalling of the run with v5 Dorado SUP and see any differences.