polio-nanopore / piranha

GNU General Public License v3.0

Configuring with custom references - PanEV #123

Closed CatherineTroman closed 7 months ago

CatherineTroman commented 1 year ago

I tried running the pipeline on data from some PanEV (whole capsid) sequencing, with the aim of looking at which NPEVs were present in the samples. The min/max sequence lengths were set to 3500 and 4500, and we used the reference file from the PanEV branch of realtime-polio.

First we had an error which seemed to be caused by all the reference sequences having the same name, so we prepended a number to each name to solve this. We then got a second error, which I have copied and attached as a text file along with the command we ran.

We weren't quite sure if it was still somehow related to the reference file or something else, and didn't have much luck solving it! Please let me know if you need any more info/files.

Thanks, Catherine

panev-piranha-error1.txt

aineniamh commented 1 year ago

Hey @CatherineTroman, if you can just point me to the data I can run some tests. The first issue was for sure because all the references had identical names; they have to have unique identifiers or minimap2 complains. I can add a check for that so the error is more informative. I just hadn't thought to handle people putting in different references with identical names before, but I'll make a note to add that check.
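For anyone hitting the same first error, a quick pre-flight check along these lines may help. This is only a minimal sketch, not the check piranha itself implements: the function name `duplicate_fasta_ids` and the example sequences are made up for illustration, and it assumes the identifier is the first whitespace-delimited token of each header line.

```python
from collections import Counter


def duplicate_fasta_ids(fasta_text):
    """Return the sequence identifiers that occur more than once, sorted.

    Hypothetical helper for illustration. The identifier is taken to be
    the first whitespace-delimited token after '>' on each header line.
    """
    ids = [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up example: two records share the identifier "EV_ref".
example = """>EV_ref
ACGT
>EV_ref
ACGA
>1_EV_ref
ACGG
"""

print(duplicate_fasta_ids(example))
```

An empty list means every header has a unique identifier; anything else lists the names that need renaming before the file is passed to the pipeline.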

CatherineTroman commented 1 year ago

Hi Áine, I have added the fastq files and barcodes.csv to our shared dropbox in Method Development > LOD_Study > PanEV_seqrun. Please let me know if there are any issues finding/accessing it!

aineniamh commented 1 year ago

Thanks! Will do

aineniamh commented 1 year ago

Having a look at the reference file you're running (https://github.com/polio-nanopore/realtime-polio/blob/panEV/rampart/references.fasta I believe?), I think your issue comes down to the display names not being consistent with what we'd decided the display names should be in piranha. Currently piranha expects one of WPV1, WPV2, WPV3, Sabin1-like, Sabin2-like, Sabin3-like or NonPolioEV. I suspect using other display names is what's causing the error.

https://github.com/polio-nanopore/piranha/blob/main/README.md#custom-reference-file
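A rough way to spot headers that fall outside those categories before running the pipeline is sketched below. Treat it as an illustration under assumptions: it supposes the display name is carried in the FASTA header as a `display_name=` field (the README linked above is the place to confirm the real header format), and `unexpected_display_names`, the `ALLOWED` set, and the example records are hypothetical names and data, not part of piranha.

```python
import re

# Categories named in this thread; the exact spellings piranha accepts
# should be checked against the README.
ALLOWED = {
    "WPV1", "WPV2", "WPV3",
    "Sabin1-like", "Sabin2-like", "Sabin3-like",
    "NonPolioEV",
}

# ASSUMPTION: the display name appears as "display_name=<value>" in the header.
FIELD = re.compile(r"display_name=(\S+)")


def unexpected_display_names(fasta_text):
    """Return (sequence_id, display_name) pairs not in ALLOWED.

    display_name is None when the header has no display_name= field.
    """
    problems = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        match = FIELD.search(line)
        name = match.group(1) if match else None
        if name not in ALLOWED:
            problems.append((seq_id, name))
    return problems


# Made-up example headers.
example = """>ref1 display_name=WPV1
ACGT
>ref2 display_name=EV-A71
ACGT
>ref3
ACGT
"""

for seq_id, name in unexpected_display_names(example):
    print(seq_id, name)
```

Any record it reports would need its display name changed to one of the expected categories (for non-polio enteroviruses, NonPolioEV).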

rmcolq commented 1 year ago

Currently the pipeline is restricted to only allow certain categories. I've added a commit which regularizes the display names when they don't quite conform to the categories. However, if we want it to be possible to have more groups of NonPolioEV separated out in the reports, this is a larger task.

aineniamh commented 7 months ago

I'm going to close this as it's been resolved, but note there is still a need for consistent groupings in the references file. May put in a feature request for this to be configurable.