This update makes piranha permissive of multi-mapping/chimeric reads.
Because of this, the default QC cut-offs needed to change. The minimum alignment quality is now 0, because multi-mapped reads have a score of 0. All reads are mapped again to a single reference later in the pipeline, so false hits are avoided; however, the read count will now be inflated.
This means the background database can be as fleshed out as possible: reads that map to multiple references have their top hit used as the actual hit. Multi-mappings are reported in a temp output file for the moment, so for dev purposes run in no-temp mode and check this file to investigate which references are causing reads to multi-map.
The top hit for a given ddns_group (or whichever grouping is used) is taken forward for a single consensus generation. This means that even if there are two distinct WPV1 populations, only a single reference per WPV1 group will be chosen. If the user wishes to extract multiple WPV1 sequences, they should use a different reference group labelling (e.g. lineage), and piranha will produce a single consensus per lineage detected in the run.
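To illustrate how the grouping label decides the number of consensus sequences, here is a minimal sketch. It is not piranha's actual code: the function name `top_reference_per_group`, the dictionary layout, the reference and lineage names, and the rule that the "top" reference is the one with the most reads are all assumptions made for this example.

```python
def top_reference_per_group(reads_per_ref, ref_to_group):
    """Pick one reference per group (hypothetical helper, not piranha's API).

    reads_per_ref: {reference_name: set of read names assigned to it}
    ref_to_group:  {reference_name: group label, e.g. ddns_group or lineage}
    Assumption: the top reference is the one with the most reads;
    ties keep whichever reference was seen first.
    """
    best = {}
    for ref, reads in reads_per_ref.items():
        group = ref_to_group[ref]
        current = best.get(group)
        if current is None or len(reads) > len(reads_per_ref[current]):
            best[group] = ref
    return best


# Two distinct WPV1 populations plus one Sabin-related reference (made-up names).
reads_per_ref = {
    "wpv1_a": {"r1", "r2", "r3"},
    "wpv1_b": {"r4", "r5"},
    "sabin1": {"r6"},
}

# Grouping by ddns_group collapses both WPV1 references into one group.
by_ddns_group = {"wpv1_a": "WPV1", "wpv1_b": "WPV1", "sabin1": "Sabin1-related"}
# Grouping by a finer label (e.g. lineage) keeps them separate.
by_lineage = {"wpv1_a": "lineage_A", "wpv1_b": "lineage_B", "sabin1": "Sabin1-related"}

print(top_reference_per_group(reads_per_ref, by_ddns_group))
print(top_reference_per_group(reads_per_ref, by_lineage))
```

With the ddns_group labelling, only `wpv1_a` goes forward for WPV1; with the lineage labelling, both WPV1 references go forward, one per lineage.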
Some other updates add KEY replacement of strings in the preprocessing.py script.
This also adds an alternative method of parsing the PAF file, where we essentially group by read name and look at multi-hits. Hits are stored as sets per reference for efficiency, and a set union now merges them into the reference group. Hits are reported per reference group, with the chosen top reference taken forward; however, the hit report file still reports the original reference hit in the db alongside the other information.
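The PAF parsing approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names, the return shapes, and the choice of "most residue matches" (PAF column 10) as the top-hit criterion are assumptions. The column positions follow the standard PAF layout (query name in column 1, target name in column 6, residue matches in column 10, mapping quality in column 12).

```python
from collections import defaultdict


def group_paf_hits(paf_lines):
    """Group PAF records by read name: {read: [(ref, n_matches, mapq), ...]}."""
    hits_by_read = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # skip malformed or truncated records
            continue
        read_name, ref_name = fields[0], fields[5]
        n_matches, mapq = int(fields[9]), int(fields[11])
        hits_by_read[read_name].append((ref_name, n_matches, mapq))
    return hits_by_read


def assign_reads(hits_by_read, ref_to_group):
    """Assign each read to its top hit, then merge references into groups.

    Assumption: the top hit is the record with the most residue matches;
    ties keep the first record encountered.
    """
    reads_per_ref = defaultdict(set)
    multimapped = {}
    for read_name, hits in hits_by_read.items():
        top_ref = max(hits, key=lambda h: h[1])[0]
        reads_per_ref[top_ref].add(read_name)
        refs = sorted({h[0] for h in hits})
        if len(refs) > 1:  # read hit more than one reference
            multimapped[read_name] = refs

    # Set union merges the per-reference read sets into their reference group.
    reads_per_group = defaultdict(set)
    for ref, reads in reads_per_ref.items():
        reads_per_group[ref_to_group[ref]] |= reads
    return reads_per_ref, reads_per_group, multimapped


# Made-up records: read1 multi-maps (MAPQ 0), read2 and read3 map uniquely.
paf = [
    "read1\t1000\t0\t1000\t+\trefA\t1100\t0\t1000\t950\t1000\t0",
    "read1\t1000\t0\t1000\t+\trefB\t1100\t0\t1000\t900\t1000\t0",
    "read2\t1000\t0\t1000\t+\trefB\t1100\t0\t1000\t980\t1000\t60",
    "read3\t1000\t0\t1000\t-\trefC\t1100\t0\t1000\t970\t1000\t60",
]
ref_to_group = {"refA": "WPV1", "refB": "WPV1", "refC": "Sabin1-related"}

per_ref, per_group, multi = assign_reads(group_paf_hits(paf), ref_to_group)
print(dict(per_group))
print(multi)
```

Keeping reads in sets means a read is counted once per group even if several of its records point to references in the same group, and the `multimapped` mapping is the kind of information that could be written out for inspection in no-temp mode.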
Notes