wtmatlock / flanker

Gene-flank analysis tool
MIT License

Default flank behaviour for reverse complemented sequences #47

Closed: LeahRoberts closed this issue 3 years ago

LeahRoberts commented 3 years ago

Hi all,

Thanks for the tool, very useful (and timely!) for my current project. Unfortunately I've run into a problem that I was hoping you could help me with.

I have a set of outbreak plasmids that are essentially identical. However, when I ran Flanker, it reported different flanking regions. The problem seemed to stem from some of the plasmid sequences being reverse complemented, so I ran some tests:

flanker -i NDM_plasmids/cpe059_61_58.plasmids.fasta -cl -g blaNDM-1 -circ -f upstream -w 5000 -o flanker_test_up

assembly_1,cluster
cpe058_contig_2_np1212.fasta_blaNDM-1_5000_upstream_flank.fasta,0
cpe061_contig_2_np1212.fasta_blaNDM-1_5000_upstream_flank.fasta,0
cpe059_contig_1_np1212.fasta_blaNDM-1_5000_upstream_flank.fasta,0

flanker -i NDM_plasmids/cpe059_61_58.plasmids.fasta -cl -g blaNDM-1 -circ -f downstream -w 5000 -o flanker_test_down

assembly_1,cluster
cpe059_contig_1_np1212.fasta_blaNDM-1_5000_downstream_flank.fasta,0
cpe058_contig_2_np1212.fasta_blaNDM-1_5000_downstream_flank.fasta,0
cpe061_contig_2_np1212.fasta_blaNDM-1_5000_downstream_flank.fasta,0

flanker -i NDM_plasmids/cpe059_61_58.plasmids.fasta -cl -g blaNDM-1 -circ -w 5000 -o flanker_test_both

assembly_1,cluster
cpe061_contig_2_np1212.fasta_blaNDM-1_5000_both_flank.fasta,0
cpe059_contig_1_np1212.fasta_blaNDM-1_5000_both_flank.fasta,0
cpe058_contig_2_np1212.fasta_blaNDM-1_5000_both_flank.fasta,1
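To spot a split like this without eyeballing the output, the cluster table can be grouped by cluster ID. This is a minimal sketch using only the Python standard library. It assumes the table is available as text with the assembly_1,cluster header shown above, and leaves reading it from disk to the caller. The helper name group_by_cluster is illustrative and not part of Flanker.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_by_cluster(csv_text: str) -> Dict[str, List[str]]:
    """Map each cluster ID to the flank files assigned to it.

    Hypothetical helper (not part of Flanker); assumes the two-column
    layout shown above: assembly_1,cluster.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["cluster"]].append(row["assembly_1"])
    return dict(groups)


if __name__ == "__main__":
    both_mode = """assembly_1,cluster
cpe061_contig_2_np1212.fasta_blaNDM-1_5000_both_flank.fasta,0
cpe059_contig_1_np1212.fasta_blaNDM-1_5000_both_flank.fasta,0
cpe058_contig_2_np1212.fasta_blaNDM-1_5000_both_flank.fasta,1
"""
    groups = group_by_cluster(both_mode)
    # Identical plasmids should all land in a single cluster.
    if len(groups) > 1:
        print(f"{len(groups)} clusters:", groups)
```

For the upstream and downstream runs the same check would report a single cluster, so only the 'both' run is flagged.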

It looks like, with the default -f ('both'), Flanker treats the reverse-complemented sequence differently. I was wondering whether it has something to do with these lines (is it changing args.flank from 'both' to 'upstream' for reverse-complemented genes?): https://github.com/wtmatlock/flanker/blob/cc823406b5098266ce4251826e6bf283bc17fc35/flanker/flanker.py#L372-L378

Thanks for your help 🙏
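Why 'both' should not be remapped for a reverse-complemented gene can be illustrated with a small sketch. This is not Flanker's implementation. The function names, the 0-based half-open gene coordinates [start, end), and the convention that 'both' spans the gene plus w bases on either side are all assumptions made for illustration. For a gene on the minus strand, its upstream lies to the right in contig coordinates, so 'upstream' and 'downstream' swap. The 'both' window covers the same span either way, and only the orientation of the output changes.

```python
# Hypothetical sketch, NOT Flanker's implementation: strand-aware flank
# extraction. Gene coordinates are assumed 0-based, half-open [start, end).
from typing import Literal

Flank = Literal["upstream", "downstream", "both"]
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def window(seq: str, lo: int, hi: int, circular: bool) -> str:
    """Return seq[lo:hi], wrapping around the ends if the contig is circular."""
    if circular:
        n = len(seq)
        return "".join(seq[i % n] for i in range(lo, hi))
    return seq[max(lo, 0):min(hi, len(seq))]


def extract_flank(seq: str, start: int, end: int, strand: str,
                  flank: Flank, w: int, circular: bool = True) -> str:
    """Return the requested flank, oriented in the gene's direction.

    Assumption: 'both' spans the gene plus w bases on either side.
    """
    if strand == "+":
        spans = {"upstream": (start - w, start),
                 "downstream": (end, end + w),
                 "both": (start - w, end + w)}
        lo, hi = spans[flank]
        return window(seq, lo, hi, circular)
    # Minus strand: upstream/downstream swap in contig coordinates,
    # but 'both' covers the same span; only the output is flipped.
    spans = {"upstream": (end, end + w),
             "downstream": (start - w, start),
             "both": (start - w, end + w)}
    lo, hi = spans[flank]
    return reverse_complement(window(seq, lo, hi, circular))


if __name__ == "__main__":
    contig = "AACCGGTTAC"                  # toy contig; gene hit at [4, 6) on "+"
    flipped = reverse_complement(contig)   # same gene is now at [4, 6) on "-"
    for mode in ("upstream", "downstream", "both"):
        print(mode, extract_flank(contig, 4, 6, "+", mode, 2),
              extract_flank(flipped, 4, 6, "-", mode, 2))
```

Under these assumptions a plasmid and its reverse complement give identical flanks in every mode, which is the behaviour expected for the outbreak plasmids above.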

samlipworth commented 3 years ago

Hi Leah, thanks for raising this. Any chance you could provide us with the sequences (or even just part of them, e.g. the flanks) so that we can reproduce this? We promise to delete them as soon as the issue is resolved. If you're not comfortable with this, finding something on e.g. ENA that recreates the same issue would be immensely helpful.

LeahRoberts commented 3 years ago

Thanks Sam, I've sent you an email with the sequences 👍

samlipworth commented 3 years ago

Hi Leah,

Thanks for raising such a detailed issue (and working out the problem); the repro example was super helpful too. Sorry for being slow to get back to you, I have been on holiday.

I think you are right. In the paper we didn't really use the 'both' mode, so it is not as extensively tested. However, if you insert e.g. "elif args.flank == 'both': x = 'both'" at line 377, I think this solves your problem (certainly all flanks are grouped together when I run your example). I will push this now, but please note you will have to reinstall. Please do let me know if this doesn't seem to solve your problem, or if there is anything else we can help with.

Best, Sam
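The shape of that fix can be sketched as follows. This is a hypothetical reconstruction, not the code at lines 372-378 of flanker.py. The function name flank_for_orientation and the reverse flag are illustrative, and the swap of 'upstream' and 'downstream' for reverse-complemented hits is an assumption based on the discussion above. The point is only that an explicit 'both' branch keeps that mode unchanged.

```python
def flank_for_orientation(flank: str, reverse: bool) -> str:
    """Hypothetical: which flank mode to use for a reverse-complemented hit.

    Assumes 'upstream'/'downstream' are swapped for reverse-complemented
    genes; this is an illustration, not Flanker's actual code.
    """
    if not reverse:
        return flank
    if flank == "upstream":
        x = "downstream"
    elif flank == "downstream":
        x = "upstream"
    elif flank == "both":
        # The suggested branch: a window covering both sides is the same
        # region whichever strand the gene is on, so keep 'both'.
        x = "both"
    else:
        raise ValueError(f"unknown flank mode: {flank!r}")
    return x
```

Raising on an unknown mode, rather than falling through, makes a missing branch fail loudly instead of silently extracting the wrong region.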

LeahRoberts commented 3 years ago

That's great, thanks Sam, and sorry to bother you on your holiday!