thomasp85 / FindMyFriends

Fast alignment-free pangenome creation and exploration

importing from Prokka works (I think) - but not sure how to export FindMyFriends to PanViz #13

Open devonorourke opened 6 years ago

devonorourke commented 6 years ago

Hi there, I sent out a tweet a few hours ago (https://twitter.com/thesciencedork/status/1012398574582353920) wondering if there was a way that your program could seamlessly work with files annotated with Prokka. I am certain there are 1000 more elegant ways of doing it, but the following scripts ultimately worked to import data and run the FindMyFriends R package.

What I'm still unclear about is how to export the pangenome object after running FindMyFriends into the PanViz tool. Is there an export function I'm missing from your detailed manual? Cheers, Devon

You need two input files per genome for these scripts to work: the .gff and .faa files exported from Prokka. Start by converting each .gff into a tab-separated table of contig, start, end, and strand for the Prodigal-predicted features:

for i in $(find . -name "*.gff"); do j=$(basename "$i" | cut -d '.' -f 1); grep "Prodigal" "$i" | cut -f 1,4,5,7 | sed -e 's/+$/1/;s/-$/-1/' > "$j".tmp2; done
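To make the expected shape of the .tmp2 table concrete, here is a minimal sketch that runs the same grep/cut/sed chain on a made-up GFF body. The helper names (`gff_to_table`, `demo_gff`), the contig name, and the gene IDs are invented for illustration and are not part of Prokka or FindMyFriends. The strand substitution is anchored to the end of the line, which is my assumption for safety so that a hyphen inside a contig name is left alone.

```shell
# Hypothetical helper: reads a Prokka-style GFF on stdin and prints
# contig, start, end, strand (1 or -1) for the Prodigal rows only.
gff_to_table() {
  grep "Prodigal" | cut -f 1,4,5,7 | sed -e 's/+$/1/;s/-$/-1/'
}

# Made-up input: two Prodigal CDS rows and one tRNA row (tab-separated).
demo_gff() {
  printf 'contig-1\tProdigal:2.6\tCDS\t10\t400\t.\t+\t0\tID=GENE_00001\n'
  printf 'contig-1\tAragorn:1.2\ttRNA\t450\t520\t.\t-\t.\tID=GENE_00002\n'
  printf 'contig-1\tProdigal:2.6\tCDS\t600\t900\t.\t-\t0\tID=GENE_00003\n'
}

demo_gff | gff_to_table
# Prints two tab-separated rows; the tRNA row is dropped:
# contig-1  10   400  1
# contig-1  600  900  -1
```

Note that the hyphen in `contig-1` survives, whereas an unanchored `s/-/-1/g` would turn it into `contig-11`.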

Next, convert each .faa file into a tab-separated table with one protein per line (ID, then sequence):

for i in $(find . -name "*.faa"); do j=$(basename "$i" | cut -d '.' -f 1); awk '/^>/ { print (NR==1 ? "" : RS) $0; next } { printf "%s", $0 } END { printf RS }' "$i" | sed 's/ .*$//' | paste - - | sed 's/^>//' > "$j".tmp1; done
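As a quick illustration of what the .faa step does, the sketch below feeds a made-up two-protein FASTA through the same awk/sed/paste chain. The helper names (`faa_to_table`, `demo_faa`) and the records are hypothetical. The awk part joins wrapped sequence lines, the first sed drops the description after the ID, and `paste - -` puts each ID and its sequence on one line.

```shell
# Hypothetical helper: reads a protein FASTA on stdin and prints
# one "ID<TAB>sequence" row per record.
faa_to_table() {
  awk '/^>/ { print (NR==1 ? "" : RS) $0; next } { printf "%s", $0 } END { printf RS }' \
    | sed 's/ .*$//' | paste - - | sed 's/^>//'
}

# Made-up input: the first sequence is wrapped over two lines.
demo_faa() {
  printf '>GENE_00001 hypothetical protein\nMKT\nLLV\n'
  printf '>GENE_00003 putative transporter\nMSE\n'
}

demo_faa | faa_to_table
# Prints two tab-separated rows:
# GENE_00001  MKTLLV
# GENE_00003  MSE
```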

Finally, paste the tmp files together and rearrange the columns to create the final .fasta file needed for importing into FindMyFriends:

for i in $(find . -name "*.faa"); do j=$(basename $i | cut -d '.' -f 1); paste "$j".tmp1 "$j".tmp2 | awk '{ print ">"$3"_"$1" # "$4" # "$5" # "$6"\n"$2 }' > "$j".fasta; done
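One thing the paste step relies on is that the .tmp1 and .tmp2 tables have the same number of rows in the same order. A minimal sketch of a guard for that, assuming the column layout described above: `build_fasta` and the demo files are hypothetical names, and the check only catches a row-count mismatch, not a difference in ordering.

```shell
# Hypothetical helper: $1 is an "ID<TAB>sequence" table, $2 is a
# "contig<TAB>start<TAB>end<TAB>strand" table. Refuses to join them
# when the row counts differ, otherwise prints the final FASTA.
build_fasta() {
  n1=$(awk 'END { print NR }' "$1")
  n2=$(awk 'END { print NR }' "$2")
  if [ "$n1" -ne "$n2" ]; then
    echo "row count mismatch: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
  paste "$1" "$2" | awk '{ print ">"$3"_"$1" # "$4" # "$5" # "$6"\n"$2 }'
}

# Made-up tables written to a scratch directory.
demo_dir=$(mktemp -d)
printf 'GENE_00001\tMKTLLV\nGENE_00003\tMSE\n' > "$demo_dir/demo.tmp1"
printf 'contig1\t10\t400\t1\ncontig1\t600\t900\t-1\n' > "$demo_dir/demo.tmp2"

build_fasta "$demo_dir/demo.tmp1" "$demo_dir/demo.tmp2"
# >contig1_GENE_00001 # 10 # 400 # 1
# MKTLLV
# >contig1_GENE_00003 # 600 # 900 # -1
# MSE
```

If the counts differ (for example, because a .gff contains Prodigal rows with no matching entry in the .faa), the function prints a message to stderr and returns a non-zero status instead of writing misaligned headers.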