harry-thorpe / piggy

Pipeline for analysing intergenic regions in bacteria
GNU General Public License v3.0

failed to align IGR cluster files #21

Closed: asaksager closed this issue 5 years ago

asaksager commented 5 years ago

Hello

I'm encountering the error "failed to align IGR cluster files" when I try to run the tool on ~50 GFF files generated with Prokka and Roary.

I have run roary as follows: $roary -s -e -n -o roary_pangenome -f Roary Annotations/Genus_gff_files/*

I am using the following versions: perl/5.24.0 intel/redist/2019_update2 intel/compiler/64 R/3.5.0 mafft/7.402 ncbi-blast/2.8.1+ bedtools/2.27.1 mcl/14-137 parallel/20190122 prank/140603 fasttree/2.1.9 cd-hit/4.8.1 piggy/1.4

I then ran piggy as follows: $piggy -i Annotations/Genus_gff_files/ -r Roary/ -o pan_igr_genomes/

I noticed that on line 447 of piggy there are no quotation marks around mafft. Do you have any idea what I might be doing wrong?

Thanks in advance

harry-thorpe commented 5 years ago

It doesn't look like you are doing anything wrong. Could you try adding the quotes around mafft to see if this fixes it?

Are there any files in 'cluster_intergenic_files' with the ending '_aligned_tmp.fasta'?

asaksager commented 5 years ago

I'll try, but I'm working on a remote computer where I don't have full access.

Yes, there is one: Cluster_1_aligned_tmp.fasta. The rest are cluster#.fasta.

harry-thorpe commented 5 years ago

OK, does it contain any data? And does the corresponding file without '_aligned_tmp' in the filename look OK?

I think this must be an issue with either parallel, or mafft, or the input fasta files.
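Both questions can be answered for every cluster at once with a small script. The sketch below is a hypothetical diagnostic, not part of piggy. It assumes the cluster files sit in pan_igr_genomes/cluster_intergenic_files (the -o directory from the command above plus the subdirectory named in this thread), and that each unaligned input has the same name as its temporary alignment minus '_aligned_tmp'. For each *_aligned_tmp.fasta file it prints the size and the number of records in the matching input.

```python
import sys
from pathlib import Path


def count_fasta_records(path: Path) -> int:
    """Count FASTA header lines (lines starting with '>')."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_aligned_tmp(cluster_dir: Path) -> list[tuple[str, int, int]]:
    """For each *_aligned_tmp.fasta file, return (name, size in bytes,
    number of records in the matching unaligned input).

    Assumption: the input file name is the temp name with '_aligned_tmp'
    removed, e.g. Cluster_1_aligned_tmp.fasta -> Cluster_1.fasta.
    """
    report = []
    for tmp in sorted(cluster_dir.glob("*_aligned_tmp.fasta")):
        source = tmp.with_name(tmp.name.replace("_aligned_tmp", ""))
        # -1 marks a missing input file so it stands out in the report.
        n_input = count_fasta_records(source) if source.exists() else -1
        report.append((tmp.name, tmp.stat().st_size, n_input))
    return report


if __name__ == "__main__":
    # Hypothetical default path, built from the -o flag used in this thread.
    default = "pan_igr_genomes/cluster_intergenic_files"
    directory = Path(sys.argv[1] if len(sys.argv) > 1 else default)
    for name, size, n_input in check_aligned_tmp(directory):
        status = "EMPTY" if size == 0 else "ok"
        print(f"{name}\t{size} bytes\t{status}\tinput records: {n_input}")
```

An EMPTY alignment next to a non-zero input count would suggest the problem lies with parallel or mafft rather than with the input FASTA files.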

asaksager commented 5 years ago

The Cluster_1_aligned_tmp.fasta file does not contain any data. Adding quotation marks around mafft did not help. However, on my local computer I did succeed in aligning the clusters (but it then failed at: Detecting candidate switched IGRs... Cannot open output file: pan_igr_genomes/switched_region_files/*oadG3+_+_group266+_+_Cluster3115+_+_Cluster_732.fasta failed to detect candidate switched IGRs)

The only difference, I believe, is the versions of MAFFT and parallel: on my local computer they are mafft-7.427 and parallel-20150322, respectively.

harry-thorpe commented 5 years ago

Sorry about this. I am currently putting piggy on conda, so this should solve the version issues.

As for the output file: *oadG_3++group_266++Cluster3115++_Cluster732.fasta, the file name is really malformed. All the delimiters should be ++, and it looks like oadG_3 is missing something from the beginning.

What happens if you search for oadG_3 in the roary gene_presence_absence.csv file?
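The search can also be scripted with Python's csv module. This is a minimal sketch, assuming the Roary output directory is Roary/ (from the -f Roary flag above) and that the gene name is in the first column of gene_presence_absence.csv, as in Roary's standard output. find_gene_rows is a hypothetical helper, not part of piggy or Roary.

```python
import csv
import sys
from pathlib import Path


def find_gene_rows(csv_path: Path, query: str) -> list[list[str]]:
    """Return rows of gene_presence_absence.csv whose first column
    (assumed to be Roary's 'Gene' column) equals `query` exactly."""
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)  # handles the quoted fields Roary writes
        next(reader, None)  # skip the header row
        return [row for row in reader if row and row[0] == query]


if __name__ == "__main__":
    # Hypothetical defaults based on the roary command used in this thread.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "Roary/gene_presence_absence.csv")
    gene = sys.argv[2] if len(sys.argv) > 2 else "oadG_3"
    if path.exists():
        for row in find_gene_rows(path, gene):
            print(row[:5])  # gene, non-unique name, annotation, counts
    else:
        print(f"{path} not found")
```

An exact match is used so that, for example, oadG_3 does not also match oadG_31; no output at all would mean the name in the piggy file does not appear in the Roary table.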

If you could send me a small reproducible example with GFF files (maybe 10 isolates or so), that would really help (if you can share your data).

asaksager commented 5 years ago

There is an oadG_3 gene in gene_presence_absence.csv. I ran it again, and now it seems to fail on a different gene: pan_igr_genomes/switched_region_files/*adiC+_+_gadX1+_+_Cluster1771+_+_Cluster_295.fasta. That gene is also present in the file: "adiC","","Arginine/agmatine antiporter","46","46"...

Yes, where can I send it?

harry-thorpe commented 5 years ago

Can you compress and upload here?

asaksager commented 5 years ago

Roary.tar.gz Copy_gff.tar.gz

harry-thorpe commented 5 years ago

Hi, the Roary directory looks fine, but the GFF one is empty.

asaksager commented 5 years ago

Sorry about that. Copy_gff.tar.gz

harry-thorpe commented 5 years ago

Thanks, I have just tested these files and it ran OK on my system. Could you try installing Roary with conda in a clean environment, and then cloning piggy again (the piggy executable is now in piggy/bin). This should mean all the dependencies are the right version etc. If this still doesn't work then I'll look into it further.

Just to let you know, the R script throws up some errors for me (this is right at the end):

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match
Calls: rbind -> rbind
Execution halted

I am working on cleaning this up.

asaksager commented 5 years ago

Hello again, just a short update: we finally got results on one of the systems, even though we never really figured out what caused the problem. We are now very excited to look more closely at the final results!

harry-thorpe commented 5 years ago

OK that's great. Hopefully they are useful!