Closed brigidar closed 8 years ago
Brigida,
Could you send me the command that you are running? Are you giving LS-BSR a file of genes with “-g”, but then also using a clustering method?
thanks, Jason
On Jan 5, 2016, at 7:40 AM, brigidar notifications@github.com wrote:
Hi, I tried to run a comparison of two PHAST prediction regions with prodigal and vsearch, and there is nothing in the consensus file, but if I use usearch there is. vsearch is in the path and runs, but the output is empty. Are the cutoff settings different for vsearch vs usearch?
/home/brigidarusconi/vsearch/bin/vsearch
LOG: 2016/01/04 17:57:37 - clustering with VSEARCH at an ID of 09, using 2 processors
LOG: 2016/01/04 17:57:37 - VSEARCH clustering finished
Best, Brigida
— Reply to this email directly or view it on GitHub https://github.com/jasonsahl/LS-BSR/issues/8.
Hi Jason, I am running this job script on the server cluster. I made the ls_bsr.py executable and added it to the path.
ls_bsr.py -d ~/PHAST/PROKKA/B26_12292015/genomes/ -c vsearch
exit 0
Brigida Rusconi, PhD | Postdoctoral Fellow | Department of Biology | South Texas Center for Emerging Infectious Diseases | University of Texas at San Antonio | One UTSA Circle | TX 78249 | 210-458-7846 | BSE 3.404 | brigida.rusconi@utsa.edu
Thanks,
Are you using genbank files as input or FASTA?
Could you do a:
ls -la ~/PHAST/PROKKA/B26_12292015/genomes/
thanks, Jason
I am using two FASTA files that have multiple regions predicted by PHAST in each of them. The one I ran with usearch I did directly on the command line yesterday just to check, and not in a job script. Might that be an issue? We are running the server cluster on SGE. Here is the output:
total 1.2M
drwxr-xr-x 2 brigida.rusconi 4 Jan 4 18:05 ./
drwxr-xr-x 3 brigida.rusconi 31 Jan 4 18:05 ../
-rw-r--r-- 1 brigida.rusconi 563K Jan 4 11:34 concatb26-1.fasta
-rw-r--r-- 1 brigida.rusconi 509K Jan 4 11:34 concatb26-2.fasta
Brigida Rusconi
Brigida,
So what you are telling LS-BSR to do is to predict coding regions in each FASTA file, cluster them, then align the predicted regions back against each FASTA file in your “genomes” directory to determine the BSR. If you have predicted regions and want to determine their distribution across a set of genomes, you could do something like “-g concatb26-1.fasta -d genome_directory”. I currently don’t have a way to cluster a set of genes provided with the “-g” flag, but it’s something that’s on my list. Please let me know if I can clarify anything else about how the method works.
regards, Jason
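To make the "determine the BSR" step concrete: a BLAST Score Ratio is conventionally the bit score of a gene aligned against a genome divided by the bit score of that gene aligned against itself, so values fall between 0 (absent) and 1 (identical). The sketch below only illustrates that arithmetic. The function name and the bit scores are hypothetical and are not taken from LS-BSR's code or from any real run.

```python
def blast_score_ratio(query_bits: float, self_bits: float) -> float:
    """Normalise a query bit score by the gene's self-alignment bit score."""
    if self_bits <= 0:
        raise ValueError("self bit score must be positive")
    return query_bits / self_bits


# Hypothetical bit scores for one clustered gene against the two input files.
self_score = 500.0
scores = {"concatb26-1": 500.0, "concatb26-2": 125.0}

# One BSR value per genome: 1.0 means an identical hit, values near 0 mean
# the gene is absent or highly divergent in that genome.
bsr = {genome: blast_score_ratio(s, self_score) for genome, s in scores.items()}
```

A matrix of such values, one row per gene cluster and one column per genome, is what lets you compare gene content across the set.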
Hi Jason, The predicted regions span multiple genes and actually cover whole phages (30-60 kb). I thought I could treat them like contigs. I thought it predicts the genes for each file and then clusters all of them together. Does it cluster by genome or across all predicted proteins? I want to figure out how much the phage-related mobilome differs between outbreak strains or other related infections. Since some of the proteins in phages are very similar, I thought it would make more sense to do the de novo prediction and then cluster them, so that I don’t have a lot of genes that are identical but do not give me much information. I can also simply run it with the predicted genes that I got from prokka for all of the regions and then extract the variome. I was just curious to understand why it clusters with usearch, but not vsearch. Brigida
Brigida,
I’m sure that the two methods work slightly differently. You could also try running cd-hit to see if that works. Let me know if I can help any further.
Jason
Hi Jason, I looked into the script and I don’t see the all_sorted.txt file created in the vsearch method (line 181). You only make the all_sorted.txt in the usearch but then in the run_vsearch you call the all_sorted.txt. I think if I read it correctly the input file is missing. Brigida
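The symptom described above, a clustering step that "finishes" but leaves an empty consensus file, is typical of a tool being handed an input file that was never written. A minimal sketch of a guard that would surface the problem early is shown here. The helper name `require_nonempty` is hypothetical and not part of LS-BSR, and the file name `all_sorted.txt` is taken from the message above.

```python
import os


def require_nonempty(path: str) -> str:
    """Return path unchanged if it names an existing, non-empty file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"clustering input not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"clustering input is empty: {path}")
    return path


# Hypothetical usage before handing the file to any clustering tool:
# clustering_input = require_nonempty("all_sorted.txt")
```

Failing loudly at this point is easier to diagnose than an empty output file several steps later.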
You’re right, thanks for finding that. I’m testing the changes now and will push them up to github as soon as everything is working correctly. Thanks!
Jason
Ah, good. So why do you need to split the files before the clustering? Is that only required for usearch, or for any clustering method? If you split by line, might that not split up predicted genes? Best, Brigida
It’s to get around the memory limitations in the free version of USEARCH. Definitely a hack, but I didn’t know how else to do it. But you’re right, I used to have a function that would take out the line wraps and guarantee that you would never interrupt a gene, but I took out that function and now it could cause problems. Thanks also for that, I will look into a new workaround.
Jason
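One way to avoid cutting a gene in half is to split on FASTA records rather than on raw lines: first join each record's wrapped sequence lines, then divide the list of whole records into chunks. The sketch below illustrates that idea only. It is not the function that was removed from LS-BSR, and the names `read_fasta` and `split_records` and the example sequences are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line, unwrapped sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining line-wrapped sequences."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records: List[Record], per_chunk: int) -> List[List[Record]]:
    """Split whole records into chunks; a record never spans two chunks."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    return [records[i:i + per_chunk] for i in range(0, len(records), per_chunk)]


# Made-up wrapped input: gene1 and gene3 each span two lines.
wrapped = [">gene1", "ATGAAA", "CCCTAA", ">gene2", "ATGTTT", ">gene3", "ATGGGG", "TGA"]
records = list(read_fasta(wrapped))
chunks = split_records(records, 2)
```

Because the chunk size counts records instead of lines, the split stays safe whether or not the input file is line-wrapped.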
On Jan 5, 2016, at 9:41 AM, brigidar notifications@github.com wrote:
ah good. So why do you need to split the files before the clustering? Is that only required for usearch or for any clustering method? If you split by the line might that not split up predicted genes? Best, Brigida
Brigida Rusconi, PhD | Postdoctoral Fellow | Department of Biology | South Texas Center for Emerging Infectious Diseases | University of Texas at San Antonio | One UTSA Circle | TX 78249 | 210-458-7846 | BSE 3.404 | brigida.rusconi@utsa.edumailto:brigida.rusconi@utsa.edu
On Jan 5, 2016, at 10:38 AM, Jason Sahl notifications@github.com<mailto:notifications@github.com> wrote:
You’re right, thanks for finding that. I’m testing the changes now and will push them up to github as soon as everything is working correctly. Thanks!
Jason
On Jan 5, 2016, at 9:28 AM, brigidar notifications@github.com<mailto:notifications@github.com> wrote:
Hi Jason, I looked into the script and I don’t see the all_sorted.txt file created in the vsearch method (line 181). You only make the all_sorted.txt in the usearch but then in the run_vsearch you call the all_sorted.txt. I think if I read it correctly the input file is missing. Brigida
Brigida Rusconi, PhD | Postdoctoral Fellow | Department of Biology | South Texas Center for Emerging Infectious Diseases | University of Texas at San Antonio | One UTSA Circle | TX 78249 | 210-458-7846 | BSE 3.404 | brigida.rusconi@utsa.edumailto:brigida.rusconi@utsa.edumailto:brigida.rusconi@utsa.edu
On Jan 5, 2016, at 9:57 AM, Jason Sahl notifications@github.com<mailto:notifications@github.commailto:notifications@github.com> wrote:
Brigida,
I’m sure that the two methods work slightly different. You could also try to run cd-hit to see if that works. Let me know if I can help any further.
Jason
On Jan 5, 2016, at 8:53 AM, brigidar notifications@github.com wrote:
Hi Jason, The regions predicted span multiple genes and actually predict whole phages (30-60kb). I thought I could consider them like contigs. I thought it predicts the genes for each file and then clusters all of them together. Does it cluster by genome or all predicted proteins? I want to figure out how much the phage-related mobilome differs between outbreak strains or other related infections. Since some of the proteins in phages are very similar, I thought it would make more sense to do the de novo prediction and then cluster them so that I don’t have a lot of genes that are identical but do not give me much information. I can also simply run it with the predicted genes that I got from prokka for all of the regions and then extract the variome. I was just curious to understand why it clusters with usearch, but not vsearch. Brigida
On Jan 5, 2016, at 9:45 AM, Jason Sahl notifications@github.com wrote:
Brigida,
So what you are telling LS-BSR to do is to predict coding regions in each FASTA file, cluster them, then align the predicted regions back against each FASTA file in your “genomes” directory to determine the BSR. If you have predicted regions and want to determine their distribution across a set of genomes, you could do something like “-g concatb26-1.fasta -d genome_directory”. I currently don’t have a way to cluster a set of genes provided with the “-g” flag, but it’s something that’s on my list. Please let me know if I can clarify anything else about how the method is working.
regards, Jason
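For readers unfamiliar with the last step, a minimal sketch of how a BLAST Score Ratio is usually computed: the bit score of a gene aligned against a genome, divided by the bit score of that gene aligned against itself. The scores and cluster names below are made up, `blast_score_ratio` is a hypothetical helper, and this is not taken from the LS-BSR source.

```python
def blast_score_ratio(query_bit_score, self_bit_score):
    """Return the query bit score divided by the self-alignment bit score."""
    if self_bit_score <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return query_bit_score / self_bit_score


# Bit score of each clustered gene aligned against itself (made-up values).
self_scores = {"cluster_1": 500.0, "cluster_2": 320.0}

# Best bit score of each gene in each genome; an absent gene has no hit.
genome_hits = {
    "concatb26-1": {"cluster_1": 500.0, "cluster_2": 160.0},
    "concatb26-2": {"cluster_1": 450.0},
}

# One BSR value per (genome, gene); a missing hit is treated as a score of 0.
matrix = {
    genome: {
        gene: blast_score_ratio(hits.get(gene, 0.0), reference)
        for gene, reference in self_scores.items()
    }
    for genome, hits in genome_hits.items()
}
```

A value near 1.0 means the gene is present and highly conserved in that genome, and a value near 0.0 means it is effectively absent.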
On Jan 5, 2016, at 8:39 AM, brigidar notifications@github.com wrote:
I am using two fasta files that have multiple regions predicted by PHAST in each of them. The one I ran with usearch I did directly in command line yesterday just to check and not in a job script. Might that be an issue? We are running the server cluster on SGE. Here is the output:
total 1.2M
drwxr-xr-x 2 brigida.rusconi 4 Jan 4 18:05 ./
drwxr-xr-x 3 brigida.rusconi 31 Jan 4 18:05 ../
-rw-r--r-- 1 brigida.rusconi 563K Jan 4 11:34 concatb26-1.fasta
-rw-r--r-- 1 brigida.rusconi 509K Jan 4 11:34 concatb26-2.fasta
On Jan 5, 2016, at 9:36 AM, Jason Sahl notifications@github.com wrote:
Thanks,
Are you using genbank files as input or FASTA?
Could you do a:
ls -la ~/PHAST/PROKKA/B26_12292015/genomes/
thanks, Jason
On Jan 5, 2016, at 8:34 AM, brigidar notifications@github.com wrote:
~/PHAST/PROKKA/B26_12292015/genomes/
These problems should now be fixed. Please let me know if you see anything else that doesn't look correct.
Hi, I tried to run a comparison of two PHAST prediction regions with prodigal and vsearch and there is nothing in the consensus file, but if I use usearch there is. vsearch is in the path and runs, but the output is empty. Are the settings different for vsearch vs usearch for the cutoff?
/home/brigida.rusconi/vsearch/bin/vsearch
LOG: 2016/01/04 17:57:37 - clustering with VSEARCH at an ID of 0.9, using 2 processors
LOG: 2016/01/04 17:57:37 - VSEARCH clustering finished
Best, Brigida