trouble with dafga_refDB.py

DoctorDodge commented 8 years ago

Hi,

I would really like to try out your analysis method, but I am having a problem running the test data. When I run the dafga_refDB.py script I get the following error:

dafga_refDB.py -gp pmoA_refseqs.gp -o refDB --email myname@gmail.com

...Parsing reference sequnece information in gp format...

Done

Traceback (most recent call last):
  File "/usr/local/bin/dafga_refDB.py", line 130, in <module>
    efetch_from_taxonomy(ref_xref)
  File "/usr/local/bin/dafga_refDB.py", line 67, in efetch_from_taxonomy
    handle_taxon = Entrez.efetch(db="taxonomy", id=xref.values(), retmode="xml")
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 149, in efetch
    return _open(cgi, variables, post)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 459, in _open
    handle = _urlopen(cgi, data=_as_bytes(options))
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -5] No address associated with hostname>

It would be great if you are able to help me. Thank you, Neil W

outbig commented 8 years ago

Hi,

Thanks for your interest in DAFGA. This error occurs because NCBI is blocking your access to their database. Try again when the NCBI server is less busy, for example around midnight. In any case, I will address this issue as soon as possible and let you know when it is ready. If you need any help with DAFGA analysis, please don't hesitate to contact me.
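Until a fix is available, one generic way to cope with a temporarily unreachable server is to retry the failing call with increasing pauses. The following is only a sketch in Python 3, not part of DAFGA: the helper name `retry_with_backoff` and its parameters are hypothetical, and it assumes the failure surfaces as an `OSError` (in Python 3, `urllib.error.URLError` is a subclass of `OSError`).

```python
import time


def retry_with_backoff(func, attempts=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper (not part of DAFGA or Biopython). Waits
    base_delay, 2 * base_delay, 4 * base_delay, ... seconds between tries.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** attempt)
```

A call such as the `Entrez.efetch(db="taxonomy", ...)` request in the traceback could then be passed in as a zero-argument function, for example via a `lambda`. Retrying only helps with transient failures; it will not help if the machine has no working network or DNS configuration at all.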

Best, Yongkyu


— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/outbig/DAFGA/issues/2

Drdodge commented 8 years ago

Thanks Yongkyu,

My apologies for messaging you prematurely; I realised this was the problem shortly after messaging you. Unfortunately, I have another problem that I have been unable to resolve, and I hope you can help. I am still running the test data, and when I run the dafga_otus.py script I get the following error during the "Pick representative seq of OTUs" step:

...Pick representative seq of OTUs...

Traceback (most recent call last):
  File "/usr/local/bin/dafga_otus.py", line 159, in <module>
    rep_aa_seqs(rep_otu,faa)
  File "/usr/local/bin/dafga_otus.py", line 62, in rep_aa_seqs
    seq_id = re.search("centroid=(.+?);seq",record.description).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I am guessing this is being caused by a missing dependency, but as far as I can tell, I have installed all of these. Any advice would be greatly appreciated.

Cheers Neil

outbig commented 8 years ago

Hi Neil,

Which version of usearch is installed?

BR, Yongkyu


DoctorDodge commented 8 years ago

Hi Yongkyu,

I am running v8.1.1861_i86linux32. I read the previous post about issues with the usearch version and made sure to upgrade to a version above v7.
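Because the installed usearch release matters here, a version string of the form quoted above can be checked programmatically. This is a minimal Python 3 sketch, not part of DAFGA: the function name `parse_usearch_version` is hypothetical, and it assumes only that the string contains a `v<major>.<minor>` prefix like the one in this post. How the string is obtained from the usearch binary is deliberately left out.

```python
import re


def parse_usearch_version(text):
    """Return (major, minor) from a string such as 'v8.1.1861_i86linux32'.

    Hypothetical helper; assumes a 'v<major>.<minor>' pattern is present.
    """
    match = re.search(r"v(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2))


major, minor = parse_usearch_version("v8.1.1861_i86linux32")
print(major, minor)
```

The resulting tuple can be compared against whichever release a pipeline was written for, so that an incompatible version is reported before any clustering is run.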

I'm not sure if this matters, but a warning was raised during the previous command (dafga_correlation.py) after the "...pairwise comparison of SSU rRNA sequences..." step:

/usr/lib/pymodules/python2.7/matplotlib/collections.py:548: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == 'face':

Just in case it helps, I have pasted the entire output from the dafga_otus command below:

...Pre-Clustering of amplicon sequences(nt) with 0.97 identity...

Licensed to: neilo.wilson@gmail.com

00:00 47Mb 100.0% Reading pmoA_amplicons.fna
00:00 13Mb Pass 1...13534 seqs, 12997 uniques, 12744 singletons (98.1%)
00:00 43Mb Min size 1, median 1, max 13, avg 1.04
00:00 46Mb done.
00:02 62Mb 100.0% 2705 clusters, max size 1360, avg 5.0
00:02 62Mb 0.0% Writing centroids to /home/drd/programs/DAFGA-master/test/otus/preclustering/preclstr_0.97_centroids
00:02 62Mb 100.0% Writing centroids to /home/drd/programs/DAFGA-master/test/otus/preclustering/preclstr_0.97_centroids.fasta
00:02 62Mb 0.0% Building MSAs
00:02 64Mb 100.0% Building MSAs

  Seqs  12997 (12997 ()

Clusters    2705
Max size    1360
Avg size    5.0
Min size    1
Singletons  2022, 15.6% of seqs, 74.8% of clusters
Max mem     64Mb
Time        2.00s
Throughput  6498.5 seqs/sec.

Done

...Translation into protein sequence... Representative consensus sequences are translated with Standard Code

Done

...Clustering of translated representative sequences... with 0.86 identity using usearch

Licensed to: neilo.wilson@gmail.com

00:00 40Mb 100.0% Reading /home/drd/programs/DAFGA-master/test/otus/clustering/translated_consensus.fasta
00:00 6.8Mb Pass 1...2705 seqs, 2624 uniques, 2590 singletons (98.7%)
00:00 35Mb Min size 1, median 1, max 9, avg 1.03
00:00 38Mb done.
00:00 94Mb 100.0% 1598 clusters, max size 509, avg 1.7
00:00 94Mb 0.1% Writing centroids to /home/drd/programs/DAFGA-master/test/otus/clustering/cluster_0.86_centroids.fas
00:00 94Mb 100.0% Writing centroids to /home/drd/programs/DAFGA-master/test/otus/clustering/cluster_0.86_centroids.fasta

  Seqs  2624

Clusters    1598
Max size    509
Avg size    1.7
Min size    1
Singletons  1484, 56.6% of seqs, 92.9% of clusters
Max mem     94Mb
Time        1.00s
Throughput  2624.0 seqs/sec.

Done

...Pick representative seq of OTUs...

Traceback (most recent call last):
  File "/usr/local/bin/dafga_otus.py", line 159, in <module>
    rep_aa_seqs(rep_otu,faa)
  File "/usr/local/bin/dafga_otus.py", line 62, in rep_aa_seqs
    seq_id = re.search("centroid=(.+?);seq",record.description).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks, Neil

DoctorDodge commented 8 years ago

Hi Yongkyu,

Have you had a chance to look at the output I sent you last week? To me it looks like usearch is working OK as the script can pick OTUs - the error seems to come when picking representative OTUs. It would be great if you could offer any suggestions soon, as I have some pressure on me to deliver results for a manuscript that is otherwise written and ready to submit.

Sorry to hassle you, I'm sure you are as busy as I am!

Thanks Neil

outbig commented 8 years ago

Hi Neil,

I am so sorry, but I am currently sick at home. As soon as I get back to work (hopefully tomorrow, otherwise the day after), I promise to address this issue first.

Best, Yongkyu


outbig commented 8 years ago

Hi Neil,

I had a look at your error message. As you said, it occurred when picking representative seqs. This is because Usearch 8.1 does not write the type of representative sequence, the gene ID, or the number of seqs in each cluster to the FASTA header line, so DAFGA could not retrieve the representative seqs. The easiest solution is to use Usearch 7.0. Sorry for the inconvenience and, again, for the late response. If you have any other problems, please contact me.
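For illustration, the `AttributeError` arises because `re.search` returns `None` when the header has no `centroid=...;seq` field, and `.group(1)` is then called on `None`. A more defensive lookup might look like the sketch below. It is written in Python 3 and is not the code in dafga_otus.py: the function names are hypothetical, and only the regular expression is taken from the traceback.

```python
import re

# Pattern copied from the traceback; everything else here is illustrative.
CENTROID_PATTERN = re.compile(r"centroid=(.+?);seq")


def centroid_id(description):
    """Return the centroid ID from a FASTA description, or None if absent."""
    match = CENTROID_PATTERN.search(description)
    return match.group(1) if match else None


def require_centroid_id(description):
    """Like centroid_id, but fail with a message naming the likely cause."""
    seq_id = centroid_id(description)
    if seq_id is None:
        raise ValueError(
            "no 'centroid=...;seq' field in header %r; "
            "the clustering tool may use a different header format"
            % description
        )
    return seq_id
```

With a check like this, a header written in an unexpected format produces an error that points at the header itself rather than a bare `'NoneType' object has no attribute 'group'`.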

Best, Yongkyu


Drdodge commented 8 years ago

Hi Yongkyu,

Thanks for getting back to me. Installing usearch v7.0 solved the problem. Sorry, I really should have figured this out for myself! I am now having the same issue as Katy with the dafga_taxonomy.py script. If you could please send me the updated version of this script ASAP, that would be great. After that I won't need to hassle you anymore :-)

My email is: neil.wilson@sydney.edu.au

Cheers Neil

outbig / DAFGA

trouble with dafga_refDB.py #2