Open olekto opened 1 year ago
Hi @olekto
Sorry for the late response. While running, compleasm downloads the specified lineage file. So if the working nodes do not have internet access, you can use compleasm download to pre-download the lineage on the login nodes, then specify the directory of the downloaded files when running on the working nodes.
Neng
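The two-step workflow described above could be scripted roughly as follows. This is a minimal sketch, not an official recipe: the helper name `build_offline_commands` and the example paths are hypothetical, and it assumes that `compleasm download` accepts the same `-L` library-path option that `compleasm run` takes in this thread (worth confirming with `compleasm download -h`).

```python
import shlex


def build_offline_commands(lineage, library_path, assembly, out_dir, threads=1):
    """Return (download_cmd, run_cmd) as argument lists for the two steps."""
    # Step 1, on a login node with internet access: fetch the lineage.
    # Assumption: `download` takes -L like `run` does.
    download_cmd = ["compleasm", "download", lineage, "-L", library_path]
    # Step 2, on a compute node: point `run` at the same folder with -L.
    run_cmd = [
        "compleasm", "run",
        "-a", assembly,
        "-l", lineage,
        "-L", library_path,
        "-t", str(threads),
        "-o", out_dir,
    ]
    return download_cmd, run_cmd


# Hypothetical paths, for illustration only.
download_cmd, run_cmd = build_offline_commands(
    "aves", "/path/to/compleasm_lineages", "genome.fasta", "compleasm_out", threads=10
)
print(shlex.join(download_cmd))  # run this on the login node
print(shlex.join(run_cmd))       # put this in the job script
```

Building the commands as lists keeps paths with spaces intact if they are later passed to `subprocess.run`.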
I pointed compleasm to where I had downloaded all the lineages for BUSCO. Does it require something different from BUSCO? It still downloaded the lineage. Can I use the compleasm-downloaded lineages for BUSCO? I'd rather not have two sets of lineages lying around on the cluster.
What I did for BUSCO was to download everything, and then point to it. I guess I have to download each lineage independently in the case of compleasm?
When I got the lineage downloaded via compleasm, it ran successfully. So looking good so far. :)
@olekto The organization of the lineage file downloaded by compleasm is different from that of BUSCO. So directly specifying the lineage directory downloaded by BUSCO doesn't work. We will consider making compleasm compatible with the lineage files downloaded by BUSCO in future versions.
Is there a way to download lineage files manually instead of using compleasm download?
I think you can download the data directly from here: https://busco-archive.ezlab.org/data/lineages/ and use it after unzipping.
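For the manual route, unpacking a downloaded archive into a library folder might look like the sketch below. It assumes the archive is a tar file (optionally compressed), and the helper name `unpack_lineage` is hypothetical. Whether compleasm accepts the resulting folder layout is not established by this sketch.

```python
import tarfile
from pathlib import Path


def unpack_lineage(archive_path, library_dir):
    """Extract a lineage archive into library_dir; return its top-level names."""
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as tar:
        top_level = set()
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:  # skip entries such as "."
                top_level.add(parts[0])
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(library_dir, filter="data")
        else:
            tar.extractall(library_dir)
    return sorted(top_level)
```

The returned names show which folder(s) the archive created, which helps when deciding what to pass as the lineage name and library path.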
I am waiting on this feature, too. Enabling the program to run with the lineage files downloaded by BUSCO will be really useful for my case.
Hi, are there any updates on this issue? The protein command works well in offline mode, but the run command is not working properly in offline mode.
Hi @lilinzhou,
The latest version of compleasm, v0.2.5, has fixed the problem. The bug was caused by an update to the BUSCO-related file format a few months ago.
Hi @huangnengCSU
The latest version also seems to request a download when using the run command. I found a new empty file, "file_versions.tsv.tmp", in the BUSCO database folder.
The command I use: python3 /path/to/software/compleasm_kit/compleasm.py run -a genome.fasta -l eukaryota_odb10 -o out_genome -L /path/to/software/BUSCO/lineages
See the error log at the end.
But everything goes well when using the protein command.
The command I use: python3 /path/to/software/compleasm_kit/compleasm.py protein -a protein.fasta -l eukaryota_odb10 -o out_protein -L /path/to/software/BUSCO/lineages
Our nodes do not have internet access, so I can only download the BUSCO database manually. Could you help solve this problem?
Searching for miniprot in the path where compleasm.py is located
Searching for hmmsearch in the path where compleasm.py is located
miniprot execute command:
/path/to/software/compleasm_kit/miniprot
Traceback (most recent call last):
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 1346, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 1257, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 1303, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 1252, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 1012, in _send_output
self.send(msg)
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 952, in send
self.connect()
File "/path/to/software/Python-3.9.6/lib/python3.9/http/client.py", line 1426, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/path/to/software/Python-3.9.6/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/path/to/software/Python-3.9.6/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/path/to/software/Python-3.9.6/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/path/to/software/compleasm_kit/compleasm.py", line 2741, in <module>
main()
File "/path/to/software/compleasm_kit/compleasm.py", line 2737, in main
args.func(args)
File "/path/to/software/compleasm_kit/compleasm.py", line 2601, in run
mr = CompleasmRunner(assembly_path=assembly_path,
File "/path/to/software/compleasm_kit/compleasm.py", line 2114, in __init__
self.downloader = Downloader(library_path)
File "/path/to/software/compleasm_kit/compleasm.py", line 85, in __init__
self.lineage_description, self.placement_description = self.download_file_version_document()
File "/path/to/software/compleasm_kit/compleasm.py", line 127, in download_file_version_document
urllib.request.urlretrieve(hash_url, hash_download_path)
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 239, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 517, in open
response = self._open(req, data)
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 534, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 1389, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/path/to/software/Python-3.9.6/lib/python3.9/urllib/request.py", line 1349, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:1129)>
To @lilinzhou,
I guess your problem is caused by the network. During the first run, compleasm downloads some files. Because of the network problem, the download did not finish, leaving some tag files with names ending in .tmp in the download folder.
If the problem is the network, you may have to download the lineage files using compleasm download on a computer that has access to the lineage database. Then you can upload the download folder from that computer to the work server. When running compleasm run, you can specify the download folder with the -L option.
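To check whether an interrupted download left any of these .tmp tag files behind, a small script along these lines could be run against the download folder. It is only a sketch: the helper name `find_tmp_markers` is hypothetical, and it simply lists files whose names end in .tmp without deleting anything.

```python
from pathlib import Path


def find_tmp_markers(library_path):
    """Return a sorted list of files ending in .tmp anywhere under library_path."""
    root = Path(library_path)
    # rglob walks the folder recursively; keep regular files only.
    return sorted(p for p in root.rglob("*.tmp") if p.is_file())
```

For example, `find_tmp_markers("/path/to/software/BUSCO/lineages")` would list a leftover `file_versions.tsv.tmp` if one exists there.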
This problem occurs only in compleasm run and not in compleasm protein because compleasm run has a step that checks the lineage files, whereas compleasm protein does not (compleasm protein should have this step too, but I have not implemented it).
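The traceback above shows the check reaching `urllib.request.urlretrieve` and raising `URLError` on a node without network access. One way a tool could tolerate that is sketched below. This is a hypothetical illustration, not compleasm's actual code: the function name `fetch_or_use_local`, its return values, and the fallback policy are all assumptions.

```python
import urllib.request
from pathlib import Path


def fetch_or_use_local(url, dest, retrieve=urllib.request.urlretrieve):
    """Download url to dest; on network failure, fall back to an existing dest.

    Returns "downloaded" or "local". Re-raises the error if no local copy exists.
    """
    dest = Path(dest)
    tmp = dest.with_name(dest.name + ".tmp")
    try:
        # Download to a temporary name so a failed transfer never
        # overwrites a good local copy.
        retrieve(url, str(tmp))
    except OSError:  # urllib.error.URLError is a subclass of OSError
        tmp.unlink(missing_ok=True)  # do not leave a stale .tmp behind
        if dest.is_file():
            return "local"
        raise
    tmp.replace(dest)
    return "downloaded"
```

The `retrieve` parameter exists only so the behaviour can be exercised without a network connection.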
Hi, I am trying to get compleasm running, but have run into an issue. Specifically, I get this error:
The command is this:
compleasm run -a default_filt.hic.hap1.p_ctg.fa -l aves -L /cluster/projects/nn8013k/opt/busco_dbs/lineages/ -t 10 -o compleasm_test
When running on the login nodes, it looks like it actually downloads aves and eukaryota, even though the folders existed in /cluster/projects/nn8013k/opt/busco_dbs/lineages/.
Am I doing something wrong?
Our computing nodes do not have internet access, and I don't think it is nice practice to download something without letting the user know that it is happening. How can I turn this off? That is, can I download something before submitting the job to the cluster?
Thank you.
Ole