Closed Chahrazadt87 closed 9 months ago
Thank you for your report, user @Chahrazadt87 !
Could you please post a larger portion of the log? We would like to see a portion that ends with the first permanentFail
and starts with the last line that contains INFO...... <some path>$ <application_name> \
Thanks
And if you can post the whole log file, that would be even better!
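For anyone who needs to cut that portion out of a large cwltool.log, here is a minimal Python sketch. It assumes (this is a guess at the log layout, not something confirmed in this thread) that a command-launch line contains `INFO`, a `$ ` prompt, and ends with a backslash. The function name `extract_failure_window` is hypothetical.

```python
def extract_failure_window(lines, fail_marker="permanentFail", info_marker="INFO"):
    """Return the lines from the last launch line before the first
    permanentFail through that permanentFail line (inclusive)."""
    # Index of the first line that mentions the failure marker.
    fail_idx = next((i for i, line in enumerate(lines) if fail_marker in line), None)
    if fail_idx is None:
        return []  # no failure recorded in this log

    start = 0  # fall back to the beginning if no launch line is found
    for i in range(fail_idx, -1, -1):
        line = lines[i].rstrip()
        # Assumption: launch lines look like "INFO ... <path>$ <app> \"
        if info_marker in line and "$ " in line and line.endswith("\\"):
            start = i
            break
    return lines[start:fail_idx + 1]
```

To use it, read the log with `open("cwltool.log", errors="replace").readlines()`, pass the list to the function, and write the result to a smaller file that can be attached.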
Hi Azat,
Thank you for your response. I have attached the partial log file (too large to attach all of it). Please be aware that I only get the missing output files once I add --debug to the command.
Many thanks,
Chahrazad
cwltool.log
Thanks. It looks like you were trying to upload the whole log, but the initial portion of the file still seems to be missing.
The file is too large to upload I'm afraid. My file is 30MB and the limit is 25MB
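One option that may help with the size limit: plain-text logs usually compress well, so a gzipped copy might fit. A minimal sketch using only the Python standard library follows; the helper name `compress_log` is hypothetical, and whether the result lands under 25MB depends on the log.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(src, dst=None):
    """Write a gzip-compressed copy of src and return the new path."""
    src = Path(src)
    # Default output name: cwltool.log -> cwltool.log.gz
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large logs are fine
    return dst
```

Calling `compress_log("cwltool.log")` leaves the original file untouched and writes `cwltool.log.gz` next to it.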
Could you please post the command line?
Sure: ./pgap.py -r -o Documents/Halorubrum_SS5_8/SS5_8_results Documents/Halorubrum_SS5_8/SS5_8.yaml
I see that you are using the "old school" method of supplying user information via YAML. Could you please post that YAML file as well?
Thanks
Please be aware that the following works for all other strains. Only a handful fail.
topology: 'circular'
location: 'chromosome'
organism:
    genus_species: 'Halorubrum sp. SS5-8'
    strain: 'my_strain'
contact_info:
    last_name: 'Warnecke'
    first_name: 'Tobias'
    email: 't.w@lms.mrc.ac.uk'
    organization: 'MRC London Institute of Medical Sciences'
    department: 'Molecular Systems Group'
    street: 'Du Cane Rd'
    city: 'London'
    postal_code: 'W12 0NN'
    state: 'Greater London'
    country: 'United Kingdom'
authors:
Thanks, Chahrazad!
I got the genome species:
$ gettax -dates 'Halorubrum sp. SS5-8'
scientific name: Halorubrum sp. SS5-8
tax id: 1089755
parent tax id: 2642239
gb_div: Bacteria
rank: species
lineage: Archaea; Euryarchaeota; Stenosarchaea group; Halobacteria;
Haloferacales; Haloferacaceae; Halorubrum
id_gc: 11
name_gc: Bacterial, Archaeal and Plant Plastid
id_mgc: 0
name_mgc: Unspecified
crt_date: 2011/09/26 14:32:50
upd_date: 2011/10/23 17:33:10
pub_date: 2011/10/22 18:00:26
So we can eliminate taxonomic novelty as a factor in the checkm failure. Another suspicious factor is that the organism is archaeal, but that did not work out as a culprit either: the checkm data, however old it is (2015), does have plenty of that taxonomic lineage in the database.
Upon closer examination of the cwltool.log file you posted, I stumbled upon an error message that I missed previously:
Process SyncManager-1:
Traceback (most recent call last):
File "/opt/python-3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/python-3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 583, in _run_server
server = cls._Server(registry, address, authkey, serializer)
File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 156, in __init__
self.listener = Listener(address=address, backlog=16)
File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 448, in __init__
self._listener = SocketListener(address, family, backlog)
File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 591, in __init__
self._socket.bind(address)
PermissionError: [Errno 1] Operation not permitted
Traceback (most recent call last):
File "/root/venv/bin/checkm", line 856, in <module>
checkmParser.parseOptions(args)
File "/root/venv/lib/python3.9/site-packages/checkm/main.py", line 992, in parseOptions
self.analyze(options)
File "/root/venv/lib/python3.9/site-packages/checkm/main.py", line 326, in analyze
binIdToModels = mgf.find(binFiles,
File "/root/venv/lib/python3.9/site-packages/checkm/markerGeneFinder.py", line 68, in find
binIdToModels = mp.Manager().dict()
File "/opt/python-3.9/lib/python3.9/multiprocessing/context.py", line 57, in Manager
m.start()
File "/opt/python-3.9/lib/python3.9/multiprocessing/managers.py", line 558, in start
self._address = reader.recv()
File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/opt/python-3.9/lib/python3.9/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
I think we have enough information now to try to reproduce this on our side. I suppose you are using the latest version of the PGAP package, correct?
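For context on the traceback above: `mp.Manager()` starts a server process that binds a listening socket, and the `PermissionError` is raised at `self._socket.bind(address)`. On Unix-like systems that listener is, by default, a Unix-domain socket created under a temporary directory. The sketch below is only a diagnostic idea, not part of PGAP or checkm: it tries to bind such a socket in a given directory so one can check whether the environment allows it. The function name `can_bind_unix_socket` is hypothetical.

```python
import os
import socket
import tempfile


def can_bind_unix_socket(directory=None):
    """Try to bind an AF_UNIX socket inside directory.

    Returns (True, None) on success or (False, exception) on failure.
    Unix-only: socket.AF_UNIX is not available on every platform.
    """
    if directory is None:
        # Probe a fresh directory under the default temp location.
        with tempfile.TemporaryDirectory(prefix="sockprobe-") as tmp:
            return can_bind_unix_socket(tmp)

    path = os.path.join(directory, "probe.sock")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)  # the same kind of call that fails in the traceback
        return True, None
    except OSError as exc:  # PermissionError is a subclass of OSError
        return False, exc
    finally:
        sock.close()
        if os.path.exists(path):
            os.unlink(path)
```

Running `can_bind_unix_socket()` in the same environment where checkm runs (for example inside the container, pointing at the temporary directory it uses) would show whether socket creation is refused there. A `False` result with `Operation not permitted` would match the log; a `True` result would point elsewhere.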
Hi Azat,
Yes I am. Everything is up to date software wise.
Many thanks,
Chahrazad
We will try to look at this ASAP.
Good morning Azat,
Any news on this issue please?
Many thanks,
Chahrazad
We started working on this, Chahrazad.
Could you please tarball and post the contents of the directory */tmp-outdir/usg1w98j? Thanks!
I was not able to reproduce your results with the same species, Chahrazad. For input data I used a similar species from the same genus and did not get any errors.
Is your input by any chance a single plasmid?
I ran a plasmid I had as well, standalone, and I was not able to reproduce the results.
Looking more into the cwltool.log file you posted, I can see that this is a full-blown assembly.
Could you please post the output of head -50 cwltool.log?
Hi Azat,
It is a whole genome assembly with a few contigs. Could you please explain what you mean by "head -50 cwltool.log". I am new to this, so bear with me :)
head is a Unix command that prints only the specified number of lines from the start of a text file.
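For readers who are not at a Unix shell, a rough Python equivalent of `head -50 cwltool.log > head.50.txt` might look like the sketch below. The helper names `head` and `save_head` are made up for illustration.

```python
from itertools import islice


def head(path, n=50):
    """Return the first n lines of a text file as a list."""
    with open(path, "r", errors="replace") as fh:
        return list(islice(fh, n))  # stops reading after n lines


def save_head(src, dst, n=50):
    """Write the first n lines of src to dst; return how many were written."""
    lines = head(src, n)
    with open(dst, "w") as out:
        out.writelines(lines)
    return len(lines)
```

`save_head("cwltool.log", "head.50.txt")` produces a small file that is easy to attach, without loading the whole 30MB log into memory.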
Chahrazad, would you be willing to post the input genome FASTA file?
Sure, can you please give me your email so that I can send it to you?
It's our official email: prokaryote-tools@ncbi.nlm.nih.gov. The data stays there strictly on a need-to-know basis.
I just emailed you the genome :)
Thanks again for looking into this.
Thanks! Running it now...
Nope. I could not reproduce it with exactly your input either. So, how about that head -50 cwltool.log output? Could you run
head -50 cwltool.log > head.50.txt
and attach it here?
Hi Azat,
Apologies for my late response. I have attached all that you need now.
Many thanks,
Chahrazad
head.50.txt
Thanks for the files, Chahrazad!
I see that you are running this on a Mac. Just noting it here for the purposes of indexing.
Could you please post the output of
find . -name annotation.fa | xargs /bin/ls -ltr
Thanks!
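If `find` and `xargs` are unfamiliar, the same listing can be approximated in Python. This is a sketch only; `find_by_name` is a hypothetical helper that mirrors `find . -name annotation.fa | xargs /bin/ls -ltr` by returning each match with its size and modification time, oldest first.

```python
from pathlib import Path


def find_by_name(root, name="annotation.fa"):
    """Return (path, size_bytes, mtime) for every file called name
    under root, sorted oldest first like `ls -ltr`."""
    rows = []
    for path in Path(root).rglob(name):
        if path.is_file():
            info = path.stat()
            rows.append((path, info.st_size, info.st_mtime))
    return sorted(rows, key=lambda row: row[2])
```

Printing the rows from `find_by_name(".")` gives the information requested here: which copies of annotation.fa exist, how large each is, and when each was written.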
-rw-r--r--@ 1 ct1221 staff 1306409 20 Nov 13:25 ./SS5_8_results/debug/tmpdir/m8jqspdl/checkm.1296702415621568dOBsj4/fasta_by_scaffold/bins-prot/annotation.fa
-rw-r--r--@ 1 ct1221 staff 1306409 20 Nov 13:25 ./SS5_8_results/debug/tmpdir/m8jqspdl/checkm.1296702415621568dOBsj4/fasta_by_scaffold/annotation.fa
Thanks, Chahrazad!
So, this is different from the PGAP-8585 case.
While we are scratching our heads, let me at least pass back to you what we successfully calculated in-house.
I am going to find out what our SOP is on this.
let me at least pass back to you what we successfully calculated in-house
Hi, Chahrazad! Could you please confirm that you got the results?
Another question: can you try to run standalone checkm on your input file debug/tmpdir/*/checkm.*/fasta_by_scaffold/bins-prot/annotation.fa:
mkdir -p bins-prot/
cp debug/tmpdir/*/checkm.*/fasta_by_scaffold/bins-prot/annotation.fa bins-prot/
checkm taxonomy_wf -t 1 -g -x fa genus Halorubrum bins-prot/ taxonomy_wf-prot/
Thanks!
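The three commands above can also be scripted. The sketch below stages the file and builds the same checkm command line as a list; it does not run checkm itself. The function name `stage_checkm_input` is hypothetical, and insisting on exactly one matching annotation.fa is a choice made here so that files from different runs are not silently mixed.

```python
import glob
import shutil
from pathlib import Path

# Same glob as in the shell commands above, relative to the results directory.
PATTERN = "debug/tmpdir/*/checkm.*/fasta_by_scaffold/bins-prot/annotation.fa"


def stage_checkm_input(results_dir, bins_dir="bins-prot",
                       out_dir="taxonomy_wf-prot", genus="Halorubrum"):
    """Copy annotation.fa into bins_dir and return the checkm command."""
    matches = sorted(glob.glob(str(Path(results_dir) / PATTERN)))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one annotation.fa, found {len(matches)}")
    bins = Path(bins_dir)
    bins.mkdir(parents=True, exist_ok=True)          # mkdir -p bins-prot/
    shutil.copy(matches[0], bins / "annotation.fa")  # cp ... bins-prot/
    return ["checkm", "taxonomy_wf", "-t", "1", "-g", "-x", "fa",
            "genus", genus, f"{bins}/", f"{out_dir}/"]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where a working checkm installation is on the PATH.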
Hi Azat,
I got this message when I ran it:
[2023-12-13 13:26:58] INFO: CheckM data: /Users/ct1221/.checkm
[2023-12-13 13:26:58] INFO: [CheckM - taxon_set] Generate taxonomic-specific marker set.
Unexpected error: <class 'FileNotFoundError'>
Traceback (most recent call last):
File "/Users/ct1221/opt/anaconda3/bin/checkm", line 856, in <module>
checkmParser.parseOptions(args)
File "/Users/ct1221/opt/anaconda3/lib/python3.9/site-packages/checkm/main.py", line 991, in parseOptions
self.taxonSet(options)
File "/Users/ct1221/opt/anaconda3/lib/python3.9/site-packages/checkm/main.py", line 293, in taxonSet
bValidSet = taxonParser.markerSet(
File "/Users/ct1221/opt/anaconda3/lib/python3.9/site-packages/checkm/taxonParser.py", line 82, in markerSet
taxonMarkerSets = self.readMarkerSets()
File "/Users/ct1221/opt/anaconda3/lib/python3.9/site-packages/checkm/taxonParser.py", line 40, in readMarkerSets
for line in open(DefaultValues.TAXON_MARKER_SETS):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ct1221/.checkm/taxon_marker_sets.tsv'
Please note that the run sometimes works when I shut down everything or clear the cache. It’s not very consistent though, so I’m still confused as to why this happens.
Kind regards,
Chahrazad
Thanks. How do you run it? It looks to me like you are running it directly on your Mac, not inside a virtual machine or container. I would recommend fixing your local installation and running it again. Right now the error you are getting is not related to our problem; it is a problem with your installation. The installation of checkm that pgap.py uses is inside the Docker container, and its data is elsewhere as well.
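As a quick check of the local installation before re-running, one could confirm that the CheckM data directory contains the file named in the FileNotFoundError. The sketch below looks only for `taxon_marker_sets.tsv`, since that is the one file this thread shows to be missing; the helper name `missing_checkm_files` is hypothetical, and a complete data set contains more than this one file.

```python
from pathlib import Path


def missing_checkm_files(data_root, required=("taxon_marker_sets.tsv",)):
    """Return the names from required that are absent under data_root."""
    root = Path(data_root).expanduser()  # allows "~/.checkm"
    return [name for name in required if not (root / name).is_file()]
```

For the run reported above, `missing_checkm_files("~/.checkm")` would be expected to return `["taxon_marker_sets.tsv"]`, which is consistent with a local data directory that was never populated.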
Hello,
I have been having an issue with a couple of genomes where the results do not contain all of the output files, especially the annot.aa and the .gbk files. I have looked at the log files and the only error I can see is below. Does anyone have an idea of how to fix this, please?
Error: error processing job: (CFileException::eFileIO) Error opening checkm dombtblout: /pgap/output/debug/tmpdir/m8jqspdl/checkm.1296702415621568dOBsj4/fasta_by_scaffold/checkm.out
terminate called after throwing an instance of 'ncbi::CException'
what(): NCBI C++ Exception:
Error: LIB(CException::eUnknown) "/export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/gpexec/queue/lib/wn_app.cpp", line 411: ncbi::CGPX_WorkerApp::Run() --- 1 jobs failed
Stack trace:
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/lib/libgpxlib.so /export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/gpexec/queue/lib/wn_app.cpp:409 ncbi::CGPX_WorkerApp::Run() offset=0x0 addr=0x7f098f4f229f
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0 offset=0x0 addr=0x41a430
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:711 ncbi::CNcbiApplicationAPI::x_TryMain(ncbi::EAppDiagStream, char const, int, bool) offset=0x0 addr=0x7f097674a132
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:1023 ncbi::CNcbiApplicationAPI::AppMain(int, char const const, char const const, ncbi::EAppDiagStream, char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) offset=0x0 addr=0x7f097674d78c
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0 offset=0x0 addr=0x40bc82
/usr/lib64/libc-2.17.so :0 offset=0x0 addr=0x7f0975176554
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2023-10-03.build7061/arch/x86_64/bin/checkm_wnode :0 offset=0x0 addr=0x40be59
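Since only a handful of genomes fail, it may help to scan each run's cwltool.log for the messages quoted in this thread and see whether they occur together. The sketch below is an illustration only: the signature names are invented, and the patterns cover just the three messages that appear here (the socket `PermissionError`, the bare `EOFError`, and the `Error opening checkm dombtblout` line).

```python
import re

# Patterns taken from messages quoted in this thread; the keys are made up.
SIGNATURES = {
    "socket_bind_denied": re.compile(
        r"PermissionError: \[Errno 1\] Operation not permitted"),
    "manager_eof": re.compile(r"^\s*EOFError\s*$"),
    "checkm_output_missing": re.compile(
        r"Error opening checkm dombtblout: (\S+)"),
}


def scan_log(lines):
    """Map each signature name to the 1-based line numbers where it matches."""
    hits = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```

Running `scan_log(open("cwltool.log", errors="replace"))` on a failing and a succeeding run and comparing the two results would show whether the missing checkm.out is always preceded by the socket error.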