Closed hongbingp closed 11 months ago
Hi @hongbingp,
That's intriguing. I've never encountered such a bug before. It seems that the CESAR output is deviating from the expected format.
Here are a few steps that could help me identify the problem:
MSLDIQSLDIQCEELSDARWAELLPLLQQCQVVR-LDDCGLTEARCKDISSALR-VNPALAELN-LRSNELGDVGVHCVLQGLQTPSCKIQKLSLQNCCLTGAGCGVLSSTLRTLPTLQELHLSDNLLGDAGLQLLCEGLLDPQCRLEKLQ-LEYCSLSAASCEPLASVLRAKPDFKELT-VSNNDINEAGVRVLCQGLKDSP-CQLEALKLESCGVTSDNCRDLCGIVASKASLRELALGSNKLGDVGMAELCPGLLHPSSRLRTLW--IWECGITAKGCGDLCRVLRAKESLKELSLAGNELGDEGARLLCET-LLEPGCQLESLWVKSCSFTAACCSHFSSVLAQNRFLLELQ-ISNNRLEDAGVREL-CQGLGQPGSVLRVLW---LADCDVSDSSCSSLAATLLANHSLRELDLSNNCLGDAGILQLVESVRQPGCLLEQLVLYDIYWSEEMEDRLQALEKDKPSLRVISX
If you can locate this, please select the corresponding transcript. Then create a reference bed file that only includes this transcript and run TOGA with exactly the same parameters, but use the trimmed reference annotation file. Please then send me the directory (you can exclude the chain file from it).
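Trimming the reference annotation to a single transcript can be done with a few lines of Python. This is a minimal sketch, assuming the annotation is a tab-separated BED file whose fourth column holds the transcript ID; the function name `trim_bed_to_transcript` is hypothetical and not part of TOGA.

```python
def trim_bed_to_transcript(bed_in, bed_out, transcript_id):
    """Copy only the BED lines whose name field (column 4) equals transcript_id.

    Returns the number of lines written to bed_out.
    """
    kept = 0
    with open(bed_in) as src, open(bed_out, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # In BED format the 4th column is the feature name,
            # assumed here to be the transcript ID.
            if len(fields) >= 4 and fields[3] == transcript_id:
                dst.write(line)
                kept += 1
    return kept
```

Calling it on hg38.knownGene.bed with the selected transcript ID should write a one-line BED file that can be passed to TOGA in place of the full annotation. A return value of 0 means the ID was not found in column 4.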
You can find the CESAR output files in $toga_output_dir/temp/cesar_results.
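To locate the file that contains the sequence, something like the following could work. It is only a sketch: it assumes the CESAR result files are plain text, and the helper name `find_files_with_sequence` is made up for illustration. Gap characters and line breaks are stripped before matching, so wrapping and alignment gaps should not prevent a hit.

```python
import os


def find_files_with_sequence(results_dir, sequence):
    """Return sorted paths of files under results_dir that contain `sequence`.

    Gap characters ('-') and whitespace are removed from both the query and
    the file contents before matching.
    """
    def normalise(text):
        return "".join(ch for ch in text if ch != "-" and not ch.isspace())

    query = normalise(sequence)
    hits = []
    for root, _dirs, files in os.walk(results_dir):
        for name in files:
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going if a file is not valid text.
            with open(path, errors="replace") as handle:
                if query in normalise(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

The first argument would be the temp/cesar_results directory inside the TOGA output directory, and the second the protein sequence quoted above.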
Looking forward to your response!
Hi @kirilenkobm ,
Thank you for your help!
This link contains both input and output files of TOGA, as well as the shell script I used for running TOGA. https://drive.google.com/drive/folders/1_pkHRYcDD11nOm46g-Utvi36X24Z6pgP?usp=drive_link
Let me know if you need more information.
Hongbing
Hi @hongbingp
That's interesting. I tried to reproduce the issue, but my TOGA run finished successfully:
bkirilenko@delta:/projects/hillerlab/genome/src/TOGA_dev/human-betta-reproduce (master)$ ls
codon.fasta inact_mut_data.txt orthology_classification.tsv prot.fasta query_gene_spans.bed t2bit.link
done.status loss_summ_data.tsv proc_pseudogenes.bed q2bit.link query_isoforms.tsv temp
genes_rejection_reason.tsv nucleotide.fasta project_args.json query_annotation.bed ref_orphan_transcripts.txt version.txt
bkirilenko@delta:/projects/hillerlab/genome/src/TOGA_dev/human-betta-reproduce (master)$ cat query_annotation.bed | wc -l
143635
bkirilenko@delta:/projects/hillerlab/genome/src/TOGA_dev/human-betta-reproduce (master)$ cat project_args.json
{"chain_input": "input_repo_issue/hg38.betta.allfilled.chain.gz", "bed_input": "input_repo_issue/hg38.knownGene.bed", "tDB": "/projects/hillerlab/genome/gbdb-HL/hg38/hg38.2bit", "qDB": "input_repo_issue/betta_softmasked.2bit", "project_dir": "/projects/hillerlab/genome/src/TOGA_dev/human-betta-reproduce", "project_name": null, "min_score": 15000, "isoforms": "", "keep_temp": true, "limit_to_ref_chrom": null, "nextflow_dir": null, "nextflow_config_dir": "/projects/hillerlab/genome/src/TOGA_dev/nextflow_config_files/", "do_not_del_nf_logs": false, "cesar_bigmem_config": null, "para": false, "para_bigmem": false, "chain_jobs_num": 100, "no_chain_filter": false, "orth_score_threshold": 0.5, "cesar_jobs_num": 500, "cesar_binary": null, "using_optimized_cesar": false, "output_opt_cesar_regions": false, "mask_stops": true, "cesar_buckets": "10,100", "cesar_exec_seq": false, "cesar_chain_limit": 100, "cesar_mem_limit": 16, "time_marks": null, "u12": null, "stop_at_chain_class": false, "uhq_flank": 50, "o2o_only": false, "no_fpi": false, "disable_fragments_joining": false, "ld_model": false, "annotate_paralogs": false, "mask_all_first_10p": false}
I will send you the output later. Could you check whether there is some inconsistency in your system? Please also note that I used the latest TOGA version (1.1.3).
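One way to look for such inconsistencies is to compare the project_args.json files from the two runs. The snippet below is a rough sketch; the function name `diff_project_args` and the default list of ignored keys are assumptions made for illustration.

```python
import json


def diff_project_args(path_a, path_b, ignore=("project_dir",)):
    """Return {key: (value_in_a, value_in_b)} for every setting that differs.

    Keys listed in `ignore` are skipped; a key missing from one file
    is reported with None on that side.
    """
    with open(path_a) as handle:
        args_a = json.load(handle)
    with open(path_b) as handle:
        args_b = json.load(handle)
    differences = {}
    for key in sorted(set(args_a) | set(args_b)):
        if key in ignore:
            continue
        if args_a.get(key) != args_b.get(key):
            differences[key] = (args_a.get(key), args_b.get(key))
    return differences
```

Input paths are expected to differ between systems, so the interesting entries are usually the numeric and boolean settings.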
Thank you so much!
I used TOGA 1.1.3 and ran the provided test:
./toga.py test_input/hg38.mm10.chr11.chain test_input/hg38.genCode27.chr11.bed ${path_to_human_2bit} ${path_to_mouse_2bit} --kt --pn test -i supply/hg38.wgEncodeGencodeCompV34.isoforms.txt --nc ${path_to_nextflow_config_dir} --cb 3,5 --cjn 500 --u12 supply/hg38.U12sites.tsv --ms
A similar problem seemed to happen in STEP 7, which reported: Process 'execute_jobs (22)' terminated for an unknown reason -- Likely it has been terminated by the external system. Somehow, though, this test finished and produced output.
Here is the log file of the test: slurm-7436855.txt
I then switched the nextflow executor for CESAR to 'local' and ran TOGA for human and betta fish. Now STEP 7 starts running instead of reporting a failure at the very beginning. So I wonder whether there are additional parameters I need to set for CESAR so that I can run it on Slurm?
This is the longest and most unstable part of the TOGA pipeline. The jobs in this stage are quite heavy and sometimes take longer than expected. Additionally, certain clusters may not handle them well. To compensate for this, TOGA attempts to rerun each CESAR job multiple times. Therefore, it is normal if some CESAR jobs crash, but there should be no issues on the engineering side.
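The rerun idea can be illustrated with a generic retry wrapper. This is a sketch of the general technique only, not TOGA's or nextflow's actual retry logic; `run_with_retries` and its `max_attempts` parameter are hypothetical names.

```python
import subprocess


def run_with_retries(cmd, max_attempts=3):
    """Run cmd (a list of arguments) until it exits with code 0.

    Returns (succeeded, attempts_used). Gives up after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True, attempt
        # A non-zero exit code is treated as a crash and the job is rerun.
    return False, max_attempts
```

With this kind of scheme, a job that crashes once or twice but then succeeds is reported as successful, which is why occasional retry messages in the log are not necessarily a problem.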
When TOGA runs locally, it uses all available CPU cores on the local machine (this can be a PC, a laptop, or just the master node of a cluster; it also suits machines with many CPUs). This setup can work fine for small genomes or small subsets of the reference annotation. In general, however, using a cluster is strongly recommended for better performance.
The error message "'execute_jobs (22)' terminated for an unknown reason -- Likely it has been terminated by the external system" does not provide any useful information, to be honest.
To assist further, could you run TOGA with the flag '--do_not_del_nf_logs', then locate the 'nextflow_logs' directory, compress it, and send it to me?
I'm also planning to release another update for TOGA today, which may improve its stability.
Thank you for the information!
I ran the test twice (named test2 and test3) using exactly the same script. In STEP 7, test2 failed twice and then retried successfully, while test3 failed four times and reported errors.
Slurm log files: test2_log.txt test3_log.txt
Nextflow log files: nextflow_logs.tar.gz
In addition, I ran TOGA for mouse and Peromyscus maniculatus. All CESAR jobs failed, but the run somehow proceeded to the final step and produced results. Is this normal? Can I use the output for analysis? Here is the log file: slurm-7490328.txt
Thanks
Hi @kirilenkobm
I’ve spent two weeks troubleshooting, but it’s just impossible to get TOGA to run on our cluster. I wonder if you could help me run TOGA for mouse and Peromyscus maniculatus. If so, I can provide the input data. Thank you for all your help!
Sure, please email me with a link to the data (we need the genome fasta). If you have a RepeatModeler library for that assembly, we can use that too. Is this assembly on NCBI, e.g. https://www.ncbi.nlm.nih.gov/assembly/GCA_026229955.1 ?
Hi @hongbingp
Looks like nextflow does not suit all users. In version 1.1.5 (or 1.1.6), I plan to release another (much better) way to handle parallel jobs. It will be a modular structure (following the "strategy" OOP pattern), where I provide users with a class to implement their own way of handling parallel jobs, plus the necessary documentation and examples of how it is implemented for nextflow and para.
This module is to be included in the TOGA pipeline (for now, it is here, but not attached): https://github.com/hillerlab/TOGA/blob/master/parallel_jobs_manager.py The strategy for para is already implemented; the one for nextflow (which will be the default) is nearly done. The custom strategy is a template.
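For readers unfamiliar with the strategy pattern, the general shape might look like the sketch below. All class and method names here (`ParallelStrategy`, `LocalSequentialStrategy`, `JobsManager`, `execute`, `check_status`) are illustrative assumptions and are not taken from parallel_jobs_manager.py; the linked file is the authoritative version.

```python
import subprocess
from abc import ABC, abstractmethod


class ParallelStrategy(ABC):
    """Interface that every job-handling strategy implements."""

    @abstractmethod
    def execute(self, commands):
        """Run the given commands (each a list of arguments)."""

    @abstractmethod
    def check_status(self):
        """Return True if every command finished with exit code 0."""


class LocalSequentialStrategy(ParallelStrategy):
    """Simplest possible strategy: run the commands one by one locally."""

    def __init__(self):
        self._return_codes = []

    def execute(self, commands):
        self._return_codes = [subprocess.run(cmd).returncode for cmd in commands]

    def check_status(self):
        return all(code == 0 for code in self._return_codes)


class JobsManager:
    """The pipeline talks only to this class; the strategy is swappable."""

    def __init__(self, strategy):
        self.strategy = strategy

    def run(self, commands):
        self.strategy.execute(commands)
        return self.strategy.check_status()
```

A cluster-specific strategy, for example one that submits jobs through Slurm, would subclass `ParallelStrategy` in the same way, and the rest of the pipeline would not need to change.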
Hi!
I'm running TOGA for the human and betta fish genomes, but it failed at STEP 7: Execute CESAR jobs. It could execute only one nextflow job and then kept reporting messages like this:
NOTE: Process 'execute_jobs (288)' terminated for an unknown reason -- Likely it has been terminated by the external system -- Execution is retried (3)
The error messages also indicated that "Cesar output is corrupted". These are the log files: slurm-7283855.err.txt slurm-7283855.txt
My code:
./toga.py /burg/sscc/users/hp2608/data/chain/human_betta2/hg38.betta.allfilled.chain.gz /burg/sscc/users/hp2608/data/hg38/ucsc/hg38.knownGene.bed /burg/sscc/users/hp2608/data/hg38/ucsc/hg38.2bit /burg/sscc/users/hp2608/data/betta_soft/betta_softmasked.2bit --kt --project_dir /burg/sscc/users/hp2608/data/TOGA_results/human-betta --nc nextflow_config_files --nd /burg/sscc/users/hp2608/tmp/nextflow_temp --cb 3,10 --cjn 500 --ms
Could you help with this? Thanks!
Hongbing