vinitamehlawat opened this issue 9 months ago
Hi @vinitamehlawat
Thank you for reaching out. Firstly, I consider a total runtime of around 12 hours to be quite normal. I would start to worry if it took more than a couple of days.
The issue is that CESAR2.0 isn't very memory-efficient when dealing with long genes. As far as I know, improvements are being developed in the lab, although I haven't worked there for a couple of years.
As I can see, CESAR/TOGA failed to process a couple of genes. Please check what exactly is in the /home/vlamba/BD_gene-loss/_cesar_rerun_batch_100 file, i.e. which transcripts failed.
BR, Bogdan
Thank you for pointing out the issue with the importlib-metadata dependency: https://github.com/hillerlab/TOGA/commit/6ebd86f7c4cf10c30f19ef68976a8df05a5abd7c
Hi @kirilenkobm
Thank you very much for your response. Following your suggestion, I ran `cat _cesar_rerun_batch_100` and got the following entries for the two rejected jobs:
/scrfs/storage/vlamba/home/TOGA/cesar_runner.py /home/vlamba/BD-gene-loss/RERUN_CESAR_JOBS/rerun_job_1_100 /home/vlamba/BD-gene-loss/temp/cesar_results/rerun_job_1_100.txt --check_loss /home/vlamba/BD-gene-loss/temp/inact_mut_data/rerun_job_1_100.txt --rejected_log /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_1_100.txt
/scrfs/storage/vlamba/home/TOGA/cesar_runner.py /home/vlamba/BD-gene-loss/RERUN_CESAR_JOBS/rerun_job_2_100 /home/vlamba/BD-gene-loss/temp/cesar_results/rerun_job_2_100.txt --check_loss /home/vlamba/BD-gene-loss/temp/inact_mut_data/rerun_job_2_100.txt --rejected_log /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_2_100.txt
Then I looked at the rejected log in /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_1_100.txt and it gave me the following message:
/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py ENSCGOT00000028426.1 28 /home/vlamba/BD-gene-loss/temp/toga_filt_ref_annot.hdf5 /home/vlamba/BD-gene-loss/temp/genome_alignment.bst /home/vlamba/LepNud-Chain-alignment/CG.2bit /home/vlamba/BD.2bit --cesar_binary /scrfs/storage/vlamba/home/TOGA/CESAR2.0/cesar --uhq_flank 50 --temp_dir /home/vlamba/BD-gene-loss/temp/cesar_temp_files --mask_stops --check_loss --alt_frame_del --memlim 10
CESAR JOB FAILURE
Input is corrupted! Reference sequence should start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Traceback (most recent call last):
  File "/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py", line 2661, in
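For reference, here is a minimal sketch that lists the transcript IDs from the crashed CESAR job logs. It assumes each log contains CESAR_wrapper.py command lines with the transcript ID as the first positional argument, as in the output above; the directory path is the one from this run.

```python
# Sketch: collect transcript IDs mentioned in crashed CESAR job logs.
# Assumes the log format shown above (CESAR_wrapper.py <transcript_id> ...).
import glob
import os

crashed_dir = "/home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again"
failed_transcripts = set()

for log_path in glob.glob(os.path.join(crashed_dir, "*.txt")):
    with open(log_path) as log_file:
        for line in log_file:
            tokens = line.split()
            for i, token in enumerate(tokens[:-1]):
                if token.endswith("CESAR_wrapper.py"):
                    # the argument right after the wrapper script is the transcript ID
                    failed_transcripts.add(tokens[i + 1])

for transcript_id in sorted(failed_transcripts):
    print(transcript_id)
```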
It seems that two transcripts in my data do not have a correct reading frame. It would be a great help if you could suggest a possible solution for this.
Thanks Vinita
Hi @vinitamehlawat
thanks for checking this. Yes, indeed, these 2 transcripts don't have a correct reading frame. Right now, the pipeline expects that each reference transcript satisfies the following criteria:

- the coding sequence starts with ATG
- the coding sequence ends with a canonical stop codon (TAA, TAG, or TGA)
- the coding sequence length is divisible by 3, i.e. the reading frame is intact
Otherwise, CESAR (a tool that realigns reference exons to query loci) might not process such transcripts correctly in multi-exon mode. Also, the post-processing TOGA steps are based on the assumption that the provided reference transcripts have a correct reading frame.
(Maybe at some point we will find a way to process a wider range of variants, and CESAR itself requires some optimisations.) In my case, I just dropped such reference transcripts.
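As a minimal sketch of the checks listed above (assuming the spliced coding sequence of a reference transcript has already been extracted as a plain string, e.g. with twoBitToFa plus exon stitching; this is only an illustration, not code from TOGA itself):

```python
# Illustration of the reference-transcript criteria listed above.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_valid_reading_frame(cds: str) -> bool:
    """ATG start, stop-codon end, and a length divisible by 3."""
    cds = cds.upper()
    return (
        len(cds) % 3 == 0
        and cds.startswith("ATG")
        and cds[-3:] in STOP_CODONS
    )

# Example: the second CDS fails because its length is not a multiple of 3.
print(has_valid_reading_frame("ATGAAACCCGGGTAA"))  # True
print(has_valid_reading_frame("ATGAAACCCGGTAA"))   # False
```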
BR, Bogdan
Hi @kirilenkobm
Thank you for all your responses. Is there any way to delete these two exons with incomplete reading frames? If there is any tool or command you can suggest, I would really appreciate it.
Best Regards, Vinita
@vinitamehlawat
I would just do something like
grep -v -e $gene1 -e $gene2 your_bed_file.bed > new_bed_file.bed
or something along these lines.
(I believe deleting whole transcripts would be safer)
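If you want to avoid accidental substring matches, a small Python sketch is shown below; the file names are the placeholders from the grep example above, and the ID is the one transcript named in this thread (the second failing transcript would need to be added as well).

```python
# Sketch: remove specific transcripts from a BED file by exact name match
# (column 4), rather than by substring. File names and IDs are placeholders.
to_drop = {"ENSCGOT00000028426.1"}  # add the second failing transcript ID here

with open("your_bed_file.bed") as src, open("new_bed_file.bed", "w") as dst:
    for line in src:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] in to_drop:
            continue  # skip transcripts with a broken reading frame
        dst.write(line)
```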
Hello, I've encountered a similar issue to the one described above. I placed the jobs on a CPU computing server, but I'm unable to submit tasks to the Slurm system, so I opted for "local". However, it has been running for five days now, and the log files indicate that it's still in progress. I'd greatly appreciate any suggestions on how to improve its speed. Thank you very much.
My command is: ./toga.py /opt/synData2/gene_loss/chain/DWR.chain /home/TOGA/TOGAInput/human_hg38/toga.transcripts.bed /home/TOGA/hg38.2bit /opt/synData2/gene_loss/chain/DWR.2bit --kt --pn /opt/synData2/gene_loss/DWR -i /home/TOGA/TOGAInput/human_hg38/toga.isoforms.tsv --nc /home/TOGA/nextflow_config_files --cb 10,100 --cjn 300 --u12 /home/TOGA/TOGAInput/human_hg38/toga.U12introns.tsv --ms -q
The log shows:
### STEP 7: Execute CESAR jobs: parallel step
Pushing 2 CESAR job lists
Pushing memory bucket 10Gb to the executor
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /home/TOGA/execute_joblist.nf --joblist /opt/synData2/gene_loss/DWR/temp/cesar_joblist_queue_10.txt -c /opt/synData2/gene_loss/DWR/temp/cesar_config_10_queue.nf
Pushing memory bucket 100Gb to the executor
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /home/TOGA/execute_joblist.nf --joblist /opt/synData2/gene_loss/DWR/temp/cesar_joblist_queue_100.txt -c /opt/synData2/gene_loss/DWR/temp/cesar_config_100_queue.nf
## Stated polling cluster jobs until they done
Polling iteration 0; already waiting 0 seconds.
Polling iteration 1; already waiting 60 seconds.
.......
Polling iteration 7882; already waiting 472920 seconds.
Polling iteration 7883; already waiting 472980 seconds.
Hi Authors,
I was able to run TOGA on a couple of my species, but somehow, after running for 11 hours, I got this error for one of my species in my sbatch error file.
After loading the nextflow module and installing importlib-metadata with `pip install importlib-metadata`, I ran TOGA with the following command:
./toga.py /home/vlamba/BD-CG.chain /home/vlamba/CG.bed /home/vlamba/CG.2bit /home/vlamba/BD.2bit -i /home/vlamba/CG-isofrom.txt --project_dir /home/vlamba/BD_gene-loss --kt --cb 10,100 --cjn 500 --ms
The sbatch error file contains:
Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-3HJdri/pip/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-msMJKV/importlib-metadata/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Traceback (most recent call last):
  File "./toga.py", line 8, in <module>
    import importlib.metadata as metadata
ModuleNotFoundError: No module named 'importlib.metadata'
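For context, `importlib.metadata` only became part of the Python standard library in version 3.8; on older interpreters the `importlib-metadata` backport package provides the same API. A common compatibility pattern (a sketch of the general idea, not necessarily what the commit referenced above in this thread does) looks like this:

```python
# Compatibility sketch: importlib.metadata is standard library only on Python 3.8+;
# older interpreters need the importlib-metadata backport (pip install importlib-metadata).
try:
    import importlib.metadata as metadata   # Python 3.8 and newer
except ImportError:
    import importlib_metadata as metadata   # backport package for older Pythons

print(metadata.version("pip"))  # example: look up an installed package's version
```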
Here are the last lines from my toga.log file:
Polling iteration 653; already waiting 39180 seconds.
Polling iteration 654; already waiting 39240 seconds.
Polling iteration 655; already waiting 39300 seconds.
CESAR jobs done
Checking whether all CESAR results are complete
2 CESAR jobs crashed, trying to run again...
!!RERUN CESAR JOBS: Pushing 2 jobs into 100 GB queue
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /scrfs/storage/vlamba/home/TOGA/execute_joblist.nf --joblist /home/vlamba/BD_gene-loss/_cesar_rerun_batch_100
Monitoring CESAR jobs rerun
Stated polling cluster jobs until they done
CESAR jobs done
I would be grateful if you could suggest a solution for this.
My second concern is running time: my first species took 9hr:40min to complete when I ran TOGA for the first time, the second took 10hr:21min, and the third failed after 11 hrs.
Kindly have a look at my shared command and suggest the best way to run this tool faster on my data.
Thank you