hillerlab / TOGA

TOGA (Tool to infer Orthologs from Genome Alignments) implements a novel paradigm to infer orthologous genes. TOGA integrates gene annotation, inferring orthologs and classifying genes as intact or lost.
MIT License

Cache entry deserialization failed, entry ignored #140

Open vinitamehlawat opened 9 months ago

vinitamehlawat commented 9 months ago

Hi Authors,

I was able to run TOGA on a couple of my species, but after running for 11 hours, I got the following error for one of my species in my sbatch error file.

After loading the nextflow module and installing importlib-metadata with pip install importlib-metadata, I ran TOGA with the following command:

./toga.py /home/vlamba/BD-CG.chain /home/vlamba/CG.bed /home/vlamba/CG.2bit /home/vlamba/BD.2bit -i /home/vlamba/CG-isofrom.txt --project_dir /home/vlamba/BD_gene-loss --kt --cb 10,100 --cjn 500 --ms

Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-3HJdri/pip/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-msMJKV/importlib-metadata/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Traceback (most recent call last):
  File "./toga.py", line 8, in <module>
    import importlib.metadata as metadata
ModuleNotFoundError: No module named 'importlib.metadata'

Here are the last lines from my toga.log file:

Polling iteration 653; already waiting 39180 seconds.
Polling iteration 654; already waiting 39240 seconds.
Polling iteration 655; already waiting 39300 seconds.

CESAR jobs done

Checking whether all CESAR results are complete
2 CESAR jobs crashed, trying to run again...
!!RERUN CESAR JOBS: Pushing 2 jobs into 100 GB queue
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /scrfs/storage/vlamba/home/TOGA/execute_joblist.nf --joblist /home/vlamba/BD_gene-loss/_cesar_rerun_batch_100
Monitoring CESAR jobs rerun

Stated polling cluster jobs until they done

CESAR jobs done

I would be grateful if you could suggest any solution for this.

My second concern is running time: my first species took 9 h 40 min to complete when I ran TOGA for the first time, the second took 10 h 21 min, and the third failed after 11 hours.

Kindly have a look at my shared command and suggest the best way to run this tool faster on my data.

Thank you

kirilenkobm commented 9 months ago

Hi @vinitamehlawat

Thank you for reaching out. Firstly, I consider a total runtime of around 12 hours to be quite normal; I would start to worry only if it takes more than a couple of days. The issue is that CESAR2.0 isn't very memory-efficient when dealing with long genes. As far as I know, improvements are being developed in the lab, although I haven't worked there for a couple of years. As I can see, CESAR/TOGA failed to process a couple of genes. Please check what exactly is in the /home/vlamba/BD_gene-loss/_cesar_rerun_batch_100 file, i.e. which transcripts failed.

BR, Bogdan

kirilenkobm commented 9 months ago

https://github.com/hillerlab/TOGA/commit/6ebd86f7c4cf10c30f19ef68976a8df05a5abd7c
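
The commit linked above addresses the ModuleNotFoundError: No module named 'importlib.metadata' from the original report. The exact diff isn't reproduced in this thread, but a common way to keep toga.py working on older Python interpreters is to fall back to the importlib_metadata backport, roughly like this (a sketch, not necessarily the committed change):

```python
# Sketch of a version-tolerant metadata import for toga.py.
# Assumes the importlib-metadata backport is installed on Python < 3.8.
try:
    import importlib.metadata as metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # PyPI backport for older interpreters
```

Note that importlib.metadata is only part of the standard library from Python 3.8 onward, so upgrading the Python environment itself is another way to resolve the error.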

vinitamehlawat commented 9 months ago

Hi @kirilenkobm

Thank you very much for your response. Following your suggestion, I ran cat _cesar_rerun_batch_100 and got the following commands for the two rerun jobs:

/scrfs/storage/vlamba/home/TOGA/cesar_runner.py /home/vlamba/BD-gene-loss/RERUN_CESAR_JOBS/rerun_job_1_100 /home/vlamba/BD-gene-loss/temp/cesar_results/rerun_job_1_100.txt --check_loss /home/vlamba/BD-gene-loss/temp/inact_mut_data/rerun_job_1_100.txt --rejected_log /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_1_100.txt

/scrfs/storage/vlamba/home/TOGA/cesar_runner.py /home/vlamba/BD-gene-loss/RERUN_CESAR_JOBS/rerun_job_2_100 /home/vlamba/BD-gene-loss/temp/cesar_results/rerun_job_2_100.txt --check_loss /home/vlamba/BD-gene-loss/temp/inact_mut_data/rerun_job_2_100.txt --rejected_log /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_2_100.txt

Then I looked at the rejected log in /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_1_100.txt, and it gave me the following message:

/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py ENSCGOT00000028426.1 28 /home/vlamba/BD-gene-loss/temp/toga_filt_ref_annot.hdf5 /home/vlamba/BD-gene-loss/temp/genome_alignment.bst /home/vlamba/LepNud-Chain-alignment/CG.2bit /home/vlamba/BD.2bit --cesar_binary /scrfs/storage/vlamba/home/TOGA/CESAR2.0/cesar --uhq_flank 50 --temp_dir /home/vlamba/BD-gene-loss/temp/cesar_temp_files --mask_stops --check_loss --alt_frame_del --memlim 10
CESAR JOB FAILURE
Input is corrupted! Reference sequence should start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Traceback (most recent call last):
  File "/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py", line 2661, in <module>
    realign_exons(cmd_args)
  File "/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py", line 2626, in realign_exons
    loss_report, del_mis_exons = inact_mut_check(
  File "/scrfs/storage/vlamba/home/TOGA/modules/inact_mut_check.py", line 1660, in inact_mut_check
    split_stop_codons = detect_split_stops(
  File "/scrfs/storage/vlamba/home/TOGA/modules/inact_mut_check.py", line 1468, in detect_split_stops
    position = exon_to_last_codon_of_exon[first_exon]
KeyError: 1

It seems that two transcripts in my data do not have a correct reading frame.

It would be a great help if you could suggest a possible solution for this.

Thanks Vinita

kirilenkobm commented 9 months ago

Hi @vinitamehlawat

thanks for checking this. Yes, indeed, these 2 transcripts don't have a correct reading frame. Right now, the pipeline expects that each reference transcript satisfies the following criteria: the coding sequence starts with ATG, ends with a stop codon, and has a length divisible by 3.

Otherwise, CESAR (a tool that realigns reference exons to query loci) might not process such transcripts correctly in multi-exon mode. Also, TOGA's post-processing steps are based on the assumption that the provided reference transcripts have a correct reading frame.

(Maybe, at some point, we will find a way to process a bit more diverse variants, and CESAR itself requires some optimisations.) In my case, I just dropped such reference transcripts.
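
To see which reference transcripts violate these expectations before launching TOGA, one quick check is whether the coding length encoded in the BED12 annotation is a multiple of 3. Below is a minimal sketch, not part of TOGA itself: it assumes a standard BED12 file with the transcript ID in column 4, and it only checks the frame length, not the ATG/stop codons (those would require the 2bit sequence).

```python
#!/usr/bin/env python3
"""Sketch: flag BED12 transcripts whose coding (thick) length is not a multiple of 3."""
import sys

with open(sys.argv[1]) as bed:
    for line in bed:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip non-BED12 lines
        chrom_start, thick_start, thick_end = int(f[1]), int(f[6]), int(f[7])
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        starts = [int(x) for x in f[11].rstrip(",").split(",")]
        cds_len = 0
        for size, start in zip(sizes, starts):
            exon_start = chrom_start + start
            exon_end = exon_start + size
            # count only the part of each exon that overlaps the coding region
            cds_len += max(0, min(exon_end, thick_end) - max(exon_start, thick_start))
        if cds_len % 3 != 0:
            print(f"{f[3]}\tCDS length {cds_len} not divisible by 3")
```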

BR, Bogdan

vinitamehlawat commented 9 months ago

Hi @kirilenkobm

Thank you for all your responses. Is there any way to delete these two exons with incomplete reading frames?

Any tool or any command you can suggest, I would really appreciate it.

Best Regards, Vinita

kirilenkobm commented 9 months ago

@vinitamehlawat

I would just do something like grep -v -e $gene1 -e $gene2 your_bed_file.bed > new_bed_file.bed, or something along those lines. (I believe deleting whole transcripts would be safer.)
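
If one of the two IDs happens to be a substring of another transcript name, grep -v can drop more lines than intended; filtering on the BED name column by exact match avoids that. Here is a small sketch of that alternative (the second ID is a placeholder, since only ENSCGOT00000028426.1 is named in this thread):

```python
# Sketch: drop specific transcripts from the reference BED by exact name match
# (column 4 of a standard BED file). SECOND_FAILED_ID is a placeholder; replace
# it with the transcript reported in rerun_job_2_100.txt.
bad_ids = {"ENSCGOT00000028426.1", "SECOND_FAILED_ID"}

with open("your_bed_file.bed") as src, open("new_bed_file.bed", "w") as dst:
    for line in src:
        fields = line.split("\t")
        if len(fields) > 3 and fields[3] in bad_ids:
            continue  # skip the transcripts that crashed CESAR
        dst.write(line)
```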

molinfzlvvv commented 5 months ago

Hello, excuse me, I've encountered a similar issue to the one described above. I placed the tasks on a CPU computing server, but I'm unable to submit them to the slurm system, so I opted for "local". However, it has been running for five days now, and the log files indicate that it is still in progress. I'd greatly appreciate any suggestions on how to improve its speed. Thank you very much.

My command is: ./toga.py /opt/synData2/gene_loss/chain/DWR.chain /home/TOGA/TOGAInput/human_hg38/toga.transcripts.bed /home/TOGA/hg38.2bit /opt/synData2/gene_loss/chain/DWR.2bit --kt --pn /opt/synData2/gene_loss/DWR -i /home/TOGA/TOGAInput/human_hg38/toga.isoforms.tsv --nc /home/TOGA/nextflow_config_files --cb 10,100 --cjn 300 --u12 /home/TOGA/TOGAInput/human_hg38/toga.U12introns.tsv --ms -q

The log shows:

### STEP 7: Execute CESAR jobs: parallel step
Pushing 2 CESAR job lists
Pushing memory bucket 10Gb to the executor
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /home/TOGA/execute_joblist.nf --joblist /opt/synData2/gene_loss/DWR/temp/cesar_joblist_queue_10.txt -c /opt/synData2/gene_loss/DWR/temp/cesar_config_10_queue.nf
Pushing memory bucket 100Gb to the executor
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /home/TOGA/execute_joblist.nf --joblist /opt/synData2/gene_loss/DWR/temp/cesar_joblist_queue_100.txt -c /opt/synData2/gene_loss/DWR/temp/cesar_config_100_queue.nf
## Stated polling cluster jobs until they done
Polling iteration 0; already waiting 0 seconds.
Polling iteration 1; already waiting 60 seconds.
.......
Polling iteration 7882; already waiting 472920 seconds.
Polling iteration 7883; already waiting 472980 seconds.