parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

xTea_long getting stuck, no errors, during wtpoa-cns, running HG38 #106

Open vaksmaz opened 2 months ago

vaksmaz commented 2 months ago

I am running the long-read xTea and hitting an issue with HG38 that did not occur with the T2T reference. With the command below, everything runs smoothly until I get to the wtpoa-cns step (3500). Somewhere in the middle of that step (a different spot every time), the job stops making progress, but the process is still active. For this run, the last thing it logged was:

-- Starting program: wtpoa-cns -I /path_to/Long_read/COLO829-T/COLO829/tmp/l_asm_tmp/3500/chr3~109529324_3500_wtdbg2.ctg.lay.gz -fo /path_to/Long_read/COLO829-T/COLO829/tmp/l_asm_tmp/3500/chr3~109529324_3500_wtdbg2.ctg.lay.fa -t 16
-- pid 140518
-- date Sun Apr 21 15:05:33 2024

I checked 15 hours later and it still had not advanced. I have 800 GB of RAM, so memory should not be the problem. I am running CentOS Linux 7, in case that matters. Again, it did not have this issue with the new alignment.
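One way to narrow down a stall like this on Linux is to look at the kernel's state for the stuck pid. The sketch below is only an illustration and is not part of xTea; the helper names `parse_proc_state` and `read_proc_state` are hypothetical. It assumes `/proc` is mounted and reads the `State:` line of `/proc/<pid>/status`. A process that is computing normally shows `R`, while one that is waiting (for example on a pipe or a lock) typically shows `S`, and `D` indicates uninterruptible I/O wait.

```python
from pathlib import Path


def parse_proc_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def read_proc_state(pid):
    """Read the state of a live process (assumes Linux with /proc mounted)."""
    return parse_proc_state(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Illustrative input only; on the cluster, call read_proc_state(<pid>)
    # with the pid printed in the job log instead.
    sample = "Name:\twtpoa-cns\nState:\tS (sleeping)\nPid:\t1\n"
    print(parse_proc_state(sample))
```

Calling `read_proc_state` with the pid from the log a few times over several minutes gives a rough picture: a state that stays at `S` with no CPU usage suggests the process is waiting on something rather than still assembling.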

The command I am using is below:

SAMPLE_ID=sample_id.txt
BAMS=bam_list.txt
WFOLDER=/gpfs/commons/groups/compbio/vaksman/Long_read/COLO829-T/
OUT_SCRTP=submit_jobs_COLO829-T_fast.sh
TIME=60000:00
REF=/path_to/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa
XTEA=/path_ton/bin/xTea_long/xTea/xtea_long/
RMSK=/path_to/bin/xTea_long/xTea/rep_lib_annotation/LINE/hg38/hg38_L1_larger_500_with_all_L1HS.out
CNS_L1=/path_to/bin/xTea_long/xTea/rep_lib_annotation/consensus/LINE1.fa
REP_LIB=/path_to/bin/xTea_long/xTea/rep_lib_annotation/
GENE=/path_to/bin/xTea_long/xTea/rep_lib_annotation/GENCODE.v33.annotation.gff3

python ${XTEA}"gnrt_pipeline_local_long_read_v38.py" \
    -i ${SAMPLE_ID} \
    -b ${BAMS} -p ${WFOLDER} \
    -o ${OUT_SCRTP} \
    --xtea ${XTEA} \
    -n 16 -m 800 -t ${TIME} \
    -r ${REF} --rmsk ${RMSK} \
    --cns ${CNS_L1} --rep ${REP_LIB} --slurm \
    --min 400000 -f 31 -y 15 --clean \
    --g {GENE} \
    -q bigmem

simoncchu commented 2 months ago

It seems the program hangs there. Sometimes the output from a third-party program (here wtpoa-cns) can block the pipe and cause an issue like this. I don't have a good solution right now. Could you re-run and see whether the error replicates?
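To illustrate the kind of pipe blocking described above: if a parent starts a child with its stdout or stderr connected to a pipe and never reads from it, the child stalls as soon as the pipe buffer fills. The sketch below is a generic Python pattern, not xTea's actual code; `run_with_timeout` is a hypothetical helper. It assumes the external tool is launched through `subprocess`, drains both streams with `communicate()`, and kills the child if it exceeds a time limit so that a stall surfaces as a failure instead of an indefinite hang.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd while draining stdout/stderr so a full pipe cannot stall it.

    Returns (returncode, stdout, stderr). returncode is None if the
    timeout expired and the child had to be killed.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes concurrently until the child exits.
        out, err = proc.communicate(timeout=timeout_s)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever was written before the kill and reap the child.
        out, err = proc.communicate()
        return None, out, err


if __name__ == "__main__":
    # Stand-in child that writes far more than a typical pipe buffer holds.
    noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
    rc, out, _ = run_with_timeout(noisy, 60)
    print(rc, len(out))
```

The same wrapper could be pointed at the single wtpoa-cns command from the log to check whether that one region stalls on its own, outside the pipeline.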

vaksmaz commented 2 months ago

It replicates every time I run it. It stops at a slightly different place each time, but always in the same process. I have run it multiple times with different parameters, and with multiple samples, and it has happened every time. It only happens with HG38, not with the T2T reference.