parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

Merge clip and disc step error #92

Closed Shezara7 closed 8 months ago

Shezara7 commented 8 months ago

When using the v38 pipeline for short-read sequences, I get many errors saying that chromosomes do not exist (e.g. Error happen at merge clip and disc feature step: chr2 not exist), but only in the SVA call. Other outputs, such as Alu, show no such error. Any advice as to why this might be occurring?

Output_SVA_a.txt Output_Alu_a.txt
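To see which chromosomes are affected and how often, the attached logs could be tallied with something like the sketch below. It assumes every failure uses exactly the wording quoted above; the function name `count_missing_chroms` is made up for illustration and is not part of xTea.

```python
import re
from collections import Counter

# Pattern copied from the message quoted in the report; other xTea steps
# may word their errors differently, so treat this as an assumption.
ERROR_RE = re.compile(
    r"Error happen at merge clip and disc feature step: (\S+) not exist"
)


def count_missing_chroms(log_text):
    """Return a Counter mapping chromosome name -> number of 'not exist' errors."""
    return Counter(ERROR_RE.findall(log_text))
```

Calling `count_missing_chroms(open("Output_SVA_a.txt").read())` and then the same on `Output_Alu_a.txt` would show whether the error is limited to the SVA run and whether it hits every chromosome or only some.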

simoncchu commented 8 months ago

Hi, for the SVA one, could you run it with more memory and check the log again?

Shezara7 commented 8 months ago

I give it 50GB and the cluster says it only uses 6. Do I need to give it even more than that?
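One way to compare the memory requested with the peak actually used is Slurm's accounting command `sacct`. The sketch below only builds and runs that command; it assumes job accounting is enabled on the cluster, and the helper names `sacct_memory_cmd` and `report_memory` are hypothetical.

```python
import subprocess


def sacct_memory_cmd(job_id):
    """Build a sacct call that lists requested (ReqMem) and peak (MaxRSS) memory."""
    fields = "JobID,ReqMem,MaxRSS,Elapsed,State"
    return ["sacct", "-j", str(job_id), f"--format={fields}", "--parsable2"]


def report_memory(job_id):
    """Run sacct for one job and return its pipe-delimited text output.

    Only works on a machine where the Slurm client tools are installed.
    """
    result = subprocess.run(
        sacct_memory_cmd(job_id), capture_output=True, text=True, check=True
    )
    return result.stdout
```

`MaxRSS` is reported per job step, so the peak for the pipeline usually appears on the `.batch` line rather than on the top-level job line.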

simoncchu commented 8 months ago

Could you post the commands you run, including the job submission one?

Shezara7 commented 8 months ago

#!/bin/bash

SAMPLE_ID=sample_id.txt
BAMS=sample_bam.txt
X10_BAM=null
WFOLDER=/PATH/xTeaDemo/
OUT_SCRTP=submit_jobs.sh
TIME=0-20:01:00
REP_LIB=/PATH/xTeaDemo/rep_lib_annotation/
GENE=/GENEPATH/Data/gencode.v33.annotation.gff3
REF=/REFPATH/GATK_GRCh38/Homo_sapiens_assembly38.fasta
XTEA=/PATH/xTeaDemo/xTea/xtea/
BLK_LIST=/PATH/xTeaDemo/rep_lib_annotation/blacklist/hg38/centromere.bed

python ${XTEA}"gnrt_pipeline_local_v38.py" -i ${SAMPLE_ID} -b ${BAMS} -x ${X10_BAM} -p ${WFOLDER} -o ${OUT_SCRTP} -q short -n 8 -m 50 -t ${TIME} \
    -l ${REP_LIB} -r ${REF} -g ${GENE} --xtea ${XTEA} --nclip 4 --cr 2 --nd 5 --nfclip 4 --nfdisc 5 --flklen 3000 -f 5907 -y 7 --blacklist ${BLK_LIST}

I submit with sbatch run_gnrt_pipeline.sh

simoncchu commented 8 months ago

This is the wrapper script. How do you submit the generated job script?

Shezara7 commented 8 months ago

Sorry about the confusion; I've been working on something else recently as well. It's just:

#!/bin/bash

sh run_gnrt_pipeline.sh
sh submit_jobs.sh

simoncchu commented 8 months ago

Do you have the --slurm option added? What's the content of "submit_jobs.sh"?

Shezara7 commented 8 months ago

I do not have the --slurm option but I will add it and try again if you think that will help. "submit_jobs.sh" looks like:

#!/bin/bash

sbatch < /PATH/xTeaDemo/NA12878/L1/run_xTEA_pipeline.sh
sbatch < /PATH/xTeaDemo/NA12878/Alu/run_xTEA_pipeline.sh
sbatch < /PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh

simoncchu commented 8 months ago

I just want to see the detailed commands you run. You keep on providing unrelated information...

Specifically, what's the content inside " /PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh"

Shezara7 commented 8 months ago

Please remember that I am asking for help specifically because I do not know or understand this pipeline, so please be patient with me.

" /PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh" is as follows:

#!/bin/bash
#SBATCH -t 0-20:01:00
#SBATCH --mem=50g

PREFIX=/PATH/xTeaDemo/NA12878/SVA/
############
############
ANNOTATION=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA.out
ANNOTATION1=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA.out
REF=/REFPATH/GATK_GRCh38/Homo_sapiens_assembly38.fasta
GENE=/GENEPATH/Data/gencode.v33.annotation.gff3
BLACK_LIST=/PATH/xTeaDemo/rep_lib_annotation/blacklist/hg38/centromere.bed
L1_COPY_WITH_FLANK=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA_copies_with_flank.fa
SF_FLANK=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_FL_SVA_flanks_3k.fa
L1_CNS=/PATH/xTeaDemo/rep_lib_annotation/consensus/SVA.fa
XTEA_PATH=/PATH/xTeaDemo/xTea/xtea/
BAM_LIST=${PREFIX}"bam_list.txt"
BAM1=${PREFIX}"10X_phased_possorted_bam.bam"
BARCODE_BAM=${PREFIX}"10X_barcode_indexed.sorted.bam"
TMP=${PREFIX}"tmp/"
TMP_CLIP=${PREFIX}"tmp/clip/"
TMP_CNS=${PREFIX}"tmp/cns/"
TMP_TNSD=${PREFIX}"tmp/transduction/"
############
############
python ${XTEA_PATH}"x_TEA_main.py" -C --sva -i ${BAM_LIST} --lc 4 --rc 4 --cr 2 -r ${L1_COPY_WITH_FLANK} -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt" -n 8 --cp /PATH/xTeaDemo/NA12878/pub_clip/
python ${XTEA_PATH}"x_TEA_main.py" -D --sva -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 8
python ${XTEA_PATH}"x_TEA_main.py" -N --sva --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt"
python ${XTEA_PATH}"x_TEA_main.py" --transduction --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_disc_filtered_cns.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 4 -a ${ANNOTATION1} -o ${PREFIX}"candidate_disc_filtered_cns2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --sibling --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank "" --flklen 3000 -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 4 -a ${ANNOTATION1} --blacklist ${BLACK_LIST} -o ${PREFIX}"candidate_sibling_transduction2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 4 -p ${TMP_CNS} -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -a ${ANNOTATION1} -o ${PREFIX}"candidate_disc_filtered_cns_post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 4 -p ${TMP_CNS} -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt.high_confident" -a ${ANNOTATION1} --blacklist ${BLACK_LIST} -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt" -n 8 -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gntp_classify -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt" -n 1 --model ${XTEA_PATH}"genotyping/DF21_model_1_2" -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gVCF -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt" -o ${PREFIX} -b ${BAM_LIST} --ref ${REF} --rtype 4
python ${XTEA_PATH}"x_TEA_main.py" --igv --single_sample -p ${PREFIX}"tmp/igv" -b ${PREFIX}"bam_list1.txt" -i ${PREFIX}"candidate_disc_filtered_cns.txt" --ref ${REF} -e 1000 -n 8 -o ${PREFIX}"tmp/igv/bamsnap_screenshot.txt"
python ${XTEA_PATH}"x_TEA_main.py" --igv --single_sample -p ${PREFIX}"tmp/igv" -b ${PREFIX}"bam_list1.txt" -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt" --ref ${REF} -e 1000 -n 8 -o ${PREFIX}"tmp/igv/bamsnap_screenshot_hc.txt"

simoncchu commented 8 months ago

Can you try with a header like this?

#!/bin/bash
#SBATCH -c 8
#SBATCH -t 1-7:00
#SBATCH --mem=32G
#SBATCH -p you_queue_name
#SBATCH -o test_%j.out
#SBATCH --mail-type=NONE

Replace you_queue_name with the queue name on your cluster. I don't think it's a software issue; what matters is how you submit the job.
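Slurm only honours `#SBATCH` lines that appear before the first executable command in a job script, so it can help to check what a generated script actually requests before submitting it. The sketch below is one possible check, not part of xTea; `read_sbatch_directives` is a hypothetical helper name.

```python
import shlex


def read_sbatch_directives(script_text):
    """Return the #SBATCH option lists that sbatch would read from a script.

    Scanning stops at the first line that is neither blank nor a comment,
    mirroring sbatch, which ignores directives placed after a command.
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#SBATCH"):
            # Split the remainder the way a shell would, e.g. ["-c", "8"].
            directives.append(shlex.split(stripped[len("#SBATCH"):]))
        elif not stripped.startswith("#"):
            break  # first executable command reached
    return directives
```

Running this on the contents of a generated `run_xTEA_pipeline.sh` lists the time, memory, and CPU requests in effect; if no `-c` entry appears, the job may be given fewer cores than the `-n 8` threads the xTea commands ask for, depending on the cluster's defaults.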

Shezara7 commented 8 months ago

Hi! I spent a little while trying different headers, including adding --slurm, and it does not change much. Here is my current header:

#SBATCH -c 8
#SBATCH -t 0-20:01:00
#SBATCH --mem=50G
#SBATCH -p general
#SBATCH -o NA12878_%j.out
#SBATCH --mail-type=NONE
#SBATCH --mail-user=chong.simonchu@gmail.com