Closed Shezara7 closed 8 months ago
Hi, for the SVA one, could you run with a larger memory and check the log again?
I give it 50GB and the cluster says it only uses 6. Do I need to give it even more than that?
Could you post the commands you run, including the job submission one?
SAMPLE_ID=sample_id.txt
BAMS=sample_bam.txt
X10_BAM=null
WFOLDER=/PATH/xTeaDemo/
OUT_SCRTP=submit_jobs.sh
TIME=0-20:01:00
REP_LIB=/PATH/xTeaDemo/rep_lib_annotation/
GENE=/GENEPATH/Data/gencode.v33.annotation.gff3
REF=/REFPATH/GATK_GRCh38/Homo_sapiens_assembly38.fasta
XTEA=/PATH/xTeaDemo/xTea/xtea/
BLK_LIST=/PATH/xTeaDemo/rep_lib_annotation/blacklist/hg38/centromere.bed
python ${XTEA}"gnrt_pipeline_local_v38.py" -i ${SAMPLE_ID} -b ${BAMS} -x ${X10_BAM} -p ${WFOLDER} -o ${OUT_SCRTP} -q short -n 8 -m 50 -t ${TIME} \
    -l ${REP_LIB} -r ${REF} -g ${GENE} --xtea ${XTEA} --nclip 4 --cr 2 --nd 5 --nfclip 4 --nfdisc 5 --flklen 3000 -f 5907 -y 7 --blacklist ${BLK_LIST}
I submit with sbatch run_gnrt_pipeline.sh
This is the wrapper script. How do you submit the generated job script?
Sorry about the confusion; I've been working on something else recently as well. It's just:
sh run_gnrt_pipeline.sh
sh submit_jobs.sh
Do you have the --slurm option added? What's the content of "submit_jobs.sh"?
I do not have the --slurm option, but I will add it and try again if you think that will help. "submit_jobs.sh" looks like:
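For reference, this is what the generation command would look like with --slurm added (same variables and parameters as the command I posted earlier; exactly where the flag goes is my assumption):

```shell
python ${XTEA}"gnrt_pipeline_local_v38.py" -i ${SAMPLE_ID} -b ${BAMS} -x ${X10_BAM} \
    -p ${WFOLDER} -o ${OUT_SCRTP} -q short -n 8 -m 50 -t ${TIME} --slurm \
    -l ${REP_LIB} -r ${REF} -g ${GENE} --xtea ${XTEA} \
    --nclip 4 --cr 2 --nd 5 --nfclip 4 --nfdisc 5 \
    --flklen 3000 -f 5907 -y 7 --blacklist ${BLK_LIST}
```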
sbatch < /PATH/xTeaDemo/NA12878/L1/run_xTEA_pipeline.sh
sbatch < /PATH/xTeaDemo/NA12878/Alu/run_xTEA_pipeline.sh
sbatch < /PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh
I just want to see the detailed commands you run; you keep providing unrelated information...
Specifically, what's the content of "/PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh"?
Please remember that I am asking for help specifically because I do not know or understand this pipeline and thus be patient with me.
"/PATH/xTeaDemo/NA12878/SVA/run_xTEA_pipeline.sh" is as follows:
PREFIX=/PATH/xTeaDemo/NA12878/SVA/
############
############
ANNOTATION=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA.out
ANNOTATION1=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA.out
REF=/REFPATH/GATK_GRCh38/Homo_sapiens_assembly38.fasta
GENE=/GENEPATH/Data/gencode.v33.annotation.gff3
BLACK_LIST=/PATH/xTeaDemo/rep_lib_annotation/blacklist/hg38/centromere.bed
L1_COPY_WITH_FLANK=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_SVA_copies_with_flank.fa
SF_FLANK=/PATH/xTeaDemo/rep_lib_annotation/SVA/hg38/hg38_FL_SVA_flanks_3k.fa
L1_CNS=/PATH/xTeaDemo/rep_lib_annotation/consensus/SVA.fa
XTEA_PATH=/PATH/xTeaDemo/xTea/xtea/
BAM_LIST=${PREFIX}"bam_list.txt"
BAM1=${PREFIX}"10X_phased_possorted_bam.bam"
BARCODE_BAM=${PREFIX}"10X_barcode_indexed.sorted.bam"
TMP=${PREFIX}"tmp/"
TMP_CLIP=${PREFIX}"tmp/clip/"
TMP_CNS=${PREFIX}"tmp/cns/"
TMP_TNSD=${PREFIX}"tmp/transduction/"
############
############
python ${XTEA_PATH}"x_TEA_main.py" -C --sva -i ${BAM_LIST} --lc 4 --rc 4 --cr 2 -r ${L1_COPY_WITH_FLANK} -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt" -n 8 --cp /PATH/xTeaDemo/NA12878/pub_clip/
python ${XTEA_PATH}"x_TEA_main.py" -D --sva -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 8
python ${XTEA_PATH}"x_TEA_main.py" -N --sva --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt"
python ${XTEA_PATH}"x_TEA_main.py" --transduction --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_disc_filtered_cns.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 4 -a ${ANNOTATION1} -o ${PREFIX}"candidate_disc_filtered_cns2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --sibling --cr 4 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank "" --flklen 3000 -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 4 -a ${ANNOTATION1} --blacklist ${BLACK_LIST} -o ${PREFIX}"candidate_sibling_transduction2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 4 -p ${TMP_CNS} -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -a ${ANNOTATION1} -o ${PREFIX}"candidate_disc_filtered_cns_post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 4 -p ${TMP_CNS} -n 8 -i ${PREFIX}"candidate_disc_filtered_cns2.txt.high_confident" -a ${ANNOTATION1} --blacklist ${BLACK_LIST} -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt" -n 8 -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gntp_classify -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt" -n 1 --model ${XTEA_PATH}"genotyping/DF21_model_1_2" -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gVCF -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt" -o ${PREFIX} -b ${BAM_LIST} --ref ${REF} --rtype 4
python ${XTEA_PATH}"x_TEA_main.py" --igv --single_sample -p ${PREFIX}"tmp/igv" -b ${PREFIX}"bam_list1.txt" -i ${PREFIX}"candidate_disc_filtered_cns.txt" --ref ${REF} -e 1000 -n 8 -o ${PREFIX}"tmp/igv/bamsnap_screenshot.txt"
python ${XTEA_PATH}"x_TEA_main.py" --igv --single_sample -p ${PREFIX}"tmp/igv" -b ${PREFIX}"bam_list1.txt" -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt" --ref ${REF} -e 1000 -n 8 -o ${PREFIX}"tmp/igv/bamsnap_screenshot_hc.txt"
Can you try with a header like this? Replace "you_queue_name" with the queue name on your cluster. I don't think it's a software issue; how you submit the job is what matters.
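For example, a minimal SLURM header along these lines (the queue name, core count, memory, time, and log file name are placeholders to adapt to your cluster; the values shown just mirror the -n 8, -m 50, and TIME settings used earlier in this thread):

```shell
#!/bin/bash
#SBATCH -p you_queue_name      # replace with the queue/partition name on your cluster
#SBATCH -n 8                   # number of cores, matching -n 8 above
#SBATCH --mem=50G              # memory request, matching -m 50
#SBATCH -t 0-20:01:00          # wall time, matching the TIME variable
#SBATCH -o xtea_%j.log         # stdout/stderr log file (%j = job ID)
```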
Hi! I spent a little while trying different headers, including adding --slurm, and it does not change much. Here is my current header:
When using the v38 pipeline on short-read data, I get many errors saying chromosomes do not exist (e.g. "Error happen at merge clip and disc feature step: chr2 not exist"), but only in the SVA call. Other outputs, like Alu, show none of that specific error. Any advice as to why this might be occurring?
Output_SVA_a.txt
Output_Alu_a.txt