parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

Long read cannot find temp files #96

Closed. Shezara7 closed this issue 3 months ago

Shezara7 commented 4 months ago

Hello! I am trying to run xTea long on nanopore sequencing data. Could you help me figure out what the problem is? From looking through the existing issues, mine seems most similar to issue #22, except that my run does not correctly generate the files you ask for there and fails almost immediately.

My batch file looks like this:

#!/bin/bash

#SBATCH -n 16
#SBATCH -t 0-10:01:00
#SBATCH --mem=48G
#SBATCH -p long
#SBATCH -o FILE_%j.out
#SBATCH --mail-type=NONE
#SBATCH --mail-user=chong.simonchu@gmail.com
PREFIX=/PREFIX_PATH/FILE/
############
############
REF=/REF_PATH/GATK_GRCh38/Homo_sapiens_assembly38.fasta
XTEA_PATH=/XTEA_PATH/xTea/xtea_long/
BAM_LIST=${PREFIX}"bam_list.txt"
TMP=${PREFIX}"tmp/"
REP_LIB=/REPLIB_PATH/rep_lib_annotation/
SVA_REF_COPY=null
############
############
python ${XTEA_PATH}"l_main.py" -C -b ${BAM_LIST} -r ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt"  -n 16 -w 75 
python ${XTEA_PATH}"l_main.py" -A -b ${BAM_LIST} -r ${REF} -p ${TMP} -i ${PREFIX}"candidate_list_from_clip.txt" -o ${PREFIX}"all_ins_seqs.fa" --rep ${REP_LIB} -n 16 
python ${XTEA_PATH}"l_main.py" -N -b ${BAM_LIST} -r ${REF} -p ${TMP}"ghost" -o ${PREFIX}"ghost_reads.fa" --rmsk /REPLIB_PATH/rep_lib_annotation/LINE/hg38_L1_larger_500_with_all_L1HS.out --cns /REPLIB_PATH/rep_lib_annotation/consensus/LINE.fa --min 4000 -n 16
python ${XTEA_PATH}"l_main.py" -Y -i ${PREFIX}"all_ins_seqs.fa" -r ${REF} -p ${TMP}"classification" --rep ${REP_LIB} -y 15 -o ${PREFIX}"classified_results.txt" -n 16
python ${XTEA_PATH}"l_main.py" --clean -b ${BAM_LIST} -r ${REF} -p ${TMP} -i ${PREFIX}"candidate_list_from_clip.txt"  -n 16

The error message looks like this:

...
[ERROR] failed to open file '/PREFIX_PATH/FILE/ghost_reads.fa.separate_flanking.fa': No such file or directory
[main_samview] fail to read the header from "-".
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
[E::hts_open_format] Failed to open file "/PREFIX_PATH/FILE/ghost_reads.fa.separate_flanking.fa.algn_2_itself.sorted.bam" : No such file or directory
Error: File None doesn't exist!!!
...
Traceback (most recent call last):
  File "/XTEA_PATH/xTea/xtea_long/l_main.py", line 311, in <module>
    lrc.classify_ins_seqs(sf_rep_ins, sf_ref, flk_lenth, sf_rslt)
  File "/XTEA_PATH/xTea/xtea_long/l_rep_classification.py", line 175, in classify_ins_seqs
    self.get_unmasked_seqs(sf_rep_ins_tmp, sf_tmp_out, sf_new_tmp)
  File "/XTEA_PATH/xTea/xtea_long/l_rep_classification.py", line 297, in get_unmasked_seqs
    with pysam.FastxFile(sf_ori) as fin_ori, open(sf_new, "w") as fout_new:
  File "pysam/libcfaidx.pyx", line 551, in pysam.libcfaidx.FastxFile.__cinit__
  File "pysam/libcfaidx.pyx", line 581, in pysam.libcfaidx.FastxFile._open
OSError: file `/PREFIX_PATH/FILE/all_ins_seqs.fa` not found

And these files are generated:

all_ins_seqs.fa.sites
bam_list.txt
candidate_list_from_clip.txt
candidate_list_from_clip.txt.left_breakponts
candidate_list_from_clip.txt.right_breakponts
classified_results.txt.polyA.txt
pub_clip
run_xTEA_pipeline.sh
sample_id.txt
tmp
xtea.config

However, everything except run_xTEA_pipeline.sh, sample_id.txt, and xtea.config is 0 bytes.

Your help would be very much appreciated!

Shezara7 commented 3 months ago

I determined that the script was not passing the path to my BAM files into the pipeline correctly, so I will close this issue.
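
For anyone who lands here with the same symptoms: the report above lists bam_list.txt among the 0-byte files, which appears consistent with the BAM paths never reaching the pipeline. A rough pre-flight sketch that could be pasted into the batch script is shown below. Both function names (`check_bam_list`, `require_nonempty`) are hypothetical helpers, not part of xTea, and the sketch assumes the BAM path is the last whitespace-separated field on each line of the list file and contains no spaces. Please check that assumption against the list format your xTea version expects.

```shell
# Hypothetical helpers, not part of xTea.

# Verify that the BAM list exists, is non-empty, and that every BAM it
# names exists and is non-empty.
# Assumption: the BAM path is the last whitespace-separated field per line.
check_bam_list() {
    list=$1
    status=0
    if [ ! -s "$list" ]; then
        echo "BAM list missing or empty: $list" >&2
        return 1
    fi
    while IFS= read -r line || [ -n "$line" ]; do
        # awk prints the last field; blank lines yield an empty string.
        bam=$(printf '%s\n' "$line" | awk '{ print $NF }')
        [ -z "$bam" ] && continue
        if [ ! -s "$bam" ]; then
            echo "BAM not found or empty: $bam" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}

# Fail if any of the given files is missing or 0 bytes, so a broken step
# is reported where it happens instead of several steps later.
require_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or 0-byte file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

One way to use it would be to call `check_bam_list "${BAM_LIST}" || exit 1` before the first `l_main.py` step, and `require_nonempty "${PREFIX}candidate_list_from_clip.txt" || exit 1` after the `-C` step. An empty candidate list may be legitimate for some samples, so treat the second check as a debugging aid rather than a hard rule.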