parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technologies

ghost_reads.fa.separate_flanking.fa': No such file or directory #127

Open ucsfpan opened 2 months ago

ucsfpan commented 2 months ago

Hi, thank you for your great tool! I am trying to run xTea on PacBio CCS long-read sequencing data, following the instructions in the README. My command is:

xtea_long -i sample_id.txt -b long_read_bam_list.txt -p /bastianlab/data1/hpan/xTea -o submit_jobs.sh --rmsk /bastianlab/data1/hpan/xTea/rep_lib_annotation/LINE/hg38/hg38_L1_larger_500_with_all_L1HS.out -r /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa --cns /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus/LINE1.fa --rep /bastianlab/data1/hpan/xTea/rep_lib_annotation --xtea /c4/home/ucsf-pan/software/xTea/xtea_long -f 31 -y 15 -n 8 -m 32 --slurm -q long -t 2-0:0:0

However, the following errors occur:

clip cutoff is: 0
Loaded consensus file list: ['/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/LINE1.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/ALU.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/HERV.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/SVA_ori.fa']
Begin to construct the TE kmer library!
The TE kmer library is constructed/loaded!

Error: File None doesn't exist!!!

Running command: minimap2 -x ava-pb -c -a -t 8 /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa.algn_2_itself.sorted.bam -

[ERROR] failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa': No such file or directory
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 352, in <module>
    i_max_clip, i_min_overlap, iset_cutoff, s_cluster_folder)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 111, in cluster_reads_by_flank_region
    m_info, m_reads, l_reads = self._parse_self_aligned_reads(sf_algnmt, i_max_clip)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 392, in _parse_self_aligned_reads
    samfile = pysam.AlignmentFile(sf_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
Running command: minimap2 -ax asm5 -t 8 /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam -

^C
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 311, in <module>
    lrc.classify_ins_seqs(sf_rep_ins, sf_ref, flk_lenth, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 121, in classify_ins_seqs
    self.classify_from_ref_algnmt(sf_ref, sf_rep_ins, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 114, in classify_from_ref_algnmt
    xtea_contig.align_contigs_2_reference_genome(sf_ref, sf_rep_ins, self.n_jobs, sf_algnmt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/x_contig.py", line 106, in align_contigs_2_reference_genome
    self.run_cmd(cmd)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/x_contig.py", line 47, in run_cmd
    self.cmd_runner.run_cmd_small_output(cmd)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/cmd_runner.py", line 13, in run_cmd_small_output
    subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).communicate()
  File "/c4/home/ucsf-pan/miniconda3/envs/svim/lib/python3.7/subprocess.py", line 951, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

(svim) [ucsf-pan@c4-n25 MaMel-144al]$ sh run_xTEA_pipeline.sh
Ave coverage is 0: using parameters clip with value 1

Ave coverage is 0: using parameters clip with value 1

clip cutoff is: 0
Loaded consensus file list: ['/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/LINE1.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/ALU.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/HERV.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/SVA_ori.fa']
Begin to construct the TE kmer library!

The TE kmer library is constructed/loaded!

Error: File None doesn't exist!!!

Running command: minimap2 -x ava-pb -c -a -t 8 /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa.algn_2_itself.sorted.bam -

[ERROR] failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa': No such file or directory
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 352, in <module>
    i_max_clip, i_min_overlap, iset_cutoff, s_cluster_folder)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 111, in cluster_reads_by_flank_region
    m_info, m_reads, l_reads = self._parse_self_aligned_reads(sf_algnmt, i_max_clip)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 392, in _parse_self_aligned_reads
    samfile = pysam.AlignmentFile(sf_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
Running command: minimap2 -ax asm5 -t 8 /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam -

[M::mm_idx_gen::53.871*1.58] collected minimizers
[M::mm_idx_gen::75.459*1.80] sorted minimizers
[M::main::75.459*1.80] loaded/built the index for 455 target sequence(s)
[M::mm_mapopt_update::102.087*1.59] mid_occ = 144
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 455
[M::mm_idx_stat::106.027*1.57] distinct minimizers: 214834535 (90.55% are singletons); average occurrences: 1.424; average spacing: 10.491; total length: 3209286105
ERROR: failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa': No such file or directory
ERROR: failed to map the query file
Running command: samtools index /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam
Working on polyA with contigs /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa and consensus /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa

Running command: minimap2 -k11 -w5 --sr --frag=yes -A2 -B4 -O4,8 -E2,1 -r150 -p.5 -N5 -n1 -m20 -s30 -g200 -2K50m --MD --heap-sort=yes --secondary=no --cs -a -t 8 /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/polyA_cns.bam -
[M::mm_idx_gen::0.002*1.85] collected minimizers
[M::mm_idx_gen::0.003*3.76] sorted minimizers
[M::main::0.003*3.73] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.69] mid_occ = 219
[M::mm_idx_stat] kmer size: 11; skip: 5; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.003*3.65] distinct minimizers: 1 (0.00% are singletons); average occurrences: 218.000; average spacing: 1.050; total length: 229
ERROR: failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa': No such file or directory
ERROR: failed to map the query file
Running command: samtools index /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/polyA_cns.bam
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 311, in <module>
    lrc.classify_ins_seqs(sf_rep_ins, sf_ref, flk_lenth, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 175, in classify_ins_seqs
    self.get_unmasked_seqs(sf_rep_ins_tmp, sf_tmp_out, sf_new_tmp)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 297, in get_unmasked_seqs
    with pysam.FastxFile(sf_ori) as fin_ori, open(sf_new, "w") as fout_new:
  File "pysam/libcfaidx.pyx", line 550, in pysam.libcfaidx.FastxFile.__cinit__
  File "pysam/libcfaidx.pyx", line 580, in pysam.libcfaidx.FastxFile._open
OSError: file /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa not found
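The tracebacks show why the run keeps going after the first failure: `cmd_runner.run_cmd_small_output` launches each shell pipeline with `subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).communicate()` and discards the `Popen` object, so minimap2's `failed to open file` error is never turned into an exception. The problem only surfaces later, when pysam opens an empty BAM or a missing FASTA. A stricter runner would stop at the first failing stage. The sketch below is a hypothetical helper (`run_cmd_checked` is not part of xTea) and assumes bash is available for `pipefail`:

```python
import subprocess


def run_cmd_checked(cmd: str) -> str:
    """Run a shell pipeline and raise if ANY stage exits non-zero.

    Hypothetical helper, not part of xTea. Running the command through
    `bash -o pipefail -c` makes the pipeline's exit status non-zero when
    any command in it fails, not only the last one, so a failing minimap2
    is not masked by a later samtools stage.
    """
    result = subprocess.run(
        ["bash", "-o", "pipefail", "-c", cmd],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    if result.returncode != 0:
        # Surface the failing command together with its stderr output.
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: {cmd}\n{result.stderr}"
        )
    return result.stdout
```

With `pipefail`, a pipeline such as `minimap2 ... | samtools view -hSb - | samtools sort ...` is reported as failed when minimap2 exits non-zero, even if the last stage exits cleanly.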

I have no idea how to fix this. Could you please advise how to resolve it?
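The log reports `Ave coverage is 0` and `Error: File None doesn't exist!!!`, which hints that an input path may not have been resolved (for example, a sample ID with no matching BAM, or a BAM that cannot be read). One quick way to rule out the input lists is to cross-check `sample_id.txt` against `long_read_bam_list.txt`. This is a minimal sketch, assuming each line of the BAM list is `<sample_id> <bam_path>` and that indexes are named `<bam>.bai`, `<bam>.csi` or `<prefix>.bai`; the file layout and the helper names (`load_bam_list`, `has_index`, `check_inputs`) are assumptions, not xTea's documented format or API:

```python
import os
from typing import Dict, List


def load_bam_list(path: str) -> Dict[str, str]:
    """Map sample ID -> BAM path, assuming '<sample_id> <bam_path>' per line (hypothetical layout)."""
    bams: Dict[str, str] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or single-column lines
                bams[fields[0]] = fields[1]
    return bams


def has_index(bam: str) -> bool:
    """True if a BAM index exists under one of the common naming conventions."""
    candidates = [bam + ".bai", bam + ".csi", os.path.splitext(bam)[0] + ".bai"]
    return any(os.path.isfile(c) for c in candidates)


def check_inputs(sample_id_file: str, bam_list_file: str) -> List[str]:
    """Return a human-readable list of problems found in the two input lists."""
    problems: List[str] = []
    bams = load_bam_list(bam_list_file)
    with open(sample_id_file) as fh:
        sample_ids = [line.strip() for line in fh if line.strip()]
    for sid in sample_ids:
        bam = bams.get(sid)
        if bam is None:
            problems.append(f"sample '{sid}' has no entry in {bam_list_file}")
        elif not os.path.isfile(bam):
            problems.append(f"BAM for '{sid}' not found: {bam}")
        elif not has_index(bam):
            problems.append(f"BAM for '{sid}' has no .bai/.csi index: {bam}")
    return problems
```

Calling `check_inputs("sample_id.txt", "long_read_bam_list.txt")` from the working directory returns a list of problems; an empty list means every sample ID maps to an existing, indexed BAM under the assumed layout.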