Hi,
Thank you for your great tool! I am trying to run xTea on long-read sequencing data (PacBio CCS), following the instructions in the README. My command is:
xtea_long -i sample_id.txt -b long_read_bam_list.txt -p /bastianlab/data1/hpan/xTea -o submit_jobs.sh --rmsk /bastianlab/data1/hpan/xTea/rep_lib_annotation/LINE/hg38/hg38_L1_larger_500_with_all_L1HS.out -r /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa --cns /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus/LINE1.fa --rep /bastianlab/data1/hpan/xTea/rep_lib_annotation --xtea /c4/home/ucsf-pan/software/xTea/xtea_long -f 31 -y 15 -n 8 -m 32 --slurm -q long -t 2-0:0:0
But I get the following errors:
clip cutoff is: 0
Loaded consensus file list: ['/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/LINE1.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/ALU.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/HERV.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/SVA_ori.fa']
Begin to construct the TE kmer library!
The TE kmer library is constructed/loaded!
Error: File None doesn't exist!!!
Running command: minimap2 -x ava-pb -c -a -t 8 /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa.algn_2_itself.sorted.bam -
[ERROR] failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa': No such file or directory
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 352, in <module>
    i_max_clip, i_min_overlap, iset_cutoff, s_cluster_folder)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 111, in cluster_reads_by_flank_region
    m_info, m_reads, l_reads = self._parse_self_aligned_reads(sf_algnmt, i_max_clip)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 392, in _parse_self_aligned_reads
    samfile = pysam.AlignmentFile(sf_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
Running command: minimap2 -ax asm5 -t 8 /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam -
^CTraceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 311, in <module>
    lrc.classify_ins_seqs(sf_rep_ins, sf_ref, flk_lenth, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 121, in classify_ins_seqs
    self.classify_from_ref_algnmt(sf_ref, sf_rep_ins, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 114, in classify_from_ref_algnmt
    xtea_contig.align_contigs_2_reference_genome(sf_ref, sf_rep_ins, self.n_jobs, sf_algnmt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/x_contig.py", line 106, in align_contigs_2_reference_genome
    self.run_cmd(cmd)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/x_contig.py", line 47, in run_cmd
    self.cmd_runner.run_cmd_small_output(cmd)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/cmd_runner.py", line 13, in run_cmd_small_output
    subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).communicate()
  File "/c4/home/ucsf-pan/miniconda3/envs/svim/lib/python3.7/subprocess.py", line 951, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
(svim) [ucsf-pan@c4-n25 MaMel-144al]$ sh run_xTEA_pipeline.sh
Ave coverage is 0: using parameters clip with value 1
Ave coverage is 0: using parameters clip with value 1
clip cutoff is: 0
Loaded consensus file list: ['/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/LINE1.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/ALU.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/HERV.fa', '/bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/SVA_ori.fa']
Begin to construct the TE kmer library!
The TE kmer library is constructed/loaded!
Error: File None doesn't exist!!!
Running command: minimap2 -x ava-pb -c -a -t 8 /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa.algn_2_itself.sorted.bam -
[ERROR] failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/ghost_reads.fa.separate_flanking.fa': No such file or directory
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 352, in <module>
    i_max_clip, i_min_overlap, iset_cutoff, s_cluster_folder)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 111, in cluster_reads_by_flank_region
    m_info, m_reads, l_reads = self._parse_self_aligned_reads(sf_algnmt, i_max_clip)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_ghost_TE.py", line 392, in _parse_self_aligned_reads
    samfile = pysam.AlignmentFile(sf_bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
Running command: minimap2 -ax asm5 -t 8 /bastianlab/data1/Shared_datasets/Database/References/ucsc_hg38.bwa-index/hg38.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam -
[M::mm_idx_gen::53.871*1.58] collected minimizers
[M::mm_idx_gen::75.459*1.80] sorted minimizers
[M::main::75.459*1.80] loaded/built the index for 455 target sequence(s)
[M::mm_mapopt_update::102.087*1.59] mid_occ = 144
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 455
[M::mm_idx_stat::106.027*1.57] distinct minimizers: 214834535 (90.55% are singletons); average occurrences: 1.424; average spacing: 10.491; total length: 3209286105
ERROR: failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa': No such file or directory
ERROR: failed to map the query file
Running command: samtools index /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/all_tei_seq_2_ref.bam
Working on polyA with contigs /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa and consensus /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa
Running command: minimap2 -k11 -w5 --sr --frag=yes -A2 -B4 -O4,8 -E2,1 -r150 -p.5 -N5 -n1 -m20 -s30 -g200 -2K50m --MD --heap-sort=yes --secondary=no --cs -a -t 8 /bastianlab/data1/hpan/xTea/rep_lib_annotation/consensus_mask_lrd/polyA.fa /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa | samtools view -hSb - | samtools sort -o /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/polyA_cns.bam -
[M::mm_idx_gen::0.002*1.85] collected minimizers
[M::mm_idx_gen::0.003*3.76] sorted minimizers
[M::main::0.003*3.73] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.69] mid_occ = 219
[M::mm_idx_stat] kmer size: 11; skip: 5; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.003*3.65] distinct minimizers: 1 (0.00% are singletons); average occurrences: 218.000; average spacing: 1.050; total length: 229
ERROR: failed to open file '/bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa': No such file or directory
ERROR: failed to map the query file
Running command: samtools index /bastianlab/data1/hpan/xTea/MaMel-144al/tmp/classification/polyA_cns.bam
Traceback (most recent call last):
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_main.py", line 311, in <module>
    lrc.classify_ins_seqs(sf_rep_ins, sf_ref, flk_lenth, sf_rslt)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 175, in classify_ins_seqs
    self.get_unmasked_seqs(sf_rep_ins_tmp, sf_tmp_out, sf_new_tmp)
  File "/c4/home/ucsf-pan/software/xTea/xtea_long/l_rep_classification.py", line 297, in get_unmasked_seqs
    with pysam.FastxFile(sf_ori) as fin_ori, open(sf_new, "w") as fout_new:
  File "pysam/libcfaidx.pyx", line 550, in pysam.libcfaidx.FastxFile.__cinit__
  File "pysam/libcfaidx.pyx", line 580, in pysam.libcfaidx.FastxFile._open
OSError: file /bastianlab/data1/hpan/xTea/MaMel-144al/all_ins_seqs.fa not found
I have no idea how to solve this. Could you please advise me on how to fix it?
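Since the log reports "Ave coverage is 0" and "File None doesn't exist", one thing that may be worth ruling out is an input path that does not resolve on the compute node. The following is only a minimal sketch of such a check and is not part of xTea. The helper name find_missing_paths is hypothetical, and the sketch assumes that each non-empty line of long_read_bam_list.txt is whitespace-separated with the BAM path in the last column, which should be verified against the README.

```python
import os
import sys


def find_missing_paths(list_file):
    """Return (line_number, path) pairs whose path is not an existing file.

    Assumption (hypothetical, check against the xTea README): every
    non-empty line is whitespace-separated and its last column is a path.
    """
    missing = []
    with open(list_file) as fin:
        for line_number, line in enumerate(fin, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            path = fields[-1]
            if not os.path.isfile(path):
                missing.append((line_number, path))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_inputs.py long_read_bam_list.txt
    for line_number, path in find_missing_paths(sys.argv[1]):
        print(f"line {line_number}: missing file {path}")
```

An empty result would only show that the listed files exist. It would not confirm that the BAMs are indexed or aligned to the same reference passed with -r.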