hall-lab / speedseq

A flexible framework for rapid genome analysis and interpretation
MIT License

mismatched line lengths at line 3 within sequence #147

Open vivekruhela opened 3 years ago

vivekruhela commented 3 years ago

I am trying to call somatic mutations using the speedseq somatic command, but every time I get a "mismatched line lengths" error. I am using the hg19 FASTA file from UCSC. I have also tried the reference file mentioned in the speedseq README (human_g1k_v37.fasta.gz), but I still get the same error. The command and the detailed error message are shown below:

/home/akansha/speedseq/bin/speedseq somatic /home/akansha/vivekruhela/refs/hg19/ucsc.hg19.fasta /home/akansha/vivekruhela/ega_data_1901/bam_files/CR-MGUS-10_10/CR-MGUS-10_10-PB_dedup.realigned.bam /home/akansha/vivekruhela/ega_data_1901/bam_files/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup.realigned.bam -o /home/akansha/vivekruhela/ega_data_1901/speedseq_analysis/test

Error Message:

Sourcing executables from /home/akansha/speedseq/bin/speedseq.config ...
Calling somatic variants...

    create temporary directory

    /home/akansha/speedseq//bin/sambamba view -H /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-PB_dedup_filtered.bam | grep "^@SQ" | cut -f 2- | awk '{ gsub("^SN:","",$1); gsub("^LN:","",$2); print $1"\t0\t"$2; }' > CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/windows.bed

    /home/akansha/speedseq//bin/freebayes -f /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam \
        --pooled-discrete \
        --min-repeat-entropy 1 \
        --genotype-qualities \
        --min-alternate-fraction 0.05 \
        --min-alternate-count 2 \
        --region $chrom:$start..$end \
        /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-PB_dedup_filtered.bam /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam \
        | somatic_filter 1e-5 18 0 \
        > CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/CR-MGUS-10_10-BM_dedup_filtered.bam.$chrom:$start..$end.vcf

    cat CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/var_command.txt | /home/akansha/speedseq//bin/parallel -j 1
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
ERROR: mismatched line lengths at line 3 within sequence 
File not suitable for fasta index generation.
index file /home/akansha/vivekruhela/ega_data_1901/sequenza_delly_analysis/CR-MGUS-10_10/CR-MGUS-10_10-BM_dedup_filtered.bam.fai not found, generating...
grep "^##" CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/CR-MGUS-10_10-BM_dedup_filtered.bam.1:0..249250621.vcf \
    | cat - <(echo '##INFO=<ID=SSC,Number=1,Type=Float,Description="Somatic score">') <(grep "^#CHROM" CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/CR-MGUS-10_10-BM_dedup_filtered.bam.1:0..249250621.vcf) > CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/header.txt

    cat CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/CR-MGUS-10_10-BM_dedup_filtered.bam."$chrom:$start..$end".vcf | grep -v "^#" \
        | sort -k1,1 -k2,2n | cat CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM/header.txt - \
        | /home/akansha/speedseq//bin/bgzip -c > CR-MGUS-10_10-BM_dedup_filtered.bam.vcf.gz

  /home/akansha/speedseq//bin/tabix -f -p vcf CR-MGUS-10_10-BM_dedup_filtered.bam.vcf.gz
# Make PED file
echo -e "1\tCR-MGUS-10_10-PB\tNone\tNone\t0\t1\n1\tCR-MGUS-10_10-BM\tNone\tNone\t0\t2" > CR-MGUS-10_10-BM_dedup_filtered.bam.ped

    rm -r CR-MGUS-10_10-BM_dedup_filtered.bam.CNz74wZ1KYmM
Done
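
For reference, the message above comes from the FASTA index generation step. A rough way to test a file for the same problem is sketched below, assuming the indexer expects every sequence line within a record to have the same length except the last one. The function name check_fasta_line_lengths is a hypothetical helper, not part of speedseq or freebayes, and a gzipped file would need to be decompressed before running it. In the log above, the path passed to freebayes -f and the .fai path both end in .bam, so it may be worth running the check on whichever file is actually reaching freebayes as the reference.

```shell
#!/usr/bin/env bash
# Sketch only: report the first line that breaks the assumed rule
# "all sequence lines in a record have equal length, except the last".
# check_fasta_line_lengths is a hypothetical helper name.
check_fasta_line_lengths() {
    awk '
        BEGIN { bad = 0 }
        # A header line starts a new record: reset the per-record state.
        /^>/ { name = substr($0, 2); width = 0; short_seen = 0; next }
        {
            len = length($0)
            # A line after a shorter line, or a line longer than the
            # first line of the record, violates the assumed rule.
            if (short_seen || (width > 0 && len > width)) {
                printf "mismatch in %s at file line %d\n", name, NR
                bad = 1
                exit
            }
            if (width == 0) {
                width = len          # first sequence line sets the width
            } else if (len < width) {
                short_seen = 1       # only allowed as the last line
            }
        }
        END { if (!bad) print "ok"; exit bad }
    ' "$1"
}
```

It would be called as check_fasta_line_lengths /path/to/reference.fasta, and it exits with a non-zero status when it finds a mismatch.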

Kindly suggest how to deal with this issue. Thanks.