Open Mahmoudbassuoni opened 6 months ago
Hi Mahmoudbassuoni,
Are you using pypy3? The error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 0: invalid start byte" may be caused by the two variables msa and target using different encodings. You can print the types of msa and target in localMSA.py at line 378 to check this first. You can also print the lengths of msa and target; if either length is 0, the error may occur.
Best.
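For reference, the message quoted above is what Python raises when a bytes value beginning with 0xa6 is decoded as UTF-8. The snippet below is only a minimal reproduction of that message, not code from localMSA.py; the variable names are made up for illustration. It suggests the failure may happen where bytes are converted to str, which is worth checking alongside the types of msa and target.

```python
# 0xa6 is a UTF-8 continuation byte (binary 10100110), so it cannot
# be the first byte of an encoded character.
raw = b"\xa6"  # hypothetical input, chosen to match the reported byte

try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```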
Hi @ydLiu-HIT, as I told you, I am using the parallel method, so I edited the localMSA.py file to print the type and the length like this:
# Debugging print statements
print(f"Type of msa: {type(msa)}, Length of msa: {len(msa)}")
print(f"Type of target: {type(target)}, Length of target: {len(target)}")
alignment = ksw2_aligner(msa, target, x_score)
Then I took one line from the *_call.sh script:
pypy3 /home/mbassyouni/packages/Psi-caller-1.0.1/localMSA.py --fin_bam "/Data/dataflash/Benchmarking/analysis/preprocessing/HG002/BWA_GATK/recal_reads.bam" --fin_ref "/Data/dataflash/Benchmarking/hs37d5.fa" --minMQ "10" --minCNT "3" --perror_for_snp "0.1" --perror_for_indel "0.1" --ratio_identity_snp "0.2" --ratio_identity_indel "0.2" --max_merge_dis "5" --shift "5" --flanking "50" --useBaseQuality --chrName "1" --chrStart "90000001" --chrEnd "100000001" --fin_can "/Data/dataflash/Benchmarking/analysis/variant_calling/Psicaller/BWA-GATK/var.1_90000001_100000001.can" --fout_vcf "/Data/dataflash/Benchmarking/analysis/variant_calling/Psicaller/BWA-GATK/var.1_90000001_100000001.vcf"
and this was part of the output:
Type of msa: <class 'str'>, Length of msa: 51
Type of target: <class 'str'>, Length of target: 52
Type of msa: <class 'str'>, Length of msa: 64
Type of target: <class 'str'>, Length of target: 51
Type of msa: <class 'str'>, Length of msa: 44
Type of target: <class 'str'>, Length of target: 51
Type of msa: <class 'str'>, Length of msa: 51
Type of target: <class 'str'>, Length of target: 51
Type of msa: <class 'str'>, Length of msa: 64
Type of target: <class 'str'>, Length of target: 64
Type of msa: <class 'str'>, Length of msa: 56
Type of target: <class 'str'>, Length of target: 64
Type of msa: <class 'str'>, Length of msa: 114
Type of target: <class 'str'>, Length of target: 114
Type of msa: <class 'str'>, Length of msa: 106
Type of target: <class 'str'>, Length of target: 114
Type of msa: <class 'str'>, Length of msa: 51
Type of target: <class 'str'>, Length of target: 51
Type of msa: <class 'str'>, Length of msa: 43
Type of target: <class 'str'>, Length of target: 51
Both were strings, and neither had a length of 0.
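Since the printed types and lengths look normal, one way to narrow this down might be to catch the exception at the call site and dump the exact inputs that trigger it. The helper below is a hypothetical sketch: call_with_diagnostics and fake_aligner are names invented here, and fake_aligner only stands in for ksw2_aligner so the example runs on its own.

```python
def call_with_diagnostics(func, *args):
    """Call func(*args); on UnicodeDecodeError, print the inputs and re-raise."""
    try:
        return func(*args)
    except UnicodeDecodeError as exc:
        print(f"{func.__name__} failed: {exc}")
        for i, arg in enumerate(args):
            # repr() exposes hidden or non-ASCII characters in the inputs.
            print(f"  arg {i}: type={type(arg).__name__}, repr={arg!r}")
        raise


# Stand-in for the real aligner (assumption: it fails while decoding bytes).
def fake_aligner(msa, target, x_score):
    return b"\xa6".decode("utf-8")
```

In localMSA.py the call would then read alignment = call_with_diagnostics(ksw2_aligner, msa, target, x_score), assuming ksw2_aligner is an ordinary Python callable. The exception is re-raised, so behavior is otherwise unchanged.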
Hi @Mahmoudbassuoni, I'm running into the same problem. Have you solved it?
Best regards
@leedchou Unfortunately not. I have tried reaching @ydLiu-HIT by email multiple times, but he has not answered.
Hi @ydLiu-HIT, I am trying to run the variant calling with multiple threads using this script,
where I intend to run the pipeline over 2 BAM files as mentioned in the script, but I am getting this error
whose cause I am not sure of. N.B. I am running this on a server with 96 threads and 125 GB of RAM. I started with 90 threads but found that memory usage was intensive, so I tried 30 threads and finally 16, but it still shows the same error. Can you tell me your opinion about this? Thanks,