signalbash / how_are_we_stranded_here

Check strandedness of RNA-Seq fastq files
MIT License

Getting Error while indexing BAM file #20

Open sanyalab opened 6 months ago

sanyalab commented 6 months ago

Hello,

I am getting an error while the BAM file is being indexed. I installed the tool using conda, so the kallisto version is 0.44, as mentioned. The STDOUT is below:

Results stored in: stranded_test_OUTP_AR7540001_r1
converting gtf to bed
Checking if fasta headers and bed file transcript_ids match...
OK!
generating kallisto index

[build] loading fasta file Frankliniella_occidentalis_coding_cDNA.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 19 target sequences
[build] warning: replaced 180 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 79961 contigs and contains 39076736 k-mers

creating fastq files with first 200000 reads
quantifying with kallisto

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 27,492
[index] number of k-mers: 39,076,736
[index] number of equivalence classes: 48,792
[quant] running in paired-end mode
[quant] will process pair 1: stranded_test_OUTP_AR7540001_r1/OUTP_AR7540001_r1_sample.fq
                             stranded_test_OUTP_AR7540001_r1/OUTP_AR7540001_r2_sample.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 200,000 reads, 139,619 reads pseudoaligned
[quant] estimated average fragment length: 163.607
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 887 rounds
[  bam] writing pseudoalignments to BAM format .. done
[  bam] sorting BAM files .. done
[  bam] indexing BAM file .. [E::hts_idx_push] Region 1270717..538140311 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
 invalid return code when indexing file -1 .. done

checking strandedness
Reading reference gene model stranded_test_OUTP_AR7540001_r1/Frankliniella_occidentalis_coding.bed ... Done
Loading SAM/BAM file ...  Total 200000 usable reads were sampled
Traceback (most recent call last):
  File "/abs/sanyalab/anaconda3/envs/SRAssemblyEnv/bin/check_strandedness", line 10, in <module>
    sys.exit(main())
  File "/abs/sanyalab/anaconda3/envs/SRAssemblyEnv/lib/python3.10/site-packages/how_are_we_stranded_here/check_strandedness.py", line 185, in main
    result = pd.read_csv(test_folder + '/' + 'strandedness_check.txt', sep="\n", header=None)
  File "/abs/sanyalab/anaconda3/envs/SRAssemblyEnv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1013, in read_csv
    kwds_defaults = _refine_defaults_read(
  File "/abs/sanyalab/anaconda3/envs/SRAssemblyEnv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 2226, in _refine_defaults_read
    raise ValueError(
ValueError: Specified \n as separator or delimiter. This forces the python engine which does not accept a line terminator. Hence it is not allowed to use the line terminator as separator.

Please advise how to proceed.

Thanks, Abhijit

makshuf commented 4 months ago

@sanyalab Try using Python 3.7, which solved a similar issue I was facing.

smarhon commented 1 month ago

I have the same problem. I tried Python 3.10, 3.7 and 3.6. All have the same problem. Any advice?

kkupkova commented 1 month ago

I had the same issue, then found that it had already been fixed in the GitHub repository but not in the Anaconda package. Uninstall the current version and install the tool directly from GitHub:

pip uninstall how-are-we-stranded-here
pip install git+https://github.com/signalbash/how_are_we_stranded_here
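
For context on the traceback above: the failing call is pd.read_csv(..., sep="\n", header=None), which the installed pandas rejects because the line terminator cannot also be the separator. That call only splits the file into lines, which the standard library can do on its own. The snippet below is a minimal sketch of that idea and not necessarily how the upstream fix was implemented. The function name read_nonblank_lines is hypothetical, and dropping empty lines is an assumption meant to mirror read_csv's default of skipping blank lines.

```python
from pathlib import Path


def read_nonblank_lines(path):
    """Return the non-blank lines of a text file, without trailing newlines.

    Hypothetical helper: a stand-in for reading a line-oriented file with
    pd.read_csv(path, sep="\n", header=None), which newer pandas rejects.
    """
    text = Path(path).read_text()
    # splitlines() handles \n and \r\n; skip lines that are empty or whitespace
    return [line for line in text.splitlines() if line.strip()]
```

If a DataFrame is still needed downstream, the returned list can be passed to pd.DataFrame to get one row per line.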