Open Al313 opened 10 months ago
Thanks for the message! Can I have a few more details about the bam file? Which virus you are working on and which reference have you used for mapping? Also did you run the first step with samtools prior trying the makereadcount command?
On Mon, Dec 11, 2023 at 7:02 PM Ali @.***> wrote:
Describe the bug
When trying to create the "strandcount" tables from bam files I get the following error:
./my-data/step0-input/iii-13-13MT2EXPIIIVP10FullRepseq25052018_S1_L001-sorted.bam. Exception in thread "main" java.lang.NegativeArraySizeException: -449 at makereadcount.Read.
(Read.java:39) at makereadcount.MakeReadCount.readFromFile(MakeReadCount.java:447) at makereadcount.MakeReadCount.readData(MakeReadCount.java:427) at makereadcount.MakeReadCount.run(MakeReadCount.java:62) at makereadcount.MakeReadCount.main(MakeReadCount.java:44) Here's the code that I run for this step:
for i in $(ls ./my-data/step0-input); do echo ${I} && java -cp ./lib/htsjdk-unspecified-SNAPSHOT.jar:./lib/picocli-4.1.2.jar:./lib/pal-1.5.1.jar:./lib/cache2k-all-1.0.2.Final.jar:./lib/commons-math3-3.6.1.jar:./jar/MakeReadCount.jar makereadcount.MakeReadCount ./my-data/step0-input/${i}; done
Additional context
- The same command works for example bam files provided within the package.
- My bam files were created using BWA aligner with default parameters.
— Reply to this email directly, view it on GitHub https://github.com/ucl-pathgenomics/HaROLD/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNFCSXGELEOKWPP2PVVIWTYI5KDRAVCNFSM6AAAAABAQICB2GVHI2DSMVQWIX3LMV43ASLTON2WKOZSGAZTMMZUGIYTAMI . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Thank you very much for the swift response.
We have done paired-end short-read Illumina sequencing of HIV-1 genome as part of a long-term experimental evolution. Currently there are more than 50 longitudinal sequencing data for each of our 8 independent passaging lines.
The reference sequence that I used is the consensus sequence of the plasmid used to produce the ancestral virus (HIV-1 NL4-3) in fasta format. The command that I use for mapping is as follows (it's a snippet of my snakemake pipeline):
rule all:
input:
expand("/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}-sorted.bam", zip,experiment=EXP, line=LINES, sample=SAMPLES)
rule bwa_map:
input:
ref="/home/amovas/data/genome-evo-proj/data/reference/plasmid/hiv_plasmid_ref_genome.fasta",
read1="/home/amovas/data/genome-evo-proj/data/freezed-raw-data/fastq/{experiment}/{line}/{sample}_R1_001.fastq.gz",
read2="/home/amovas/data/genome-evo-proj/data/freezed-raw-data/fastq/{experiment}/{line}/{sample}_R2_001.fastq.gz"
output:
temp("/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}.bam.tmp")
threads: 1
shell:
"(bwa mem -t {threads} {input.ref} {input.read1} {input.read2} | "
"samtools view -Sb - > {output})"
rule samtools_sort:
input:
"/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}.bam.tmp"
output:
"/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}-sorted.bam"
shell:
"samtools sort {input} > {output}"
Please let me know if it would help if I share one sequencing sample + reference sequence to pinpoint the problem (unfortunately sequencing files are a bit larger than GitHub size limit to share here).
Thanks in advance.
Thanks for the info. I think you need to run the first step in the README file under "prepare input files". Even if not necessary depending on the software used for alignment, it is needed for bam files produced with bwa.
samtools view -h -G69 file.bam | samtools view -h -G133 > newfile.bam
This gets rid of unmapped/duplicated multi-aligned reads. You can check the number of mapped reads with samtools in both files:
samtools view -c -F 260 file.bam
samtools view -c -F 260 newfile.bam
This should give you the same number.
Let me know if this solves the problem! If not, yes please share a bam file with me and I'll have a look!
Yes, I did try running "makereadcount" both on original and new bam files but the error persisted.
I have uploaded the reference sequence + sequencing reads of one sample on Google Drive. I will share the link with you via email.
Describe the bug
When trying to create the "strandcount" tables from bam files I get the following error:
Here's the code that I run for this step:
for i in $(ls ./my-data/step0-input); do echo ${I} && java -cp ./lib/htsjdk-unspecified-SNAPSHOT.jar:./lib/picocli-4.1.2.jar:./lib/pal-1.5.1.jar:./lib/cache2k-all-1.0.2.Final.jar:./lib/commons-math3-3.6.1.jar:./jar/MakeReadCount.jar makereadcount.MakeReadCount ./my-data/step0-input/${i}; done
Additional context