morispi / LEVIATHAN

Linked-reads based structural variant caller with barcode indexing
GNU Affero General Public License v3.0

No output generated #1

Closed: annaorteu closed this issue 2 years ago

annaorteu commented 3 years ago

Hi!

I have been trying to run LEVIATHAN on some linked-read data without success and was wondering if you could help me out with it. My fastq files were first pre-processed to append the barcodes to the read info as a BX tag and to trim adapters, low-quality ends, etc. This was done using in-house scripts and not with LongRanger. Then I mapped the reads using bwa-mem and marked duplicates. I have attached a few lines from my bam file to show you what it looks like. When I run LEVIATHAN on these data, it seems to run fine and doesn't give me any error message, but the outputs, both the vcf file and candidates.bedpe, are empty. Do you think that something in the format is causing the issue? I think the barcodes are in a slightly different format, could that be it? I guess it could also be that it is not finding anything?

Many thanks! Anna

run183.bam.txt

morispi commented 3 years ago

Hi!

A quick look at your BAM file suggests that your BX tags are unusual. They read something like BX:Z:A41C29B38D55, while I believe BX tags should contain the actual nucleotides of the barcodes? Moreover, LEVIATHAN assumes the barcodes are 16 bp long by default, and yours only seem to be 12?
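To check this on more than a handful of reads, the BX tags can be tallied by shape. The sketch below is a hypothetical helper (the name `bx_summary` is made up for illustration and is not part of LEVIATHAN or LRez). It reads SAM text on stdin and counts reads whose BX value is a 16-nucleotide string, a nucleotide string of another length, something else entirely, or absent. It assumes that a trailing `-1` style suffix, as written by LongRanger, is acceptable and strips it before checking.

```shell
bx_summary() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            bx = ""
            for (i = 12; i <= NF; i++)      # optional tags start at column 12
                if ($i ~ /^BX:Z:/) bx = substr($i, 6)
            if (bx == "") { missing++; next }
            sub(/-[0-9]+$/, "", bx)         # assumption: drop a "-1" style suffix
            if (bx !~ /^[ACGT]+$/) { other++; next }
            if (length(bx) == 16) ok++; else badlen++
        }
        END {
            printf "nucleotide_16bp=%d nucleotide_other_length=%d non_nucleotide=%d no_bx=%d\n", ok, badlen, other, missing
        }'
}
```

Assuming samtools is installed, something like `samtools view your.bam | head -n 100000 | bx_summary` would summarise the first 100000 alignments. A large `non_nucleotide` count would point to the barcode format described above.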

However, there should be no issue with preprocessing the reads and aligning them with BWA-MEM, avoiding LongRanger altogether.

Could you maybe show me a few lines of your fastq file so I can try to better pinpoint the error?

Best, Pierre

annaorteu commented 3 years ago

Hi Pierre,

Thank you for such a quick answer! Yeah, I thought it could be that. I didn't do the pre-processing myself. Find attached a few lines of the fastq file. This is after pre-processing, as I don't have the raw data right now. Do you think that if the barcodes are changed to the nucleotide format LEVIATHAN could run fine?

Thanks Anna

run183.fastq.txt

morispi commented 3 years ago

Thanks for the quick response as well. :)

Yeah, it definitely seems to me like there is something wrong with the barcodes here. I can't tell for sure if changing the barcodes to nucleotide format would completely solve the issue, since the tool is pretty new, and I haven't been confronted with that yet.

However, I know for sure that LEVIATHAN expects to find nucleotides in the BX tags, and then converts them into a 2-bits-per-nucleotide format. I believe this might cause some weird behaviour at conversion time if something other than nucleotides is provided. It wouldn't be surprising if this then had an impact on further steps, especially given that your candidates file is empty as well.
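To illustrate why non-nucleotide characters are a problem for this kind of conversion, here is a minimal sketch of a 2-bits-per-nucleotide packing. The function name `encode_2bit` and the mapping A=0, C=1, G=2, T=3 are assumptions chosen for illustration; they are not taken from the LRez source, which may pack barcodes differently and may not reject bad input the way this sketch does.

```shell
encode_2bit() {
    # $1: barcode string. Prints the packed integer, or exits non-zero
    # when a character has no 2-bit code.
    printf '%s\n' "$1" | awk '
        BEGIN { code["A"] = 0; code["C"] = 1; code["G"] = 2; code["T"] = 3 }
        {
            value = 0
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (!(c in code)) {
                    printf "cannot encode \"%s\" at position %d\n", c, i > "/dev/stderr"
                    exit 1
                }
                value = value * 4 + code[c]   # shift left by 2 bits, then add the code
            }
            printf "%.0f\n", value
        }'
}
```

With this mapping, `encode_2bit ACGT` prints 27, while `encode_2bit A41C29B38D55` fails at the second character because digits have no 2-bit code. A converter that does not check its input would instead produce an arbitrary value at that point.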

Hope it wouldn't take too long to redo the pre-processing and alignment? Meanwhile, I can run tests on the few BAM lines you provided, and see if I can come up with something.

Best, Pierre

morispi commented 3 years ago

Hello,

I tried to come up with a quick fix that could probably address this issue. Basically I just store the plain barcodes instead of converting them into 2 bits per nucleotide format. Code is still pretty dirty and needs a serious cleanup, but I do believe this might allow LEVIATHAN to run.

Could you try to pull the latest changes to LEVIATHAN, then switch to the LRez submodule of your LEVIATHAN install, checkout the Haplotagging branch, pull to make sure everything is up to date, and reinstall both LRez and LEVIATHAN? Running the ./install.sh script should be enough for LRez, but you should run make clean && make for LEVIATHAN.

Basically, this should look like:

cd LEVIATHAN
git pull
cd LRez
git checkout Haplotagging
git pull
make clean
./install.sh
cd ..
make clean
make

Then try to launch the toy example and see if everything works:

./LRez/bin/LRez index bam -p -b example/example.bam -o example/barcodeIndex.bci
rm -f candidates.bedpe
./bin/LEVIATHAN -b example/example.bam -i example/barcodeIndex.bci -g example/genome.fasta -o example/SV.vcf
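
Since the original symptom was a run that finished without errors but left empty outputs, it can help to count the records actually written. The helper below is a hypothetical sketch (`count_records` is not part of LEVIATHAN); it counts non-empty lines that do not start with `#`, which is assumed here to be a reasonable definition of a data line for both the VCF and the BEDPE file.

```shell
count_records() {
    # $1: path to a VCF or BEDPE file. Prints the number of data lines,
    # skipping blank lines and lines that start with "#".
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

Running `count_records example/SV.vcf` and `count_records candidates.bedpe` after the toy example should print counts above 0 if any variants or candidates were reported.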

LEVIATHAN fully relies on LRez for barcode management, and I've run small tests that seem to show this new version does manage to process Haplotagging barcodes correctly. However, I have not run extensive tests on full datasets yet. Could you tell me if this works for you?

Best, Pierre

annaorteu commented 3 years ago

Hi Pierre

Amazing, thanks! I tested the code with a couple of samples and it seems to be working fine. I will run it on more samples and the full dataset and let you know how that turns out.

Thank you Anna

morispi commented 3 years ago

Hi Anna,

Cool, glad to hear such an easy fix seems to solve the problem! I will keep on performing additional experiments as well, but I'm definitely interested to know how it turns out for you.

Cheers, Pierre