morispi / LEVIATHAN

Linked-reads based structural variant caller with barcode indexing
GNU Affero General Public License v3.0

Aligner recommendation #6

Open milesandersonmn opened 2 years ago

milesandersonmn commented 2 years ago

Hello, I'm working on 10X reads from a non-model organism with around 13,000 contigs in the reference and was wondering which aligner would be recommended after running Longranger basic.

Thanks!

milesandersonmn commented 2 years ago

It turns out that bwa mem with the -Cp flag worked best for passing the BX:Z barcode tags through to the SAM file. NextGenMap performed well for alignment and run time, but I couldn't find an option to pass the BX:Z tag to the alignment files. However, when I run LRez on the bwa mem output I get a 'stoi' error. Given the similar earlier issue with BX:Z tags from haplotagging data, I suspect the aligner may be writing the BX:Z field to the SAM file incorrectly.

[Screenshot: Screen Shot 2022-06-10 at 12 37 14 PM]

Compared with the position of the BX:Z tag in the resolved haplotagging issue, mine is placed at the end of the record, usually after an XA tag containing a semicolon. Is there an obvious fix for this?
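One way to check whether the BX:Z field reached the SAM file intact is to tally the tag values directly. The sketch below is only an illustration: the function names are made up, and the barcode pattern (16 bases followed by a numeric suffix such as `-1`) is an assumption about 10x barcodes, not something LRez is known to require. It reads plain SAM text, so it needs no extra libraries.

```python
import re

# Assumed 10x barcode shape: 16 bases plus a "-<digits>" suffix (an assumption,
# not a documented LRez requirement). Adjust for haplotagging-style barcodes.
TENX_BARCODE = re.compile(r"^[ACGTN]{16}-\d+$")


def bx_tag(sam_line):
    """Return the BX:Z value of one SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


def summarize_bx(sam_lines):
    """Count records with a missing, well-formed, or unexpected BX:Z value."""
    counts = {"missing": 0, "ok": 0, "unexpected": 0}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        value = bx_tag(line)
        if value is None:
            counts["missing"] += 1
        elif TENX_BARCODE.match(value):
            counts["ok"] += 1
        else:
            counts["unexpected"] += 1
    return counts
```

To use it, pipe the output of `samtools view` into a short script that calls `summarize_bx(sys.stdin)`. A large "unexpected" count would point to the tag value itself rather than to its position in the record, since the order of optional fields does not matter to this parser.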

milesandersonmn commented 2 years ago

I used the EMA aligner to check whether the order of the BAM fields was the problem and got the same 'stoi' error. I wonder whether the error could instead come from the window size, since I have 92 contigs that are 1,000 bp or shorter. Unfortunately, I don't know what the window size is, so I can't tell which of the smaller contigs to filter from the BAM file.
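To see which contigs would fall under a given length cutoff, the `@SQ` lines of the BAM header (as printed by `samtools view -H`) can be parsed directly. This is a minimal sketch: the function names are hypothetical, and the cutoff is a free parameter because the window size is not known here.

```python
def contig_lengths(header_lines):
    """Map contig name (SN) to length (LN) from SAM @SQ header lines."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each remaining column is TAG:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def short_contigs(lengths, min_length):
    """Return the sorted names of contigs shorter than min_length."""
    return sorted(name for name, n in lengths.items() if n < min_length)
```

Calling `short_contigs(contig_lengths(sys.stdin), 1001)` on the header text would list the contigs of 1,000 bp or less, which can then be excluded when subsetting the BAM file.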

jtweir commented 1 year ago

@milesandersonmn

> Used the EMA aligner to see if it was some issue with the order of the BAM fields and got the same 'stoi' error. Wondering if it could still be an error resulting from the window size as I have 92 contigs that are 1000 bp or less, but I don't know the window size to filter the smaller contigs from the bam file unfortunately.

I am also getting the same "stoi" error with a BAM file aligned using bwa mem (with the -Cp flag) and indexed with samtools. Did you find a solution?

milesandersonmn commented 1 year ago

It's been some time since I worked with this pipeline, and unfortunately I deleted a lot of the files to free up storage space, so I'm not sure I can recall exactly how I got around this.

If I'm remembering correctly, I ended up using the phased, position-sorted BAM files from the Longranger WGS pipeline instead of the output of Longranger basic. That shouldn't matter too much, but I also concatenated all the unaligned contigs into an artificial "Chromosome 0", which I ended up discarding from the analysis anyway. So I used only chromosome-level contigs to run the Longranger pipeline and then used the resulting BAM file for LRez.
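The "Chromosome 0" construction could look roughly like the sketch below. The thread does not say how the concatenation was actually done, so the details are assumptions: the length cutoff, the run of `N` bases used as a spacer between contigs, the `Chromosome_0` name, and the function names are all illustrative.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping any description after a space.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def build_chromosome_zero(records, min_length, spacer="N" * 500):
    """Keep long sequences as they are and pool the rest into one record.

    The pooled record joins the short sequences with `spacer` (an assumed
    choice) and is appended last under the name "Chromosome_0".
    """
    kept, pooled = [], []
    for name, seq in records:
        if len(seq) >= min_length:
            kept.append((name, seq))
        else:
            pooled.append(seq)
    if pooled:
        kept.append(("Chromosome_0", spacer.join(pooled)))
    return kept
```

Pooling keeps the sequence count low without throwing any bases away before alignment. The pooled record can still be dropped from the downstream analysis, as was done here.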

How many contigs are you including? And what are the lengths of the shortest contigs?

jtweir commented 1 year ago

If we assemble them into scaffolds with supernova, they range from 1000 bp to >40,000,000 bp. At the moment I am simply using all 10X reads. I presume you ran into issues doing that?

To clarify, the procedure you used was to take only the 10X reads that had formed chromosome-length contigs? And you are not running those assembled contigs through LRez, but rather the subset of 10X reads that belong to those contigs? In our case we could do something similar. We cannot get chromosome-length contigs for our 10X genomes because of large repetitive elements that 10X cannot seem to span, but we could still include only contigs that exceed a certain length. However, I wouldn't know how to filter out the reads that do not assemble into large contigs.

milesandersonmn commented 1 year ago

We had a draft assembly for our organism, so we used Longranger to align and call variants rather than running supernova for assembly. Our draft assembly is very fragmented and includes a lot of small contigs, but we do have chromosome-length contigs, so I excluded everything except the chromosome-length contigs from the reference FASTA when running the Longranger aligner.

If I recall correctly, though, giving LRez too many contigs, and contigs that were too small, was what caused the error. Do you have a reference genome you're aligning to?

jtweir commented 1 year ago

Thanks Miles,

We have gotten LEVIATHAN to work. Yes, we assembled a reference genome with supernova for one species in a genus (it is not chromosome scale; our largest scaffold is half the length of the largest chromosome), and we are using LEVIATHAN to reconstruct SVs between another species in the genus and that reference. We eventually got both LRez and LEVIATHAN to work even with tens of thousands of scaffolds, but it took days. We are now running it in about 6 hours by using only the scaffolds that exceed 100 kb (there are a few hundred of these). Thanks for your comments!
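Selecting the scaffolds above a length cutoff can be done from the FASTA index written by `samtools faidx`, whose first two tab-separated columns are the sequence name and its length. The sketch below is one possible way to do it, not necessarily the procedure used above; the function name is hypothetical and the 100 kb default simply mirrors the figure quoted in this comment.

```python
def long_scaffolds(fai_lines, min_length=100_000):
    """Return names of sequences longer than min_length from .fai lines."""
    names = []
    for line in fai_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, length = parts[0], int(parts[1])
        if length > min_length:  # "exceed" is read as strictly greater
            names.append(name)
    return names
```

The returned names can be passed as regions to `samtools faidx reference.fa` to write a reduced reference, or to `samtools view` on an indexed BAM file to keep only the alignments on those scaffolds.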

Best wishes, Jason

Jason Weir
Professor, Dept. of Ecology and Evolution and Dept. of Biological Sciences
University of Toronto, 1265 Military Trail, Toronto, Ontario, Canada M1C 1A4
http://www.utsc.utoronto.ca/~jweir/ (reprints)

